Daily Arxiv

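This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.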

Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology

Hakim: Farsi Text Embedding Model

Fast Text-to-Audio Generation with Adversarial Post-Training

Fusing Bidirectional Chains of Thought and Reward Mechanisms A Method for Enhancing Question-Answering Capabilities of Large Language Models for Chinese Intangible Cultural Heritage

Intelligent Product 3.0: Decentralised AI Agents and Web3 Intelligence Standards

Neural Brain: A Neuroscience-inspired Framework for Embodied Agents

Decoding Futures Price Dynamics: A Regularized Sparse Autoencoder for Interpretable Multi-Horizon Forecasting and Factor Discovery

Steerable Chatbots: Personalizing LLMs with Preference-Based Activation Steering

SafeNav: Safe Path Navigation using Landmark Based Localization in a GPS-denied Environment

Don't be lazy: CompleteP enables compute-efficient deep transformers

Llama-Nemotron: Efficient Reasoning Models

Rethinking Time Encoding via Learnable Transformation Functions

CrashFixer: A crash resolution agent for the Linux kernel

Harden and Catch for Just-in-Time Assured LLM-Based Software Testing: Open Research Challenges

Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis

Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling

Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models

Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark

Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

The Problem of the Priors, or Posteriors?

Simulating and Analysing Human Survey Responses with Large Language Models: A Case Study in Energy Stated Preference

Learning Autonomy: Off-Road Navigation Enhanced by Human Input

InductionBench: LLMs Fail in the Simplest Complexity Class

Activation Steering in Neural Theorem Provers

PropNet: a White-Box and Human-Like Network for Sentence Representation

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

FAS: Fast ANN-SNN Conversion for Spiking Large Language Models

Learning Traffic Anomalies from Generative Models on Real-Time Observations

High-temperature superconductivity in Li$_2$AuH$_6$ mediated by strong electron-phonon coupling under ambient pressure

A Bio-Inspired Research Paradigm of Collision Perception Neurons Enabling Neuro-Robotic Integration: The LGMD Case

ThreatModeling-LLM: Automating Threat Modeling using Large Language Models for Banking System

Is Linear Feedback on Smoothed Dynamics Sufficient for Stabilizing Contact-Rich Plans?

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks

FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering

Deep Signature: Characterization of Large-Scale Molecular Dynamics

State-of-the-Art Periorbital Distance Prediction and Disease Classification Using Periorbital Features

A Deep Learning Approach for Pixel-level Material Classification via Hyperspectral Imaging

Fragment-Masked Diffusion for Molecular Optimization

Public Constitutional AI

Cognitive Insights and Stable Coalition Matching for Fostering Multi-Agent Cooperation

DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series

Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model

Detecting Multimedia Generated by Large AI Models: A Survey

Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor Search

EiHi Net: Out-of-Distribution Generalization Paradigm

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models

CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging

SafeMate: A Modular RAG-Based Agent for Context-Aware Emergency Guidance

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning

AdaWorld: Learning Adaptable World Models with Latent Actions

Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming

An Analytical Emotion Framework of Rumour Threads on Social Media

Deontic Temporal Logic for Formal Verification of AI Ethics

PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment

Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning

Exploiting Uncertainty for Querying Inconsistent Description Logics Knowledge Bases

Learning to Be Cautious

Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors

How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference

WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models

Variational Visual Question Answering

Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Meta-learning Slice-to-Volume Reconstruction in Fetal Brain MRI using Implicit Neural Representations

Learning Long-Context Diffusion Policies via Past-Token Prediction

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput

Preserving Plasticity in Continual Learning with Adaptive Linearity Injection

Deploying Foundation Model-Enabled Air and Ground Robots in the Field: Challenges and Opportunities

A 2D Semantic-Aware Position Encoding for Vision Transformers

Quantum state-agnostic work extraction (almost) without dissipation

Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment

CXMArena: Unified Dataset to benchmark performance in realistic CXM Scenarios

Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records

Multilingual Machine Translation with Quantum Encoder Decoder Attention-based Convolutional Variational Circuits

Quantum-Enhanced Parameter-Efficient Learning for Typhoon Trajectory Forecasting

UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units

FedSaaS: Class-Consistency Federated Semantic Segmentation via Global Prototype Supervision and Local Adversarial Harmonization

The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan

Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform

TensorRL-QAS: Reinforcement learning with tensor networks for scalable quantum architecture search

GreenFactory: Ensembling Zero-Cost Proxies to Estimate Performance of Neural Networks

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Evaluating the Robustness of Adversarial Defenses in Malware Detection Systems

BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis

Neural Video Compression using 2D Gaussian Splatting

Toward Fair Federated Learning under Demographic Disparities and Data Imbalance

MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning

Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt

Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation

EDBench: Large-Scale Electron Density Data for Molecular Modeling

Focus, Merge, Rank: Improved Question Answering Based on Semi-structured Knowledge Bases

Educational impacts of generative artificial intelligence on learning and performance of engineering students in China

InvDesFlow-AL: Active Learning-based Workflow for Inverse Design of Functional Materials

DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection

An Initial Exploration of Default Images in Text-to-Image Generation

A Multi-Task Foundation Model for Wireless Channel Representation Using Contrastive and Masked Autoencoder Learning

ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor

Fair Clustering via Alignment

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

Created by

Haebom

Authors

Xinji Mai, Haotian Xu, Xing W, Weinong Wang, Yingying Zhang, Wenqiang Zhang

Overview

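This paper presents ZeroTIR (Zero-shot Tool-Integrated Reasoning), a method in which reinforcement learning (RL) teaches a large language model (LLM) to spontaneously use an external tool (Python code execution) to improve its mathematical problem solving. The approach applies RL with outcome-based rewards, without supervised examples of tool use, so that the LLM learns to generate and execute Python code. In the experiments, code execution frequency, response length, and final accuracy all increased as the number of RL training steps increased, a positive correlation that suggests a quantitative relationship between the computational effort invested in training and the acquisition of effective tool-use strategies. ZeroTIR substantially outperforms ZeroRL baselines, which do not use tools, on math benchmarks. The work provides a foundational understanding of how autonomous tool use is acquired and scaled in agent RL, and offers a reproducible benchmark for future research.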
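The loop summarized above (the model writes code, the code is executed, and only the final answer is rewarded) can be illustrated with a minimal sketch. This is not the authors' implementation: the `<code>` tag convention, the `Answer:` marker, the binary reward values, the tool-call cap, and every name used here (`run_python`, `rollout`, `outcome_reward`, `toy_policy`) are assumptions made for illustration. The scripted `toy_policy` stands in for an LLM, and no RL update is shown.

```python
import contextlib
import io
import re

# Hypothetical convention: the policy wraps code it wants executed in <code> tags.
CODE_TAG = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_python(code: str) -> str:
    """Execute code and return captured stdout (not sandboxed; illustration only)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue().strip()


def rollout(policy, question: str, max_tool_calls: int = 3):
    """Alternate between policy text and code execution until no code is emitted."""
    transcript = question
    tool_calls = 0
    while True:
        step = policy(transcript)
        transcript += "\n" + step
        match = CODE_TAG.search(step)
        if match is None or tool_calls >= max_tool_calls:
            return transcript, tool_calls
        tool_calls += 1
        # Feed the execution result back so the next step can use it.
        transcript += "\nOutput: " + run_python(match.group(1))


def outcome_reward(transcript: str, reference: str) -> float:
    """Outcome-based reward: 1.0 if the last 'Answer:' matches the reference, else 0.0."""
    answers = re.findall(r"Answer:\s*(\S+)", transcript)
    return 1.0 if answers and answers[-1] == reference else 0.0


def toy_policy(transcript: str) -> str:
    """Scripted stand-in for an LLM: call the tool once, then report its output."""
    if "Output:" not in transcript:
        return "<code>print(sum(range(1, 101)))</code>"
    result = transcript.rsplit("Output: ", 1)[1].strip()
    return f"Answer: {result}"


if __name__ == "__main__":
    transcript, calls = rollout(toy_policy, "What is 1 + 2 + ... + 100?")
    print(transcript)
    print("tool calls:", calls, "reward:", outcome_reward(transcript, "5050"))
```

The reward never inspects whether or how code was used, which mirrors the point that tool use is not directly supervised. In an actual RL setup this scalar would feed a policy-gradient update, and the number of tool calls per rollout is the kind of quantity one would track across training steps.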

Implications and Limitations

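• Implications:
  ◦ Shows that RL with outcome-based rewards lets an LLM spontaneously use an external tool (Python code) to improve its mathematical reasoning.
  ◦ Identifies a positive correlation between RL training steps and code execution frequency, response length, and accuracy, pointing to a quantitative relationship between training effort and the acquisition of tool-use strategies.
  ◦ Demonstrates experimentally that ZeroTIR outperforms the existing ZeroRL baselines.
  ◦ Provides a reproducible benchmark that supports future research.

• Limitations:
  ◦ The work is currently limited to Python code execution; further research is needed on whether it generalizes to other types of tools.
  ◦ The scope and types of the math benchmarks used are not clearly described; generalization across more varied kinds of math problems needs further evaluation.
  ◦ In-depth analysis of the overfitting and stability issues that may arise during RL training is lacking.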
