Daily Arxiv
This page collects artificial-intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for the papers belongs to their authors and affiliated institutions; please cite the source when sharing.
BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning
UI-UG: A Unified MLLM for UI Understanding and Generation
Q-Mirror: Unlocking the Multi-Modal Potential of Scientific Text-Only QA Pairs
Conda: Column-Normalized Adam for Training Large Language Models Faster
TENET: Leveraging Tests Beyond Validation for Code Generation
FameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
MMPB: It's Time for Multi-Modal Personalization
Painless Activation Steering: An Automated, Lightweight Approach for Post-Training Large Language Models
A Meta-Analysis of LLM Effects on Students across Qualification, Socialisation, and Subjectification
Wavelet-Induced Rotary Encodings: RoPE Meets Graphs
Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models
Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
Predicting LLM Reasoning Performance with Small Proxy Model
Beyond the Individual: Introducing Group Intention Forecasting with SHOT Dataset
Adversarial Defense in Cybersecurity: A Systematic Review of GANs for Threat Detection and Mitigation
Video models are zero-shot learners and reasoners
Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization
U-Mamba2-SSL for Semi-Supervised Tooth and Pulp Segmentation in CBCT
Graph Coloring for Multi-Task Learning
KANO: Kolmogorov-Arnold Neural Operator
Robust LLM Training Infrastructure at ByteDance
Communications to Circulations: 3D Wind Field Retrieval and Real-Time Prediction Using 5G GNSS Signals and Deep Learning
FlowRL: Matching Reward Distributions for LLM Reasoning
DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion
Multi-Robot Task Planning for Multi-Object Retrieval Tasks with Distributed On-Site Knowledge via Large Language Models
U-Mamba2: Scaling State Space Models for Dental Anatomy Segmentation in CBCT
MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
Inducing Uncertainty on Open-Weight Models for Test-Time Privacy in Image Recognition
Ban&Pick: Enhancing Performance and Efficiency of MoE-LLMs via Smarter Routing
LiDAR-BIND-T: Improved and Temporally Consistent Sensor Modality Translation and Fusion for Robotic Applications
Long-Horizon Visual Imitation Learning via Plan and Code Reflection
Measuring the Measures: Discriminative Capacity of Representational Similarity Metrics Across Model Families
Learning to Generate Unit Test via Adversarial Reinforcement Learning
Diffusion Language Models Know the Answer Before Decoding
Object Detection with Multimodal Large Vision-Language Models: An In-depth Review
Image-Conditioned 3D Gaussian Splat Quantization
The DNA of nuclear models: How AI predicts nuclear masses
FoundBioNet: A Foundation-Based Model for IDH Genotyping of Glioma from Multi-Parametric MRI
Learning Unified User Quantized Tokenizers for User Representation
A Survey on Code Generation with LLM-based Agents
The Ever-Evolving Science Exam
The Impact of Language Mixing on Bilingual LLM Reasoning
Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations
Linguistic and Embedding-Based Profiling of Texts generated by Humans and Large Language Models
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design
Scaling RL to Long Videos
On the Effectiveness of Methods and Metrics for Explainable AI in Remote Sensing Image Scene Classification
Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training
HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection
Deep Graph Learning for Industrial Carbon Emission Analysis and Policy Impact
DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
When Does Multimodality Lead to Better Time Series Forecasting?
FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation
Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models
QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety
VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision-Language Models
A theoretical framework for self-supervised contrastive learning for continuous dependent data
Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$
Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement
Static Word Embeddings for Sentence Semantic Representation
Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation
Multi Layered Autonomy and AI Ecologies in Robotic Art Installations
WorldGym: World Model as An Environment for Policy Evaluation
Personalized Subgraph Federated Learning with Differentiable Auxiliary Projections
ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
Value-Guided Search for Efficient Chain-of-Thought Reasoning
LLM Agents for Interactive Exploration of Historical Cadastre Data: Framework and Application to Venice
Find the Fruit: Zero-Shot Sim2Real RL for Occlusion-Aware Plant Manipulation
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
Causal Interventions Reveal Shared Structure Across English Filler-Gap Constructions
DEBATE, TRAIN, EVOLVE: Self Evolution of Language Model Reasoning
Octic Vision Transformers: Quicker ViTs Through Equivariance
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries
ELEPHANT: Measuring and understanding social sycophancy in LLMs
Structured Agent Distillation for Large Language Model
ScSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Modeling Saliency Dataset Bias
TensorRL-QAS: Reinforcement learning with tensor networks for improved quantum architecture search
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Stochastic Layer-wise Learning: Scalable and Efficient Alternative to Backpropagation
Fair Uncertainty Quantification for Depression Prediction
Adaptive Rectification Sampling for Test-Time Compute Scaling
Lobster: A GPU-Accelerated Framework for Neurosymbolic Programming
Enabling Rapid Shared Human-AI Mental Model Alignment via the After-Action Review
CODA: Repurposing Continuous VAEs for Discrete Tokenization
Value Profiles for Encoding Human Variation
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
A Survey on SAR ship classification using Deep Learning
Revisiting semi-supervised learning in the era of foundation models
Rethinking Diffusion Model in High Dimension
Structured Agent Distillation for Large Language Model
Created by
Haebom
Authors
Jun Liu, Zhenglun Kong, Peiyan Dong, Changdi Yang, Tianqi Li, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Pu Zhao, Xue Lin, Dong Huang, Yanzhi Wang
Structured Agent Distillation
Overview
Large language model (LLM)-based agents demonstrate strong decision-making capabilities by combining reasoning with acting, but their high inference cost and large model size constrain real-world deployment. This paper proposes Structured Agent Distillation, a framework that compresses a large LLM-based agent into a smaller student model while preserving reasoning fidelity and action consistency. Unlike standard token-level distillation, the method segments trajectories into [REASON] and [ACT] spans and applies a segment-wise loss to align each component with the teacher's behavior. This structure-aware supervision lets compact agents better replicate the teacher's decision-making process. In experiments on ALFWorld, HotPotQA-ReAct, and WebShop, the approach consistently outperforms token-level and imitation-learning baselines, achieving substantial compression with minimal performance degradation. Scaling and ablation results underscore the importance of span-level alignment for efficient, deployable agents.
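The core idea — masking trajectory tokens into [REASON] and [ACT] spans and applying a per-segment distillation loss — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the KL-based loss, the 0/1 segment-id encoding, and the `w_reason`/`w_act` weights are all assumptions made for the example.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary dimension.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def segment_distillation_loss(teacher_logits, student_logits, segment_ids,
                              w_reason=1.0, w_act=1.0):
    """Segment-wise KL(teacher || student), averaged within each span.

    segment_ids: per-token labels, 0 for [REASON] tokens, 1 for [ACT] tokens.
    w_reason / w_act: illustrative per-segment weights (hypothetical).
    """
    p = softmax(teacher_logits)  # (T, V) teacher token distributions
    q = softmax(student_logits)  # (T, V) student token distributions
    kl_per_token = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    reason_mask = segment_ids == 0
    act_mask = segment_ids == 1
    loss_reason = kl_per_token[reason_mask].mean() if reason_mask.any() else 0.0
    loss_act = kl_per_token[act_mask].mean() if act_mask.any() else 0.0
    return w_reason * loss_reason + w_act * loss_act

# Toy trajectory: 4 tokens over a vocabulary of 5; the first two tokens
# belong to the [REASON] span, the last two to the [ACT] span.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 5))
student = rng.normal(size=(4, 5))
segments = np.array([0, 0, 1, 1])
loss = segment_distillation_loss(teacher, student, segments)
print(loss)
```

Averaging within each span before weighting keeps a long reasoning segment from drowning out a short action segment — one plausible motivation for span-level rather than token-level alignment.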
Takeaways and Limitations
•
Presents an effective way to shrink model size while preserving the reasoning ability and action consistency of large LLM-based agents.
•
Achieves strong performance in ReAct-style frameworks, outperforming token-level and imitation-learning methods.
•
Validates the generality of the method through experiments across diverse environments, including ALFWorld, HotPotQA-ReAct, and WebShop.
•
Emphasizes the importance of span-level alignment, contributing to efficient agent development.
•
Lacks in-depth discussion of concrete model architectures and implementation details.
•
May require optimization tailored to specific task environments.
•
Lacks analysis of additional issues (e.g., latency) that may arise in real deployment.
View PDF