Daily Arxiv

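This page compiles artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.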

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Accurate and scalable exchange-correlation with deep learning

AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation

Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models

Aligning Evaluation with Clinical Priorities: Calibration, Label Shift, and Error Costs

GRAM: A Generative Foundation Reward Model for Reward Generalization

VideoMAR: Autoregressive Video Generation with Continuous Tokens

FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation

Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models

No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!

Serving Large Language Models on Huawei CloudMatrix384

PLD: A Choice-Theoretic List-Wise Knowledge Distillation

TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy

Refactoring Codebases through Library Design

TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding

Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation

Multi-Task Reward Learning from Human Ratings

Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Vision Transformers Don't Need Trained Registers

BIS Reasoning 1.0: The First Large-Scale Japanese Benchmark for Belief-Inconsistent Syllogistic Reasoning

LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles

CORA: Coalitional Rational Advantage Decomposition for Multi-Agent Policy Gradients

Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications

ChemHAS: Hierarchical Agent Stacking for Enhancing Chemistry Tools

ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models

Think Twice before Adaptation: Improving Adaptability of DeepFake Detection via Online Test-Time Adaptation

Efficient Long CoT Reasoning in Small Language Models

Imagine Beyond! Distributionally Robust Auto-Encoding for State Space Coverage in Online Reinforcement Learning

MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion

J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization

Fractured Chain-of-Thought Reasoning

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

UD-English-CHILDES: A Collected Resource of Gold and Silver Universal Dependencies Trees for Child Language Interactions

Position Paper: Rethinking Privacy in RL for Sequential Decision-making in the Age of LLMs

Influential Bandits: Pulling an Arm May Change the Environment

SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models

Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning

Exploring Personalized Federated Learning Architectures for Violence Detection in Surveillance Videos

A Bird Song Detector for improving bird identification through Deep Learning: a case study from Doñana

KANITE: Kolmogorov-Arnold Networks for ITE estimation

Beyond Propagation of Chaos: A Stochastic Algorithm for Mean Field Optimization

Resolving UnderEdit & OverEdit with Iterative & Neighbor-Assisted Model Editing

Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation

EgoBlind: Towards Egocentric Visual Assistance for the Blind

PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice

Machine Learners Should Acknowledge the Legal Implications of Large Language Models as Personal Data

Supporting the development of Machine Learning for fundamental science in a federated Cloud with the AI_INFN platform

CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks

Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions

Perspective Transition of Large Language Models for Solving Subjective Tasks

Can LLMs Ask Good Questions?

Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review

SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation

Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition

Multiclass Post-Earthquake Building Assessment Integrating High-Resolution Optical and SAR Satellite Imagery, Ground Motion, and Soil Data with Transformers

REVOLVE: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

FLARE: Towards Universal Dataset Purification against Backdoor Attacks

Heterogeneous Relationships of Subjects and Shapelets for Semi-supervised Multivariate Series Classification

Contrast Similarity-Aware Dual-Pathway Mamba for Multivariate Time Series Node Classification

Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation

LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch

Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

The Epochal Sawtooth Phenomenon: Unveiling Training Loss Oscillations in Adam and Other Optimizers

Pap2Pat: Benchmarking Outline-Guided Long-Text Patent Generation with Patent-Paper Pairs

Deep Graph Anomaly Detection: A Survey and New Perspectives

A Novel Perturb-ability Score to Mitigate Evasion Adversarial Attacks on Flow-Based ML-NIDS

Style-Preserving Lip Sync via Audio-Aware Style Reference

Advancing oncology with federated learning: transcending boundaries in breast, lung, and prostate cancer. A systematic review

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Informed Correctors for Discrete Diffusion Models

RadioRAG: Online Retrieval-augmented Generation for Radiology Question Answering

A Systematic Survey of Natural Language Processing for the Greek Language

Predicting the Understandability of Computational Notebooks through Code Metrics Analysis

An Effective Incorporating Heterogeneous Knowledge Curriculum Learning for Sequence Labeling

HiURE: Hierarchical Exemplar Contrastive Learning for Unsupervised Relation Extraction

The NordDRG AI Benchmark for Large Language Models

From Data-Driven to Purpose-Driven Artificial Intelligence: Systems Thinking for Data-Analytic Automation of Patient Care

Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

Entropy-based Exploration Conduction for Multi-step Reasoning

Solving Satisfiability Modulo Counting Exactly with Probabilistic Circuits

Synthesizing Composite Hierarchical Structure from Symbolic Music Corpora

Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization

Optimal Transport for Probabilistic Circuits

OM4OV: Leveraging Ontology Matching for Ontology Versioning

Behaviour Planning: A Toolkit for Diverse Planning

Spatial Context-based Self-Supervised Learning for Handwritten Text Recognition

"Generate" the Future of Work through AI: Empirical Evidence from Online Labor Markets

Dense SAE Latents Are Features, Not Bugs

Sekai: A Video Dataset towards World Exploration

Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers

AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability

Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction

GFLC: Graph-based Fairness-aware Label Correction for Fair Classification

The Compositional Architecture of Regret in Large Language Models

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Created by

Haebom

Authors

Haozhen Zhang, Tao Feng, Jiaxuan You

Overview

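This paper presents Router-R1, a new reinforcement learning (RL)-based framework designed to overcome the limitations of existing large language model (LLM) routers. Conventional LLM routers take the simple approach of mapping each query to a single model, whereas Router-R1 draws on multiple LLMs in sequence to handle complex tasks. Router-R1 uses an LLM itself as the router, interleaving internal reasoning with model invocation and integrating each response into its evolving context. For efficient training, it uses a lightweight rule-based reward consisting of a format reward, a final outcome reward, and a cost reward, which optimizes the trade-off between performance and cost. In addition, by conditioning only on simple model descriptors such as price, latency, and example performance, it generalizes strongly to models not seen during training. In experiments on seven general and multi-hop question-answering benchmarks, Router-R1 outperforms several strong baselines while maintaining robust generalization and cost management.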
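As a rough illustration of the interleaved think/route/answer loop described above, the sketch below uses a hand-written stand-in for the router. Everything in it is hypothetical: the action tuples, `route_and_aggregate`, `toy_policy`, and the two toy models are illustrative names, not the paper's interface, and in Router-R1 the policy would be an RL-trained LLM rather than a fixed rule.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical action format (an assumption, not the paper's):
#   ("think", text) | ("route", model_name, query) | ("answer", text)
Action = Tuple[str, ...]
Policy = Callable[[List[str]], Action]


def route_and_aggregate(
    question: str,
    policy: Policy,
    models: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> Tuple[Optional[str], List[str]]:
    """Alternate between reasoning and model calls until an answer is emitted."""
    context: List[str] = [f"question: {question}"]
    for _ in range(max_steps):
        action = policy(context)
        kind = action[0]
        if kind == "think":
            # Internal reasoning is simply appended to the context.
            context.append(f"think: {action[1]}")
        elif kind == "route":
            # Call the chosen model and fold its reply into the context.
            _, name, query = action
            context.append(f"{name}: {models[name](query)}")
        elif kind == "answer":
            return action[1], context
        else:
            raise ValueError(f"unknown action kind: {kind!r}")
    return None, context  # step budget exhausted without an answer


def toy_policy(context: List[str]) -> Action:
    """Fixed stand-in for the learned router: think, call two models, answer."""
    step = len(context)
    if step == 1:
        return ("think", "split into two sub-questions")
    if step == 2:
        return ("route", "small", "sub-question A")
    if step == 3:
        return ("route", "large", "sub-question B")
    # Aggregate the model replies collected so far.
    replies = [line.split(": ", 1)[1] for line in context[2:]]
    return ("answer", " + ".join(replies))


toy_models = {
    "small": lambda q: f"small model reply to {q}",
    "large": lambda q: f"large model reply to {q}",
}

answer, trace = route_and_aggregate("example question", toy_policy, toy_models)
```

The trace grows by one entry per step, which corresponds to the evolving context the router conditions on at its next decision.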

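Implications and Limitations

• Implications:
  ◦ Shows that RL-based multi-LLM routing and aggregation can improve performance on complex tasks.
  ◦ Proposes an intelligent routing strategy that uses an LLM itself as the router, drawing on its reasoning ability.
  ◦ Maintains an efficient balance between performance and cost through a lightweight rule-based reward system.
  ◦ Improves generalization using only simple model descriptors.
  ◦ Demonstrates better performance than existing methods on a variety of benchmarks.

• Limitations:
  ◦ Further research is needed on the generality of the proposed reward system and its applicability to other tasks.
  ◦ The limitations of the model descriptors used, and ways to draw on richer information, need further consideration.
  ◦ Scalability and stability in real production environments still need to be evaluated.
  ◦ Further research is needed on whether results obtained on specific benchmarks generalize.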
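The three-part rule-based reward mentioned in the overview (format, final outcome, cost) could be combined along the following lines. This is only a sketch under stated assumptions: the `<think>`/`<answer>` tag format, exact-match scoring, the linear cost penalty, and the `alpha` weighting are all illustrative choices, not details confirmed by this summary.

```python
import re


def format_reward(rollout: str) -> float:
    """0.0 if the rollout has the assumed tag structure, otherwise a -1.0 penalty."""
    has_think = re.search(r"<think>.+?</think>", rollout, re.DOTALL) is not None
    has_answer = re.search(r"<answer>.+?</answer>", rollout, re.DOTALL) is not None
    return 0.0 if (has_think and has_answer) else -1.0


def outcome_reward(rollout: str, gold: str) -> float:
    """1.0 on a case-insensitive exact match with the gold answer, else 0.0."""
    match = re.search(r"<answer>(.+?)</answer>", rollout, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == gold.strip().lower() else 0.0


def cost_reward(cost: float, budget: float) -> float:
    """Assumed linear penalty: 1.0 when free, 0.0 at or beyond the budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return max(0.0, 1.0 - cost / budget)


def total_reward(rollout: str, gold: str, cost: float, budget: float,
                 alpha: float = 0.5) -> float:
    """Format term plus an alpha-weighted mix of outcome and cost terms."""
    return (
        format_reward(rollout)
        + (1.0 - alpha) * outcome_reward(rollout, gold)
        + alpha * cost_reward(cost, budget)
    )


well_formed = "<think>route to the small model</think><answer>Paris</answer>"
malformed = "Paris"

good = total_reward(well_formed, "paris", cost=0.25, budget=1.0)
bad = total_reward(malformed, "paris", cost=0.25, budget=1.0)
```

In this sketch `alpha` is the knob that trades accuracy against spending: a larger value pushes the router toward cheaper models, a smaller one toward whichever model answers correctly.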
