Daily Arxiv

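This page collects papers on artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.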

AutoChemSchematic AI: A Closed-Loop, Physics-Aware Agentic Framework for Auto-Generating Chemical Process and Instrumentation Diagrams

LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training

DLP: Dynamic Layerwise Pruning in Large Language Models

MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

Estimating LLM Consistency: A User Baseline vs Surrogate Metrics

Mind the Gap: A Practical Attack on GGUF Quantization

Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents

Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation

Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles

Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms

SWE-bench Goes Live!

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

Matryoshka Model Learning for Improved Elastic Student Models

Context-Robust Knowledge Editing for Language Models

HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding

FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control

Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models

iDSE: Navigating Design Space Exploration in High-Level Synthesis Using LLMs

Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection

RePaViT: Scalable Vision Transformer Acceleration via Structural Reparameterization on Feedforward Network Layers

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Hume: Introducing System-2 Thinking in Visual-Language-Action Model

Universal Value-Function Uncertainties

How Do Transformers Learn Variable Binding in Symbolic Programs?

Adversarial bandit optimization for approximately linear functions

Hierarchical Retrieval with Evidence Curation for Open-Domain Financial Question Answering on Standardized Documents

Homophily Enhanced Graph Domain Adaptation

NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline

An Interpretable Representation Learning Approach for Diffusion Tensor Imaging

InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts

Moderating Harm: Benchmarking Large Language Models for Cyberbullying Detection in YouTube Comments

Security Concerns for Large Language Models: A Survey

A Survey of LLM $\times$ DATA

Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain

NMCSE: Noise-Robust Multi-Modal Coupling Signal Estimation Method via Optimal Transport for Cardiovascular Disease Detection

FRIREN: Beyond Trajectories -- A Spectral Lens on Time

A Fully Generative Motivational Interviewing Counsellor Chatbot for Moving Smokers Towards the Decision to Quit

TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling

Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection

Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

Replay Attacks Against Audio Deepfake Detection

Forensic deepfake audio detection using segmental speech features

A3 : an Analytical Low-Rank Approximation Framework for Attention

A Survey of 3D Reconstruction with Event Cameras

Confabulation dynamics in a reservoir computer: Filling in the gaps with untrained attractors

WATCH: Adaptive Monitoring for AI Deployments via Weighted-Conformal Martingales

Deep Learning Framework for Infrastructure Maintenance: Crack Detection and High-Resolution Imaging of Infrastructure Surfaces

GAME: Learning Multimodal Interactions via Graph Structures for Personality Trait Estimation

LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection

Motion-compensated cardiac MRI using low-rank diffeomorphic flow (DMoCo)

OODTE: A Differential Testing Engine for the ONNX Optimizer

Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization

Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video

Depth-Constrained ASV Navigation with Deep RL and Limited Sensing

(Im)possibility of Automated Hallucination Detection in Large Language Models

Unveiling the Lack of LVLM Robustness to Fundamental Visual Variations: Why and Path Forward

Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research

The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs

The Structural Safety Generalization Problem

Parameterized Synthetic Text Generation with SimpleStories

SD$^2$: Self-Distilled Sparse Drafters

Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games

Semantic-guided Representation Learning for Multi-Label Recognition

SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement

Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation

A Conformal Risk Control Framework for Granular Word Assessment and Uncertainty Calibration of CLIPScore Quality Estimates

Understanding Inequality of LLM Fact-Checking over Geographic Regions with Agent and Retrieval models

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

A Survey on Event-driven 3D Reconstruction: Development under Different Categories

VecTrans: Enhancing Compiler Auto-Vectorization through LLM-Assisted Code Transformations

REALM: A Dataset of Real-World LLM Use Cases

Opportunities and Challenges of Frontier Data Governance With Synthetic Data

ARFlow: Human Action-Reaction Flow Matching with Physical Guidance

Position: Beyond Assistance - Reimagining LLMs as Ethical and Adaptive Co-Creators in Mental Health Care

Redefining Toxicity: An Objective and Context-Aware Approach for Stress-Level-Based Detection

A Dual-Directional Context-Aware Test-Time Learning for Text Classification

MentalChat16K: A Benchmark Dataset for Conversational Mental Health Assistance

ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning

FMNet: Frequency-Assisted Mamba-Like Linear Attention Network for Camouflaged Object Detection

NFIG: Autoregressive Image Generation with Next-Frequency Prediction

GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification

CSTRL: Context-Driven Sequential Transfer Learning for Abstractive Radiology Report Summarization

Wanda++: Pruning Large Language Models via Regional Gradients

HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation

Optimizing Multi-Hop Document Retrieval Through Intermediate Representations

Causally Reliable Concept Bottleneck Models

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models

Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models

POPGym Arcade: Parallel Pixelated POMDPs

Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems

Quantifying First-Order Markov Violations in Noisy Reinforcement Learning: A Causal Discovery Approach

SPD: Sync-Point Drop for Efficient Tensor Parallelism of Large Language Models

Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases

Faithful Logic Embeddings in HOL -- Deep and Shallow

TestNUC: Enhancing Test-Time Computing Approaches and Scaling through Neighboring Unlabeled Data Consistency

Autonomy-of-Experts Models

Created by

Haebom

Authors

Ang Lv, Ruobing Xie, Yining Qian, Songhao Wu, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan

Overview

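This paper argues that in conventional Mixture-of-Experts (MoE) models, the separation between the router and the expert modules leads to suboptimal expert selection and inefficient training. To address this, it proposes Autonomy-of-Experts (AoE), a new MoE paradigm in which each expert evaluates its own ability to handle an input and decides for itself whether to process it. The router is removed: every expert precomputes its internal activations for the input, the experts are ranked by the norm of those activations, and only the top-ranked experts continue with the forward pass. A low-rank weight factorization reduces the overhead of this precomputation. The authors pre-train language models with 700 million to 4 billion parameters and report improved efficiency over conventional MoE models.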
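The selection procedure described in the overview can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `make_expert` and `aoe_layer`, the factor names `A`, `B`, and `W_out`, the ReLU nonlinearity, the L2 norm, and the unweighted sum of expert outputs are all hypothetical choices made for the example. What it is meant to show is the order of operations: each expert computes a cheap low-rank activation, experts are ranked by the norm of that activation, and only the top-k experts finish the forward pass while reusing the cached activation.

```python
import math
import random


def matvec(x, W):
    """Multiply a row vector x (length d_in) by W (d_in rows, d_out columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


def make_expert(d_model, d_low, d_hidden, rng):
    """Hypothetical expert whose first projection is factorized as A @ B."""
    def rand(rows, cols):
        scale = 1.0 / math.sqrt(rows)
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]
    return {
        "A": rand(d_model, d_low),       # cheap low-rank factor, run by every expert
        "B": rand(d_low, d_hidden),      # run only by the selected experts
        "W_out": rand(d_hidden, d_model),
    }


def aoe_layer(x, experts, top_k):
    # Step 1: every expert precomputes its low-rank internal activation x @ A.
    cached = [matvec(x, e["A"]) for e in experts]
    # Step 2: rank experts by activation norm (L2 is an assumption here).
    norms = [l2_norm(h) for h in cached]
    order = sorted(range(len(experts)), key=lambda i: norms[i], reverse=True)
    chosen = order[:top_k]
    # Step 3: only the chosen experts finish the forward pass, reusing the cache.
    out = [0.0] * len(x)
    for i in chosen:
        hidden = [max(0.0, h) for h in matvec(cached[i], experts[i]["B"])]  # ReLU
        y = matvec(hidden, experts[i]["W_out"])
        out = [o + v for o, v in zip(out, y)]  # unweighted sum: a simplification
    return out, chosen, norms


if __name__ == "__main__":
    rng = random.Random(0)
    experts = [make_expert(d_model=8, d_low=2, d_hidden=16, rng=rng) for _ in range(4)]
    x = [rng.gauss(0.0, 1.0) for _ in range(8)]
    out, chosen, norms = aoe_layer(x, experts, top_k=2)
    print("selected experts:", chosen)
    print("activation norms:", [round(n, 3) for n in norms])
```

In this sketch the per-expert precomputation costs `d_model * d_low` multiplications instead of `d_model * d_hidden`, which is the sense in which a low-rank factorization can keep the "every expert looks at every input" step cheap.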

Implications and Limitations

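• Implications:
  ◦ Presents a new approach that addresses the router dependence of conventional MoE models.
  ◦ Improves expert selection and training efficiency by letting experts autonomously select their inputs.
  ◦ Reduces computational cost through low-rank weight factorization.
  ◦ Confirms performance gains on large-scale language models.

• Limitations:
  ◦ Further research is needed on whether AoE's benefits generalize across all types of data and model architectures.
  ◦ An in-depth analysis is needed of how the degree of rank reduction in the low-rank weight factorization affects model performance.
  ◦ The accuracy of the experts' self-evaluation needs to be verified and improved.
  ◦ Further research is needed on how to choose the optimal activation norm for a specific task or dataset.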
