Daily Arxiv

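This page summarizes artificial intelligence papers published around the world.
The summaries are produced with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.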

VGR: Visual Grounded Reasoning

Improving Multimodal Learning Balance and Sufficiency through Data Remixing

Autonomous Computer Vision Development with Agentic AI

A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data

Enabling On-Device Medical AI Assistants via Input-Driven Saliency Adaptation

Task-aligned prompting improves zero-shot detection of AI-generated images by Vision-Language Models

From Reasoning to Code: GRPO Optimization for Underrepresented Languages

Farseer: A Refined Scaling Law in Large Language Models

Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors

VideoDeepResearch: Long Video Understanding With Agentic Tool Using

Data Shifts Hurt CoT: A Theoretical Study

Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications

Extended Creativity: A Conceptual Framework for Understanding Human-AI Creative Relations

Disclosure Audits for LLM Agents

A Survey of Generative Categories and Techniques in Multimodal Large Language Models

Resa: Transparent Reasoning Models via SAEs

EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection

LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

Latent Multi-Head Attention for Small Language Models

Intra-Trajectory Consistency for Reward Modeling

Foundation Models in Medical Imaging -- A Review and Outlook

EdgeProfiler: A Fast Profiling Framework for Lightweight LLMs on Edge Using Analytical Model

Segment Concealed Objects with Incomplete Supervision

Inherently Faithful Attention Maps for Vision Transformers

DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs

Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression

Agentic Surgical AI: Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion in a Vision-Language-Action Framework

Reparameterized LLM Training via Orthogonal Equivalence Transformation

Lightweight Sequential Transformers for Blood Glucose Level Prediction in Type-1 Diabetes

SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis

Synthesize Privacy-Preserving High-Resolution Images via Private Textual Intermediaries

LeVo: High-Quality Song Generation with Multi-Preference Alignment

FAMSeg: Fetal Femur and Cranial Ultrasound Segmentation Using Feature-Aware Attention and Mamba Enhancement

SALT: A Lightweight Model Adaptation Method for Closed Split Computing Environments

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Towards Physics-informed Diffusion for Anomaly Detection in Trajectories

SAFE: Finding Sparse and Flat Minima to Improve Pruning

Tactile MNIST: Benchmarking Active Tactile Perception

SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code

Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System

MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP

Deep Learning-Based Breast Cancer Detection in Mammography: A Multi-Center Validation Study in Thai Population

KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider

Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

From Argumentative Text to Argument Knowledge Graph: A New Framework for Structured Argumentation

An Incremental Framework for Topological Dialogue Semantics: Efficient Reasoning in Discrete Spaces

Diffusion Graph Neural Networks for Robustness in Olfaction Sensors and Datasets

COGNATE: Acceleration of Sparse Tensor Programs on Emerging Hardware using Transfer Learning

R-KV: Redundancy-aware KV Cache Compression for Reasoning Models

Quantum computing and artificial intelligence: status and perspectives

Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks

On the performance of machine-learning-assisted Monte Carlo in sampling from simple statistical physics models

Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning

SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge

PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding

Transparency in Healthcare AI: Testing European Regulatory Provisions against Users' Transparency Needs

FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records

Oversmoothing, Oversquashing, Heterophily, Long-Range, and more: Demystifying Common Beliefs in Graph Machine Learning

Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision

Fixed Point Explainability

X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real

JAEGER: Dual-Level Humanoid Whole-Body Controller

AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents

AI Recommendations and Non-instrumental Image Concerns

Entropic Time Schedulers for Generative Diffusion Models

JEPA4Rec: Learning Effective Language Representations for Sequential Recommendation via Joint Embedding Predictive Architecture

Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design

Scholar Inbox: Personalized Paper Recommendations for Scientists

On Synthesizing Data for Context Attribution in Question Answering

Evaluating how LLM annotations represent diverse views on contentious topics

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions

ShED-HD: A Shannon Entropy Distribution Framework for Lightweight Hallucination Detection on Edge Devices

OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents

MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion

Transformers without Normalization

Compute Optimal Scaling of Skills: Knowledge vs Reasoning

Identifying Trustworthiness Challenges in Deep Learning Models for Continental-Scale Water Quality Prediction

InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model

From Euler to AI: Unifying Formulas for Mathematical Constants

C2-DPO: Constrained Controlled Direct Preference Optimization

Less is More: Improving LLM Alignment via Preference Data Selection

Quantifying Memorization and Parametric Response Rates in Retrieval-Augmented Vision-Language Models

ProMedTS: A Self-Supervised, Prompt-Guided Multimodal Approach for Integrating Medical Text and Time Series

Boosting Generalization in Diffusion-Based Neural Combinatorial Solver via Inference Time Adaptation

HARBOR: Exploring Persona Dynamics in Multi-Agent Competition

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Truth Knows No Language: Evaluating Truthfulness Beyond English

PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation

Multi-Knowledge-oriented Nighttime Haze Imaging Enhancer for Vision-driven Intelligent Systems

Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification

Is attention all you need to solve the correlated electron problem?

Optimizing Temperature for Language Models with Multi-Sample Inference

Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

The Other Side of the Coin: Unveiling the Downsides of Model Aggregation in Federated Learning from a Layer-peeled Perspective

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Activation-Informed Merging of Large Language Models

Layer by Layer: Uncovering Hidden Representations in Language Models

Search-Based Adversarial Estimates for Improving Sample Efficiency in Off-Policy Reinforcement Learning

Activation by Interval-wise Dropout: A Simple Way to Prevent Neural Networks from Plasticity Loss

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Created by

Haebom

Authors

Maohao Shen, Guangtao Zeng, Zhenting Qi, Zhang-Wei Hong, Zhenfang Chen, Wei Lu, Gregory Wornell, Subhro Das, David Cox, Chuang Gan

Overview

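This paper presents a new approach to improving the reasoning capabilities of large language models (LLMs). Prior work improved reasoning through extensive multi-step sampling at inference time, with an external LLM acting as a verifier. This paper instead focuses on internalizing autoregressive search inside a single LLM, so that the model itself performs self-reflection and explores new strategies. To this end, the authors propose Chain-of-Action-Thought (COAT) reasoning and a two-stage training paradigm: a small-scale format-tuning stage followed by a large-scale self-improvement stage based on reinforcement learning. They develop Satori, a 7-billion-parameter open-source model, which achieves state-of-the-art performance on mathematical reasoning benchmarks and shows strong generalization to out-of-domain tasks in experiments. Code, data, and models are all publicly released.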
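To make "internalized search" more concrete: in a COAT-style trajectory, a single model's output interleaves ordinary reasoning steps with markers that signal whether it is continuing, reflecting on its previous step, or exploring an alternative strategy. The minimal sketch below only parses one such generated string into steps. It does not reproduce Satori's training. The marker strings `<|continue|>`, `<|reflect|>`, `<|explore|>` and the helpers `parse_coat` / `count_actions` are hypothetical names chosen for illustration, not taken from this summary.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical meta-action markers, assumed for illustration only.
META_ACTIONS = ("<|continue|>", "<|reflect|>", "<|explore|>")
# A capturing group makes re.split keep the markers in its output.
_MARKER = re.compile("(" + "|".join(re.escape(a) for a in META_ACTIONS) + ")")


@dataclass
class Step:
    action: str  # meta-action marker that opens this step
    text: str    # reasoning text that follows the marker


def parse_coat(trajectory: str, default_action: str = "<|continue|>") -> list[Step]:
    """Split one generated string into (meta-action, text) steps."""
    # With a capturing group, re.split returns [text, marker, text, marker, text, ...],
    # so the list always has odd length.
    parts = _MARKER.split(trajectory)
    steps: list[Step] = []
    if parts[0].strip():
        # Text before the first marker is treated as a plain continuation step.
        steps.append(Step(default_action, parts[0].strip()))
    for marker, text in zip(parts[1::2], parts[2::2]):
        steps.append(Step(marker, text.strip()))
    return steps


def count_actions(steps: list[Step]) -> Counter:
    """Count how often the trajectory continued, reflected, or explored."""
    return Counter(step.action for step in steps)


if __name__ == "__main__":
    demo = (
        "<|continue|>3 * 4 = 12."
        "<|reflect|>Check: 4 + 4 + 4 = 12, so the step is correct."
        "<|explore|>Alternatively, 12 / 4 = 3 confirms it."
        "<|continue|>Answer: 12"
    )
    steps = parse_coat(demo)
    for step in steps:
        print(step.action, step.text)
    print(count_actions(steps))
```

Text that appears before the first marker is kept as a continuation step rather than dropped. This way, a trajectory that starts without an explicit marker still parses cleanly.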

Implications and Limitations

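• Implications:
  ◦ Suggests that a single LLM can improve its own reasoning through self-reflection and strategy exploration, without relying on an external verifier.
  ◦ Experimentally validates the effectiveness of COAT reasoning and the two-stage training paradigm.
  ◦ Achieves state-of-the-art performance on mathematical reasoning benchmarks and strong results on out-of-domain tasks.
  ◦ Released as open source, which supports further research and development.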

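• Limitations:
  ◦ At 7 billion parameters, Satori may be relatively small compared with other state-of-the-art models.
  ◦ The work focuses on improving performance on specific benchmarks, so generalization to other types of reasoning tasks needs further study.
  ◦ The two-stage training paradigm is complex and may therefore be costly to train.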
