Daily Arxiv

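This page curates artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.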

Semantic Content Determines Algorithmic Performance

Beyond Imitation: Reinforcement Learning for Active Latent Planning

Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves

Chain Of Thought Compression: A Theoretical Analysis

EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots

Meta Context Engineering via Agentic Skill Evolution

ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory

ARGORA: Orchestrated Argumentation for Causally Grounded LLM Reasoning and Decision Making

KAPSO: A Knowledge-grounded framework for Autonomous Program Synthesis and Optimization

LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI

The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation

MAR: Efficient Large Language Models via Module-aware Architecture Refinement

The Path of Least Resistance: Guiding LLM Reasoning Trajectories with Prefix Consensus

ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

Topeax -- An Improved Clustering Topic Model with Density Peak Detection and Lexical-Semantic Term Importance

LION: A Clifford Neural Paradigm for Multimodal-Attributed Graph Learning

ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design

The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making

When Prohibitions Become Permissions: Auditing Negation Sensitivity in Language Models

System 1&2 Synergy via Dynamic Model Interpolation

DataCross: A Unified Benchmark and Agent Framework for Cross-Modal Heterogeneous Data Analysis

TeachBench: A Syllabus-Grounded Framework for Evaluating Teaching Ability in Large Language Models

NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents

Hebbian Learning with Global Direction

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents

Dynamic Framework for Collaborative Learning: Leveraging Advanced LLM with Adaptive Feedback Mechanisms

Ostrakon-VL: Towards Domain-Expert MLLM for Food-Service and Retail Stores

EHR-RAG: Bridging Long-Horizon Structured Electronic Health Records and Large Language Models via Enhanced Retrieval-Augmented Generation

Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks

Modeling Endogenous Logic: Causal Neuro-Symbolic Reasoning Model for Explainable Multi-Behavior Recommendation

White-Box Op-Amp Design via Human-Mimicking Reasoning

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving

Position: Certifiable State Integrity in Cyber-Physical Systems -- Why Modular Sovereignty Solves the Plasticity-Stability Paradox

TIDE: Tuning-Integrated Dynamic Evolution for LLM-Based Automated Heuristic Design

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Delegation Without Living Governance

Causal Discovery for Explainable AI: A Dual-Encoding Approach

Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning

Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification

When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning

Do Reasoning Models Enhance Embedding Models?

Sycophantic Anchors: Localizing and Quantifying User Agreement in Reasoning Models

MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

FrontierScience: Evaluating AI's Ability to Perform Expert-Level Scientific Tasks

Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving

Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning

BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding

What You Feel Is Not What They See: On Predicting Self-Reported Emotion from Third-Party Observer Labels

Beyond a Single Reference: Training and Evaluation with Paraphrases in Sign Language Translation

CUA-Skill: Develop Skills for Computer Using Agent

Planner-Auditor Twin: Agentic Discharge Planning with FHIR-Based LLM Planning, Guideline Recall, Optional Caching and Self-Improvement

How does information access affect LLM monitors' ability to detect sabotage?

Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve

Responsible AI: The Good, The Bad, The AI

OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

Multi-modal Imputation for Alzheimer's Disease Classification

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

QUARK: Robust Retrieval under Non-Faithful Queries via Query-Anchored Aggregation

Unplugging a Seemingly Sentient Machine Is the Rational Choice -- A Metaphysical Perspective

Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models

The Epistemic Planning Domain Definition Language: Official Guideline

Do LLMs Favor LLMs? Quantifying Interaction Effects in Peer Review

Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation

CiMRAG: Cim-Aware Domain-Adaptive and Noise-Resilient Retrieval-Augmented Generation for Edge-Based LLMs

Structural Compositional Function Networks: Interpretable Functional Compositions for Tabular Discovery

LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?

On the Effectiveness of LLM-Specific Fine-Tuning for Detecting AI-Generated Text

Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

Do we really need Self-Attention for Streaming Automatic Speech Recognition?

VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models

Probabilistic Sensing: Intelligence in Data Sampling

LTS-VoiceAgent: A Listen-Think-Speak Framework for Efficient Streaming Voice Interaction via Semantic Triggering and Incremental Reasoning

NCSAM Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning

Benchmarking von ASR-Modellen im deutschen medizinischen Kontext: Eine Leistungsanalyse anhand von Anamnesegesprächen

Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA

DecHW: Heterogeneous Decentralized Federated Learning Exploiting Second-Order Information

Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents

Quantifying non deterministic drift in large language models

Text-to-State Mapping for Non-Resolution Reasoning: The Contradiction-Preservation Principle

SDUs DAISY: A Benchmark for Danish Culture

Stingy Context: 18:1 Hierarchical Code Compression for LLM Auto-Coding

The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models

Evaluating Large Language Models for Abstract Evaluation Tasks: An Empirical Study

OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling

Table-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation

HEART: A Unified Benchmark for Assessing Humans and LLMs in Emotional Support Dialogue

Demystifying Multi-Agent Debate: The Role of Confidence and Diversity

FastWhisper: Adaptive Self-knowledge Distillation for Real-time Automatic Speech Recognition

Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication

Simulating Complex Multi-Turn Tool Calling Interactions in Stateless Execution Environments

From Intuition to Expertise: Rubric-Based Cognitive Calibration for Human Detection of LLM-Generated Korean Text

Analysis of LLM Vulnerability to GPU Soft Errors: An Instruction-Level Fault Injection Study

GTAC: A Generative Transformer for Approximate Circuits

DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs

STELLAR: Structure-guided LLM Assertion Retrieval and Generation for Formal Verification

System 1&2 Synergy via Dynamic Model Interpolation

Created by Haebom


Authors

Chenxu Yang, Qingyi Si, Chong Tian, Xiyu Liu, Dingyu Yao, Chuanyu Qin, Zheng Lin, Weiping Wang, Jiaqi Wang

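💡 Overview

This paper addresses the difficulty of training a language model that unifies intuitive System 1 and deliberative System 2 thinking, a difficulty caused by interference between the two cognitive modes. It argues that existing approaches to improving System 2 efficiency are limited because they focus only on output control, and it shifts the focus to "capability control", which modulates how the model thinks. To this end, it proposes the DAMI framework, which uses dynamic model interpolation to adjust cognitive depth per query without additional training.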
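
To make per-query interpolation concrete, here is a minimal sketch. It assumes a simple linear blend of parameters between a fast "System 1" checkpoint and a deliberative "System 2" checkpoint. The names `interpolate`, `estimate_lambda`, and `lam`, as well as the word-count heuristic, are hypothetical illustrations and are not taken from the paper, whose actual estimation method is not described in this summary.

```python
from typing import Dict, List

# A toy "checkpoint": parameter name -> flat list of weights (assumption).
Params = Dict[str, List[float]]


def interpolate(fast: Params, slow: Params, lam: float) -> Params:
    """Blend two checkpoints as (1 - lam) * fast + lam * slow.

    lam = 0 returns the fast (System 1) weights unchanged and
    lam = 1 returns the slow (System 2) weights unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if fast.keys() != slow.keys():
        raise ValueError("checkpoints must share parameter names")
    merged: Params = {}
    for name in fast:
        if len(fast[name]) != len(slow[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [
            (1.0 - lam) * f + lam * s
            for f, s in zip(fast[name], slow[name])
        ]
    return merged


def estimate_lambda(query: str) -> float:
    """Hypothetical placeholder: longer queries get deeper reasoning.

    This word-count heuristic only stands in for a real estimator.
    """
    return min(len(query.split()) / 50.0, 1.0)


if __name__ == "__main__":
    system1 = {"w": [0.0, 2.0]}
    system2 = {"w": [1.0, 4.0]}
    lam = estimate_lambda("What is 2 + 2?")
    print(lam, interpolate(system1, system2, lam))
```

No gradient step is involved: the merged weights are computed directly from two existing checkpoints, which is what makes a training-free setup of this kind possible in principle.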

🔑 Implications and Limitations

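• Highlights the importance of capability control, which modulates how a model thinks rather than what it produces.

• Proposes the DAMI framework, which merges existing models through dynamic model interpolation without additional training, combining the efficiency of System 1 with the reasoning ability of System 2.

• Improves practicality by presenting concrete methods for both learning-based estimation and zero-shot deployment.

• This is a pilot study; further validation on broader benchmarks and more complex reasoning tasks is needed.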
