Daily Arxiv

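This page collects and summarizes AI papers published around the world.
The summaries are produced with Google Gemini, and the page is run on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, you only need to credit the source.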

Semantic Content Determines Algorithmic Performance

Beyond Imitation: Reinforcement Learning for Active Latent Planning

Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves

Chain Of Thought Compression: A Theoritical Analysis

EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots

Meta Context Engineering via Agentic Skill Evolution

ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory

ARGORA: Orchestrated Argumentation for Causally Grounded LLM Reasoning and Decision Making

KAPSO: A Knowledge-grounded framework for Autonomous Program Synthesis and Optimization

LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI

The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation

MAR: Efficient Large Language Models via Module-aware Architecture Refinement

The Path of Least Resistance: Guiding LLM Reasining Trajectories with Prefix Consensus

ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

Topeax -- An Improved Clustering Topic Model with Density Peak Detection and Lexical-Semantic Term Importance

LION: A Clifford Neural Paradigm for Multimodal-Attributed Graph Learning

ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design

The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making

When Prohibitions Become Permissions: Auditing Negation Sensitivity in Language Models

System 1&2 Synergy via Dynamic Model Interpolation

DataCross: A Unified Benchmark and Agent Framework for Cross-Modal Heterogeneous Data Analysis

TeachBench: A Syllabus-Grounded Framework for Evaluating Teaching Ability in Large Language Models

NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents

Hebbian Learning with Global Direction

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents

Dynamic Framework for Collaborative Learning: Leveraging Advanced LLM with Adaptive Feedback Mechanisms

Ostrakon-VL: Towards Domain-Expert MLLM for Food-Service and Retail Stores

EHR-RAG: Bridging Long-Horizon Structured Electronic Health Records and Large Language Models via Enhanced Retrieval-Augmented Generation

Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks

Modeling Endogenous Logic: Causal Neuro-Symbolic Reasoning Model for Explainable Multi-Behavior Recommendation

White-Box Op-Amp Design via Human-Mimicking Reasoning

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving

Position: Certifiable State Integrity in Cyber-Physical Systems -- Why Modular Sovereignty Solves the Plasticity-Stability Paradox

TIDE: Tuning-Integrated Dynamic Evolution for LLM-Based Automated Heuristic Design

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Delegation Without Living Governance

Causal Discovery for Explainable AI: A Dual-Encoding Approach

Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning

Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification

When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning

Do Reasoning Models Enhance Embedding Models?

Sycophantic Anchors: Localizing and Quantifying User Agreement in Reasoning Models

MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

FrontierScience: Evaluating AI's Ability to Perform Expert-Level Scientific Tasks

Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving

Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning

BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding

What You Feel Is Not What They See: On Predicting Self-Reported Emotion from Third-Party Observer Labels

Beyond a Single Reference: Training and Evaluation with Paraphrases in Sign Language Translation

CUA-Skill: Develop Skills for Computer Using Agent

Planner-Auditor Twin: Agentic Discharge Planning with FHIR-Based LLM Planning, Guideline Recall, Optional Caching and Self-Improvement

How does information access affect LLM monitors' ability to detect sabotage?

Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve

Responsible AI: The Good, The Bad, The AI

OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

Multi-modal Imputation for Alzheimer's Disease Classification

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

QUARK: Robust Retrieval under Non-Faithful Queries via Query-Anchored Aggregation

Unplugging a Seemingly Sentient Machine Is the Rational Choice -- A Metaphysical Perspective

Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models

The Epistemic Planning Domain Definition Language: Official Guideline

Do LLMs Favor LLMs? Quantifying Interaction Effects in Peer Review

Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation

CiMRAG: Cim-Aware Domain-Adaptive and Noise-Resilient Retrieval-Augmented Generation for Edge-Based LLMs

Structural Compositional Function Networks: Interpretable Functional Compositions for Tabular Discovery

LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?

On the Effectiveness of LLM-Specific Fine-Tuning for Detecting AI-Generated Text

Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

Do we really need Self-Attention for Streaming Automatic Speech Recognition?

VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models

Probabilistic Sensing: Intelligence in Data Sampling

LTS-VoiceAgent: A Listen-Think-Speak Framework for Efficient Streaming Voice Interaction via Semantic Triggering and Incremental Reasoning

NCSAM Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning

Benchmarking von ASR-Modellen im deutschen medizinischen Kontext: Eine Leistungsanalyse anhand von Anamnesegesprächen

Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA

DecHW: Heterogeneous Decentralized Federated Learning Exploiting Second-Order Information

Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents

Quantifying non deterministic drift in large language models

Text-to-State Mapping for Non-Resolution Reasoning: The Contradiction-Preservation Principle

SDUs DAISY: A Benchmark for Danish Culture

Stingy Context: 18:1 Hierarchical Code Compression for LLM Auto-Coding

The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models

Evaluating Large Language Models for Abstract Evaluation Tasks: An Empirical Study

OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling

Table-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation

HEART: A Unified Benchmark for Assessing Humans and LLMs in Emotional Support Dialogue

Demystifying Multi-Agent Debate: The Role of Confidence and Diversity

FastWhisper: Adaptive Self-knowledge Distillation for Real-time Automatic Speech Recognition

Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication

Simulating Complex Multi-Turn Tool Calling Interactions in Stateless Execution Environments

From Intuition to Expertise: Rubric-Based Cognitive Calibration for Human Detection of LLM-Generated Korean Text

Analysis of LLM Vulnerability to GPU Soft Errors: An Instruction-Level Fault Injection Study

GTAC: A Generative Transformer for Approximate Circuits

DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs

STELLAR: Structure-guided LLM Assertion Retrieval and Generation for Formal Verification

Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving

Created by

Haebom

Authors

Jingyun Wang, Dian Li, Xiaohan Wang, Gang Liu, Jiahong Yan, Guoliang Kang

💡 Overview

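Large language models (LLMs) struggle with plane geometry problem solving (PGPS) because the task depends on visual information. This work proposes an MLLM Interpreter that converts the geometry diagram into a concise, structured textual description (CDL). An existing LLM then solves the problem from this description, so its reasoning ability can be brought to bear without having to read the diagram itself. The MLLM Interpreter is trained in two stages: supervised fine-tuning (SFT) on chain-of-thought (CoT) data, then GRPO with a CDL-matching reward. The approach achieves strong performance while using less training data than previous methods.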
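GRPO, as it is usually described, samples a group of outputs for the same input, scores each one, and uses each score's deviation from the group mean (divided by the group's standard deviation) as that output's advantage. The sketch below shows only this normalization step, not the authors' training code. It assumes a population standard deviation plus a small `eps`. The function name `group_relative_advantages` and the reward values are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward normalized by its sampling group's mean and std.

    Assumption: population standard deviation plus a small eps; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # If every sample scores the same, all advantages are 0 and the group gives no signal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four CDL outputs sampled from the interpreter for one diagram.
sampled_rewards = [0.9, 0.5, 0.5, 0.1]
for reward, adv in zip(sampled_rewards, group_relative_advantages(sampled_rewards)):
    print(f"reward={reward:.1f} -> advantage={adv:+.3f}")
```

Outputs scoring above the group average get a positive advantage and are reinforced, while those below it are pushed down. A reward that gives partial credit spreads a group's scores apart, so fewer groups collapse to all-zero advantages.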

🔑 Implications and Limitations

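• Argues that the end-to-end training used by existing MLLMs can impair the LLM's inherent reasoning ability, and proposes a new approach that separates visual perception from reasoning.

• Introduces a concise, structured geometric description language (CDL). CDL makes the MLLM Interpreter more efficient to train and gives the LLM the information it needs for geometric reasoning.

• GRPO training with a CDL-matching reward provides more direct and denser supervision than conventional answer-based rewards, which improves the quality of the generated CDL.

• Performance depends heavily on how expressive CDL is and on how accurate the MLLM Interpreter is. How well the method handles complex or non-standard geometric representations still needs further study.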
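The summary does not say how the CDL-matching reward is computed. The sketch below shows one plausible form, to illustrate why it is denser than an answer-based reward. It assumes each CDL output can be split into a set of statements and compared against a reference set. A set-level F1 score then gives partial credit to a partly correct description, whereas an answer-based reward is all-or-nothing. The line-per-statement parser, the statement syntax, and the names `parse_cdl`, `cdl_matching_reward` and `answer_reward` are hypothetical, not the paper's definitions.

```python
def parse_cdl(text: str) -> set[str]:
    """Hypothetical parser: treat each non-empty line as one CDL statement."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def cdl_matching_reward(predicted: str, reference: str) -> float:
    """Assumed form: F1 overlap between predicted and reference CDL statements."""
    pred, ref = parse_cdl(predicted), parse_cdl(reference)
    if not pred and not ref:
        return 1.0  # both empty: trivially identical
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


def answer_reward(predicted_answer: str, gold_answer: str) -> float:
    """Sparse baseline: full credit only for an exact final-answer match."""
    return 1.0 if predicted_answer.strip() == gold_answer.strip() else 0.0


# Made-up statements in an illustrative syntax (not the paper's actual CDL grammar).
reference = "Line(A,B)\nLength(A,B)=5\nAngle(A,B,C)=90\nPerpendicular(AB,BC)"
predicted = "Line(A,B)\nLength(A,B)=5\nAngle(A,B,C)=90\nParallel(AB,CD)"

print(cdl_matching_reward(predicted, reference))  # 3 of 4 statements match -> 0.75
```

Suppose the one wrong relation led the solver to a wrong answer. An answer-based reward would then score this output 0, giving no signal that three of the four statements were correct. The matching reward credits those statements directly.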
