Daily Arxiv

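This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.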

Semantic Content Determines Algorithmic Performance

Beyond Imitation: Reinforcement Learning for Active Latent Planning

Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves

Chain Of Thought Compression: A Theoritical Analysis

EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots

Meta Context Engineering via Agentic Skill Evolution

ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory

ARGORA: Orchestrated Argumentation for Causally Grounded LLM Reasoning and Decision Making

KAPSO: A Knowledge-grounded framework for Autonomous Program Synthesis and Optimization

LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI

The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation

MAR: Efficient Large Language Models via Module-aware Architecture Refinement

The Path of Least Resistance: Guiding LLM Reasining Trajectories with Prefix Consensus

ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

Topeax -- An Improved Clustering Topic Model with Density Peak Detection and Lexical-Semantic Term Importance

LION: A Clifford Neural Paradigm for Multimodal-Attributed Graph Learning

ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design

The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making

When Prohibitions Become Permissions: Auditing Negation Sensitivity in Language Models

System 1&2 Synergy via Dynamic Model Interpolation

DataCross: A Unified Benchmark and Agent Framework for Cross-Modal Heterogeneous Data Analysis

TeachBench: A Syllabus-Grounded Framework for Evaluating Teaching Ability in Large Language Models

NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents

Hebbian Learning with Global Direction

Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents

Dynamic Framework for Collaborative Learning: Leveraging Advanced LLM with Adaptive Feedback Mechanisms

Ostrakon-VL: Towards Domain-Expert MLLM for Food-Service and Retail Stores

EHR-RAG: Bridging Long-Horizon Structured Electronic Health Records and Large Language Models via Enhanced Retrieval-Augmented Generation

Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks

Modeling Endogenous Logic: Causal Neuro-Symbolic Reasoning Model for Explainable Multi-Behavior Recommendation

White-Box Op-Amp Design via Human-Mimicking Reasoning

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving

Position: Certifiable State Integrity in Cyber-Physical Systems -- Why Modular Sovereignty Solves the Plasticity-Stability Paradox

TIDE: Tuning-Integrated Dynamic Evolution for LLM-Based Automated Heuristic Design

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Delegation Without Living Governance

Causal Discovery for Explainable AI: A Dual-Encoding Approach

Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning

Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification

When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning

Do Reasoning Models Enhance Embedding Models?

Sycophantic Anchors: Localizing and Quantifying User Agreement in Reasoning Models

MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

FrontierScience: Evaluating AI's Ability to Perform Expert-Level Scientific Tasks

Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving

Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning

BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding

What You Feel Is Not What They See: On Predicting Self-Reported Emotion from Third-Party Observer Labels

Beyond a Single Reference: Training and Evaluation with Paraphrases in Sign Language Translation

CUA-Skill: Develop Skills for Computer Using Agent

Planner-Auditor Twin: Agentic Discharge Planning with FHIR-Based LLM Planning, Guideline Recall, Optional Caching and Self-Improvement

How does information access affect LLM monitors' ability to detect sabotage?

Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve

Responsible AI: The Good, The Bad, The AI

OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

Multi-modal Imputation for Alzheimer's Disease Classification

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

QUARK: Robust Retrieval under Non-Faithful Queries via Query-Anchored Aggregation

Unplugging a Seemingly Sentient Machine Is the Rational Choice -- A Metaphysical Perspective

Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models

The Epistemic Planning Domain Definition Language: Official Guideline

Do LLMs Favor LLMs? Quantifying Interaction Effects in Peer Review

Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation

CiMRAG: Cim-Aware Domain-Adaptive and Noise-Resilient Retrieval-Augmented Generation for Edge-Based LLMs

Structural Compositional Function Networks: Interpretable Functional Compositions for Tabular Discovery

LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?

On the Effectiveness of LLM-Specific Fine-Tuning for Detecting AI-Generated Text

Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

Do we really need Self-Attention for Streaming Automatic Speech Recognition?

VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models

Probabilistic Sensing: Intelligence in Data Sampling

LTS-VoiceAgent: A Listen-Think-Speak Framework for Efficient Streaming Voice Interaction via Semantic Triggering and Incremental Reasoning

NCSAM Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning

Benchmarking von ASR-Modellen im deutschen medizinischen Kontext: Eine Leistungsanalyse anhand von Anamnesegesprächen

Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA

DecHW: Heterogeneous Decentralized Federated Learning Exploiting Second-Order Information

Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents

Quantifying non deterministic drift in large language models

Text-to-State Mapping for Non-Resolution Reasoning: The Contradiction-Preservation Principle

SDUs DAISY: A Benchmark for Danish Culture

Stingy Context: 18:1 Hierarchical Code Compression for LLM Auto-Coding

The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models

Evaluating Large Language Models for Abstract Evaluation Tasks: An Empirical Study

OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling

Table-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation

HEART: A Unified Benchmark for Assessing Humans and LLMs in Emotional Support Dialogue

Demystifying Multi-Agent Debate: The Role of Confidence and Diversity

FastWhisper: Adaptive Self-knowledge Distillation for Real-time Automatic Speech Recognition

Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication

Simulating Complex Multi-Turn Tool Calling Interactions in Stateless Execution Environments

From Intuition to Expertise: Rubric-Based Cognitive Calibration for Human Detection of LLM-Generated Korean Text

Analysis of LLM Vulnerability to GPU Soft Errors: An Instruction-Level Fault Injection Study

GTAC: A Generative Transformer for Approximate Circuits

DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs

STELLAR: Structure-guided LLM Assertion Retrieval and Generation for Formal Verification

Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve

Created by

Haebom


Authors

Hongzheng Chen, Alexander Novikov, Ngan Vu, Hanna Alam, Zhiru Zhang, Aiden Grossman, Mircea Trofin, Amir Yazdanbakhsh

💡 Overview

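This paper proposes Magellan, a framework that combines an LLM coding agent with evolutionary search and autotuning to overcome the limitations of the hand-written heuristics used in modern compiler optimization. Magellan evolves the compiler pass itself through a closed loop that generates, evaluates, and refines executable C++ decision logic against user-defined macro-benchmarks, and thereby synthesizes compact heuristics that can be integrated directly into existing compilers. Across several real-world optimization tasks, Magellan matched or exceeded the performance of expert-written heuristics.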
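The generate, evaluate, refine loop described above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not Magellan's implementation: the `Candidate` heuristic, the `CALL_SITES` data, and the cost model in `evaluate` are all made up, the heuristic is written in Python rather than C++, and `propose` uses random perturbation as a stand-in for the LLM coding agent.

```python
import random
from dataclasses import dataclass

# Toy "macro-benchmark": (callee_size, call_count) pairs. Illustrative data only.
CALL_SITES = [(5, 100), (40, 3), (12, 50), (200, 1), (8, 75), (60, 20)]


@dataclass(frozen=True)
class Candidate:
    """Hypothetical inlining rule: inline when size <= base + bonus * calls."""
    base: float
    bonus: float

    def should_inline(self, size: int, calls: int) -> bool:
        return size <= self.base + self.bonus * calls


def evaluate(c: Candidate) -> float:
    """Made-up reward: call overhead saved minus code-size growth."""
    score = 0.0
    for size, calls in CALL_SITES:
        if c.should_inline(size, calls):
            score += calls * 1.0 - size * 0.5
    return score


def propose(parent: Candidate, rng: random.Random) -> Candidate:
    """Stand-in for the LLM agent: randomly perturb the parent's parameters."""
    return Candidate(
        base=max(0.0, parent.base + rng.uniform(-10, 10)),
        bonus=max(0.0, parent.bonus + rng.uniform(-0.5, 0.5)),
    )


def evolve(generations: int = 30, population: int = 8,
           keep: int = 3, seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    pool = [Candidate(base=0.0, bonus=0.0)]  # baseline: never inline
    for _ in range(generations):
        children = [propose(rng.choice(pool), rng) for _ in range(population)]
        # Elitism: parents stay in the pool, so the best score never decreases.
        pool = sorted(pool + children, key=evaluate, reverse=True)[:keep]
    return pool[0]


if __name__ == "__main__":
    best = evolve()
    print(best, evaluate(best))
```

Because the survivors of each generation are kept alongside their children, the best candidate can only stay the same or improve with respect to the benchmark score, which is the property that makes the loop "closed".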

🔑 Implications and Limitations

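• Automated heuristic discovery: Magellan shows that combining an LLM with an evolutionary algorithm can automatically discover new compiler optimization heuristics that adapt to complex software and hardware environments.

• Surpassing human engineering: Heuristics discovered by Magellan outperform ones refined over decades of manual engineering, such as LLVM's function inlining, demonstrating the potential of the automated approach.

• Extensibility to compilers and optimization problems beyond LLVM: Preliminary results on an XLA problem suggest portability beyond LLVM and reduced engineering effort, pointing to Magellan's generalizability.

• Importance of evaluation criteria and complexity of autotuning: Magellan's performance depends heavily on the user-provided macro-benchmarks, so choosing appropriate evaluation criteria is important. Autotuning during the evolutionary process can also be computationally expensive.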
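The dependence on the user-provided benchmark can be made concrete with a small sketch. It assumes a hypothetical one-parameter inlining rule (inline when the callee size is at most a threshold), a made-up cost model, and plain grid search as the autotuner. The two workloads and the function `tune_threshold` are illustrative names, not part of Magellan.

```python
# Each workload is a list of (callee_size, call_count) pairs. Illustrative data only.
HOT_CALLEES = [(80, 500), (30, 200), (10, 50)]   # large callees that are called often
COLD_CALLEES = [(80, 2), (30, 1), (10, 50)]      # large callees that are rarely called


def tune_threshold(benchmark, thresholds=range(0, 101, 10)):
    """Grid-search the size threshold that maximizes a made-up benefit score."""
    def score(threshold):
        # Benefit of inlining a site: calls saved minus half the code-size growth.
        return sum(calls - 0.5 * size
                   for size, calls in benchmark
                   if size <= threshold)
    # max() returns the first threshold that reaches the best score.
    return max(thresholds, key=score)


if __name__ == "__main__":
    print(tune_threshold(HOT_CALLEES))   # 80
    print(tune_threshold(COLD_CALLEES))  # 10
```

The same rule tunes to a threshold of 80 on the first workload and 10 on the second, so a heuristic tuned on one benchmark can be a poor fit for another. The grid here has only 11 points; with more parameters the search space grows quickly, which is one way the tuning cost can become significant.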
