haebom
Daily Arxiv
This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.
Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve
Created by
Haebom
Authors
Hongzheng Chen, Alexander Novikov, Ngân Vũ, Hanna Alam, Zhiru Zhang, Aiden Grossman, Mircea Trofin, Amir Yazdanbakhsh
💡 Overview
This paper presents Magellan, a framework that combines an LLM coding agent with evolutionary search and autotuning to overcome the limits of the hand-written heuristics used in modern compiler optimization. Driven by user-defined macro-benchmarks, Magellan evolves the compiler pass itself through a closed loop that generates, evaluates, and refines executable C++ decision logic, and thereby synthesizes compact heuristics that can be integrated directly into existing compilers. On several real-world optimization tasks, Magellan matched or exceeded the performance of expert-written heuristics.
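The generate, evaluate, refine loop described above can be illustrated with a deliberately small sketch. Nothing here is taken from the paper: the call-site table, the `score` function, and the single integer "inline threshold" are hypothetical stand-ins for the macro-benchmark and for the LLM-written C++ decision logic, and random mutation takes the place of LLM-driven refinement.

```python
import random

# Hypothetical stand-in for a macro-benchmark: (callee size, call count).
# Inlining saves call overhead but costs code size.
CALL_SITES = [(5, 100), (40, 2), (12, 50), (80, 1), (8, 30), (60, 3)]

def score(threshold, size_penalty=0.5, call_gain=1.0):
    """Toy objective, higher is better: reward saved calls, penalize size."""
    total = 0.0
    for size, calls in CALL_SITES:
        if size <= threshold:  # the candidate heuristic's inlining decision
            total += call_gain * calls - size_penalty * size
    return total

def evolve(generations=20, population=8, seed=0):
    """Elitist evolutionary search over a one-parameter heuristic."""
    rng = random.Random(seed)
    # Generate: initial candidates (assumption: plain integers, not programs).
    pop = [rng.randint(0, 100) for _ in range(population)]
    for _ in range(generations):
        # Evaluate: rank every candidate on the benchmark.
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: population // 2]
        # Refine: mutate the survivors to produce new candidates.
        children = [max(0, p + rng.randint(-10, 10)) for p in parents]
        pop = parents + children
    return max(pop, key=score)

best = evolve()
print("best threshold:", best, "score:", score(best))
```

Because the parents are carried over unchanged, the best score in the population never decreases from one generation to the next. The sketch evolves only a number; the system summarized here evolves the decision logic itself.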
Implications and Limitations
• Automated heuristic discovery: By combining an LLM with an evolutionary algorithm, Magellan shows that new compiler optimization heuristics, able to adapt to complex software and hardware environments, can be discovered automatically.
• Surpassing human engineering: The new heuristics Magellan discovers outperform heuristics refined through decades of manual engineering, such as LLVM's function inlining, which demonstrates the potential of the automated approach.
• Extensibility to compilers and optimization problems beyond LLVM: Preliminary results on an XLA problem suggest portability beyond LLVM and reduced engineering effort, indicating that Magellan can generalize.
• Importance of the evaluation criteria and complexity of autotuning: Magellan's performance depends heavily on the user-provided macro-benchmarks, so choosing appropriate evaluation criteria is important. Autotuning inside the evolutionary process can also be computationally expensive.
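To make the last point concrete, here is a minimal sketch of why autotuning can become expensive. It assumes a heuristic template with two tunable constants, tuned by exhaustive grid search. The names `autotune`, `toy_benchmark`, `threshold`, and `bonus` are hypothetical, and the source does not say which tuner Magellan actually uses. Grid search is chosen only because its cost is easy to count.

```python
from itertools import product

def autotune(evaluate, grid):
    """Try every combination of tunable constants and keep the best one."""
    names = list(grid)
    best_params, best_score, evaluations = None, float("-inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = evaluate(**params)  # one full benchmark run per combination
        evaluations += 1
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score, evaluations

def toy_benchmark(threshold, bonus):
    """Hypothetical objective with its optimum at threshold=30, bonus=2."""
    return -((threshold - 30) ** 2) - (bonus - 2) ** 2

grid = {"threshold": range(0, 101, 10), "bonus": range(0, 5)}
params, best_score, evaluations = autotune(toy_benchmark, grid)
print(params, best_score, evaluations)
```

The evaluation count is the product of the grid sizes (11 × 5 = 55 here), so every additional tunable constant multiplies the number of benchmark runs. Each run also scores a candidate only against the benchmark it is given, which is why the choice of evaluation criteria matters.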