haebom
Daily Arxiv
This page collects and summarizes artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.
Semantic Content Determines Algorithmic Performance
Beyond Imitation: Reinforcement Learning for Active Latent Planning
Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves
Chain Of Thought Compression: A Theoritical Analysis
EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots
Meta Context Engineering via Agentic Skill Evolution
ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory
ARGORA: Orchestrated Argumentation for Causally Grounded LLM Reasoning and Decision Making
KAPSO: A Knowledge-grounded framework for Autonomous Program Synthesis and Optimization
LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI
The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation
MAR: Efficient Large Language Models via Module-aware Architecture Refinement
The Path of Least Resistance: Guiding LLM Reasining Trajectories with Prefix Consensus
ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
Topeax -- An Improved Clustering Topic Model with Density Peak Detection and Lexical-Semantic Term Importance
LION: A Clifford Neural Paradigm for Multimodal-Attributed Graph Learning
ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design
The Paradox of Robustness: Decoupling Rule-Based Logic from Affective Noise in High-Stakes Decision-Making
When Prohibitions Become Permissions: Auditing Negation Sensitivity in Language Models
System 1&2 Synergy via Dynamic Model Interpolation
DataCross: A Unified Benchmark and Agent Framework for Cross-Modal Heterogeneous Data Analysis
TeachBench: A Syllabus-Grounded Framework for Evaluating Teaching Ability in Large Language Models
NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents
Hebbian Learning with Global Direction
Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization
BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents
Dynamic Framework for Collaborative Learning: Leveraging Advanced LLM with Adaptive Feedback Mechanisms
Ostrakon-VL: Towards Domain-Expert MLLM for Food-Service and Retail Stores
EHR-RAG: Bridging Long-Horizon Structured Electronic Health Records and Large Language Models via Enhanced Retrieval-Augmented Generation
Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks
Modeling Endogenous Logic: Causal Neuro-Symbolic Reasoning Model for Explainable Multi-Behavior Recommendation
White-Box Op-Amp Design via Human-Mimicking Reasoning
Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving
Position: Certifiable State Integrity in Cyber-Physical Systems -- Why Modular Sovereignty Solves the Plasticity-Stability Paradox
TIDE: Tuning-Integrated Dynamic Evolution for LLM-Based Automated Heuristic Design
Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs
Delegation Without Living Governance
Causal Discovery for Explainable AI: A Dual-Encoding Approach
Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning
Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification
When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning
Do Reasoning Models Enhance Embedding Models?
Sycophantic Anchors: Localizing and Quantifying User Agreement in Reasoning Models
MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models
FrontierScience: Evaluating AI's Ability to Perform Expert-Level Scientific Tasks
Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving
Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning
BrainStack: Neuro-MoE with Functionally Guided Expert Routing for EEG-Based Language Decoding
What You Feel Is Not What They See: On Predicting Self-Reported Emotion from Third-Party Observer Labels
Beyond a Single Reference: Training and Evaluation with Paraphrases in Sign Language Translation
CUA-Skill: Develop Skills for Computer Using Agent
Planner-Auditor Twin: Agentic Discharge Planning with FHIR-Based LLM Planning, Guideline Recall, Optional Caching and Self-Improvement
How does information access affect LLM monitors' ability to detect sabotage?
Magellan: Autonomous Discovery of Novel Compiler Optimization Heuristics with AlphaEvolve
Responsible AI: The Good, The Bad, The AI
OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence
Multi-modal Imputation for Alzheimer's Disease Classification
Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report
QUARK: Robust Retrieval under Non-Faithful Queries via Query-Anchored Aggregation
Unplugging a Seemingly Sentient Machine Is the Rational Choice -- A Metaphysical Perspective
Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models
The Epistemic Planning Domain Definition Language: Official Guideline
Do LLMs Favor LLMs? Quantifying Interaction Effects in Peer Review
Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation
CiMRAG: Cim-Aware Domain-Adaptive and Noise-Resilient Retrieval-Augmented Generation for Edge-Based LLMs
Structural Compositional Function Networks: Interpretable Functional Compositions for Tabular Discovery
LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
On the Effectiveness of LLM-Specific Fine-Tuning for Detecting AI-Generated Text
Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers
MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference
Do we really need Self-Attention for Streaming Automatic Speech Recognition?
VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
Probabilistic Sensing: Intelligence in Data Sampling
LTS-VoiceAgent: A Listen-Think-Speak Framework for Efficient Streaming Voice Interaction via Semantic Triggering and Incremental Reasoning
NCSAM Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning
Benchmarking von ASR-Modellen im deutschen medizinischen Kontext: Eine Leistungsanalyse anhand von Anamnesegesprächen
Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation
Continuous-Flow Data-Rate-Aware CNN Inference on FPGA
DecHW: Heterogeneous Decentralized Federated Learning Exploiting Second-Order Information
Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data
Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents
Quantifying non deterministic drift in large language models
Text-to-State Mapping for Non-Resolution Reasoning: The Contradiction-Preservation Principle
SDUs DAISY: A Benchmark for Danish Culture
Stingy Context: 18:1 Hierarchical Code Compression for LLM Auto-Coding
The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models
Evaluating Large Language Models for Abstract Evaluation Tasks: An Empirical Study
OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling
Table-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation
HEART: A Unified Benchmark for Assessing Humans and LLMs in Emotional Support Dialogue
Demystifying Multi-Agent Debate: The Role of Confidence and Diversity
FastWhisper: Adaptive Self-knowledge Distillation for Real-time Automatic Speech Recognition
Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication
Simulating Complex Multi-Turn Tool Calling Interactions in Stateless Execution Environments
From Intuition to Expertise: Rubric-Based Cognitive Calibration for Human Detection of LLM-Generated Korean Text
Analysis of LLM Vulnerability to GPU Soft Errors: An Instruction-Level Fault Injection Study
GTAC: A Generative Transformer for Approximate Circuits
DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs
STELLAR: Structure-guided LLM Assertion Retrieval and Generation for Formal Verification
TIDE: Tuning-Integrated Dynamic Evolution for LLM-Based Automated Heuristic Design
Created by
Haebom
Authors
Chentong Chen, Mengyuan Zhong, Ye Fan, Jialong Shi, Jianyong Sun
💡 Overview
This paper proposes TIDE (Tuning-Integrated Dynamic Evolution), a framework that addresses the lack of separation between algorithmic structure and continuous parameters in automated heuristic design based on large language models (LLMs). TIDE uses a nested architecture: an outer level explores structural diversity using tree similarity edit distance, while an inner level combines LLM-based logic generation with parameter optimization through a differential mutation operator. With this design, TIDE achieves better solution quality and higher search efficiency than existing methods across a range of combinatorial optimization problems.
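To make the parameter-tuning idea concrete, here is a minimal sketch of a differential mutation step. The summary does not give the exact form of TIDE's operator, so this assumes the classic DE/rand/1 rule from differential evolution, `a + scale * (b - c)`. The function name `differential_mutation`, the `scale` parameter, and the list-of-floats representation of a heuristic's continuous parameters are all illustrative assumptions, not the paper's API.

```python
import random


def differential_mutation(population, target_idx, scale=0.5, rng=random):
    """Build a DE/rand/1 mutant vector for the individual at target_idx.

    population: list of equal-length lists of floats (assumed to be the
                continuous parameters of heuristics sharing one structure).
    scale:      differential weight F (hypothetical default of 0.5).
    """
    # Pick three distinct individuals, none of which is the target itself.
    candidates = [i for i in range(len(population)) if i != target_idx]
    if len(candidates) < 3:
        raise ValueError("need at least 4 parameter vectors")
    a, b, c = (population[i] for i in rng.sample(candidates, 3))
    # Base vector plus a scaled difference of two other vectors.
    return [ai + scale * (bi - ci) for ai, bi, ci in zip(a, b, c)]
```

In a nested setup like the one described, a step of this kind would only adjust numeric constants, leaving the structure of the heuristic to the outer search.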
🔑 Implications and Limitations
• Shows that clearly separating an algorithm's structural characteristics from its parameter optimization in LLM-based heuristic design can improve both search efficiency and result quality.
• Presents a new approach that optimizes resource allocation by dynamically prioritizing prompt strategies with a UCB-based scheduler.
• The proposed TIDE framework outperforms existing state-of-the-art techniques, particularly on complex combinatorial optimization problems, and is expected to have a significant influence on future work in automated algorithm design.
• Further experiments across more problem domains are needed, along with an investigation of how LLM "hallucinations" could propagate errors.
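The UCB-based scheduling of prompt strategies mentioned above can be sketched as a standard multi-armed bandit. The summary does not specify which UCB variant or reward signal TIDE uses, so this assumes plain UCB1 with rewards in [0, 1]. The class name `UCBScheduler` and the strategy labels in the usage note are hypothetical.

```python
import math


class UCBScheduler:
    """UCB1 selection over a fixed set of prompt strategies (illustrative)."""

    def __init__(self, strategies, exploration=math.sqrt(2)):
        self.strategies = list(strategies)
        self.exploration = exploration  # weight of the exploration bonus
        self.counts = {s: 0 for s in self.strategies}
        self.totals = {s: 0.0 for s in self.strategies}

    def select(self):
        # Try every strategy once before trusting the statistics.
        for s in self.strategies:
            if self.counts[s] == 0:
                return s
        n = sum(self.counts.values())

        def score(s):
            mean = self.totals[s] / self.counts[s]
            bonus = self.exploration * math.sqrt(math.log(n) / self.counts[s])
            return mean + bonus

        return max(self.strategies, key=score)

    def update(self, strategy, reward):
        # Record the observed reward (assumed to lie in [0, 1]).
        self.counts[strategy] += 1
        self.totals[strategy] += reward
```

A caller would alternate `select()` and `update()`, for example choosing between hypothetical strategies such as `"mutate"` and `"crossover"` and rewarding whichever prompt produced an improved heuristic. Strategies that keep paying off get picked more often, while the bonus term still revisits rarely used ones.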