Daily Arxiv


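This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing a summary, you only need to cite the source.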
This service is supported by Google Gemini.

APEX-Accounting

MMAC: A Massive Multi-dimensional Benchmark for Audio Captioning

Visual Credit Audit for Multimodal Spatial Reasoning

Budget-Aware LLM Discovery via Cost-Calibrated Frontier Utility

Constitutional Midtraining: Content Presence Drives Alignment Gains

WhisperRec: Latent Reasoning for Efficient Foundation Recommendation Models

Reinforcement Learning on Cost-Constrained Quadrupedal Hardware

RIDGE: An Autonomous Framework for Validation and Method Discovery in LLM-Generated Option Pricing

REPREC: Representation Driven Parameter-Efficient Recommendation System

Reading Without a Reader: Large Language Models Collapse Reading and Writing into a Single Entangled Code

Rethinking Classifier-Free Guidance in On-Policy Diffusion Distillation

LU-500: A Logo Benchmark for Concept Unlearning

Improved lower bounds for the Shannon capacity of odd cycles

SLAI T-Rex: Full-Parameter Post-training of the DeepSeek-V4 Family on Ascend SuperPOD

PathAgentBench: Benchmarking Evidence-Seeking Vision-Language Models on Whole-Slide Pathology Image

Auditing Question-Order Effects in Large Language Models with the QQ Equality: Mechanism Characterization and a Saturation Caveat

When Does Muon Help Agentic Reinforcement Learning?

Geometric mean-based pairwise comparison method with the reference values -- statistical approach

Exposure is not manifestation: measurement target and output resolution jointly determine which behavioural-faithfulness evaluator wins

ACPO: Asymmetric Credit Policy Optimization via Mode-Local Entropy Surrogate

Explaining Data Mixing Scaling Laws

What If Prompt Injection Never Left? Rethinking Agent Security through Cross-Session Stored Prompt Injection

Averaged Evaluation Masks Capability Trade-Offs: Multi-Source Calibration for High-Sparsity LLM Pruning

Toward a More Ethical Facial Age Estimation: A Generalized Zero-Shot Benchmark Without Training on Children's Data

MinerU-Popo: Universal Post-Processing Model for Structured Document Parsing

SNAC-Pack 2.0: Scaled-Out Surrogate Neural Architecture Codesign

PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus

Defusing the Trigger: Tail-Risk-Informed Attention Rebalancing for LLM Backdoor Mitigation

The Topological Trouble With Transformers

How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data

REFINE-DP: Diffusion Policy Fine-tuning for Humanoid Loco-manipulation via Reinforcement Learning

Energy-Driven Adaptive Visual Token Pruning for Efficient Vision-Language Models

RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design

Exact and Asymptotically Complete Robust Verifications of Neural Networks via Ising Solvers

Transporting Task Vectors across Different Architectures without Training

The Confidence Manifold: Geometric Structure of Correctness Representations in Language Models

Procedural Fairness in Multi-Agent Bandits

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

How Context Shapes Truth: Geometric Transformations of Statement-level Truth Representations in LLMs

Women Worry, Men Adopt? Gendered Risk Perceptions and Generative AI Adoption

Epistemic diversity across language models mitigates knowledge collapse

One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs

Functional Percolation: Criticality of Form and Function

Dynamically Scaled Activation Steering

DinoLizer: Separating VAE and Diffusion Artifacts in Generative Inpainting Localization

EmoFeedback$^2$: Reinforcement of Continuous Emotional Image Generation via LVLM-based Reward and Textual Feedback

Distributions In, Distributions Out: The Case for Soft-Label Training

MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding

Linking Heterogeneous Data with Coordinated Agent Flows for Social Media Analysis

Critical attention scaling in long-context transformers

On the Rate of Convergence of Kolmogorov-Arnold Network Regression Estimators

Improved Classification of Nitrogen Stress Severity in Plants Under Combined Stress Conditions Using Spatio-Temporal Deep Learning Framework

Enhancing Scene Transition Awareness in Video Generation via Post-Training

When Do AI Gains Become Broadly Shareable? A Policy Threshold for AI-Driven Automation

CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations

Hallucinations and Truth: A Comprehensive Accuracy Evaluation of RAG, LoRA and DoRA

Metareasoning constraints couple narratives, affect and cognition

LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints

Morphological Detection and Classification of Microplastics and Nanoplastics Emerged from Consumer Products by Deep Learning

ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification

MSGNN: A Spectral Graph Neural Network Based on a Novel Magnetic Signed Laplacian

Desktop-Delta Bench: Do Computer-Use Models Understand Desktop GUI Transitions?

Nudging Sustainable Choices through LLM-Generated Recommendation Explanations

HANDBOOK.md: A Benchmark for Long-Context Agentic Instruction Following

SpecPrefetch: Parameter-Efficient Expert Prefetching for Sparse MoE Foundation Models

CachedSearch: Training-Free Cached Exploration for Test-Time Search in Video Diffusion

Agent Team Work Zone: An Automated, Persistent Workspace for Long-Lived Claude Code Agent Teams

What AI Red-Team Evaluations Can and Cannot Prove

Wavelet Phase Diffusion for Structurally and Semantically Consistent Sim-to-Real Translation

PATS: Policy-Aware Training Scaffolding for Agentic Reinforcement Learning

Is Deep Research Reliable? Misleading Knowledge Induces False Conclusions

RELIC: Revealed Principles for Learning Interpretable Composable Skills in Multi-Agent Planning

Networked Intelligence: Active Shared Context Graphs for Human-AI Team Science

Who Grades the Grader? Co-Evolving Evaluation Metrics and Skills for Self-Improving LLM Agents

Learning to Select, Not Relearn: Hard-Routed Mixtures of Reasoning LoRAs

ATOD: Annealed Turn-Aware On-Policy Distillation for Multi-Turn Agentic Tasks

A Matter of Time: Towards a General Theory of Agency

Attractor Domain Theory: A Mathematical Framework for Cardiovascular Attractor Analysis with Wearable Photoplethysmography (PPG) Validation

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

A Policy-Driven Runtime Layer for Agentic LLM Serving

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

Orchard: An Open-Source Agentic Modeling Framework

From Large Language Model Predicates to Logic Tensor Networks: Neurosymbolic Offer Validation in Regulated Procurement

Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities

RetiBridge: Bridging Quantitative Retinal Biomarkers and Qualitative Diagnosis with a Knowledge-Guided Multimodal Large Language Model

A Tale of LLMs and Induced Small Proxies: Scalable Small Language Models for Knowledge Mining

Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline

Decision-oriented joint optimization of evidence fusion based on event-conditioned credibility

MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models

A Review on Building Blocks of Decentralized Artificial Intelligence

Representation and Invariance in Reinforcement Learning

Learning to Trace Seiberg Dualities

ReToken: One Token to Improve Vision-Language Models for Visual Retrieval

PAC-MAN: Perception-Aware CBF-RL for Whole-Body Safety in Humanoid Dodgeball

AskChem: Claim-Centered Infrastructure for Chemistry Literature Synthesis

PAIChecker: Uncovering and Checking PR-Issue Misalignment in SWE-Bench-Like Benchmarks

Sample More, Reflect Less: Self-Refine and Reflexion Lose to Repeated Sampling at Equal Token Cost, from 1.5B to 7B

Algorithms for Structured Elections under Thiele Voting Rules

APO: Unsupervised Atomic Policy Optimization for 3D Structure Prediction of Atomic Systems
