Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search

Created by
  • Haebom

Authors

Austin R. Ellis-Mohr, Anuj K. Nayak, Lav R. Varshney

Outline

This paper notes that the inference-time compute cost of large language models (LLMs) is growing, and proposes directed stochastic skill search (DS3), a framework that represents the inference process as a stochastic search over a learned skill graph. DS3 yields analytical expressions for task success probability and computational cost under various inference strategies, including chain-of-thought (CoT) and tree-of-thought (ToT), enabling comparative analysis as a function of task difficulty and model capability. By extending a tripartite graph framework for LLM training to incorporate inference, and by connecting DS3 to empirical methods that characterize LLM scaling behavior, the authors theoretically reproduce experimentally observed patterns: linear accuracy scaling with logarithmic compute, variation of the optimal inference strategy with task difficulty and model capability, emergent gains from inference even when performance plateaus under parameter scaling, and the behavior of best-of-N (BoN) sampling and majority voting, all captured within a unified analytical framework. By explicitly characterizing training-inference interdependencies, the framework deepens theoretical understanding and supports principled algorithm design and resource allocation.
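To make the BoN-versus-majority-voting comparison concrete, here is a minimal sketch of the standard closed forms for both strategies, assuming independent attempts with a fixed per-attempt success probability `p` (a simplification; the paper's DS3 formulas derive these quantities from traversal of a skill graph rather than from a single scalar `p`):

```python
from math import comb

def best_of_n(p: float, n: int) -> float:
    """Success probability of best-of-N with a perfect verifier:
    the task succeeds if at least one of N independent attempts does."""
    return 1.0 - (1.0 - p) ** n

def majority_vote(p: float, n: int) -> float:
    """Success probability of majority voting over N independent attempts,
    assuming all incorrect attempts agree on one wrong answer (binary case)."""
    k_min = n // 2 + 1  # strict majority threshold
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k)
               for k in range(k_min, n + 1))

if __name__ == "__main__":
    # BoN always improves with N, while majority voting only helps
    # when the per-attempt success probability exceeds 1/2.
    for p in (0.3, 0.6):
        for n in (1, 5, 25):
            print(f"p={p:.1f} N={n:2d}: "
                  f"BoN={best_of_n(p, n):.3f} "
                  f"majority={majority_vote(p, n):.3f}")
```

This illustrates the qualitative behavior the paper recovers analytically: for a weak model (p below 1/2), scaling up majority voting hurts while BoN still helps, so the optimal strategy depends on model capability.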

Takeaways, Limitations

Takeaways:
Presents DS3, a new theoretical framework for efficiently managing LLM inference compute.
Analytically predicts the optimal inference strategy as a function of task difficulty and model capability.
Deepens understanding of the interdependence between training and inference.
Supports principled algorithm design and resource-allocation strategies.
Theoretically explains experimentally observed LLM scaling behavior.
Limitations:
No empirical application or performance evaluation of the DS3 framework on real LLMs.
The generalizability of the proposed analytical expressions to the complexity of real LLMs remains to be verified.
Generality across different LLM types and tasks still needs to be confirmed.
Lacks concrete guidelines for resource-allocation strategies in real-world settings.