Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Single-pass Adaptive Image Tokenization for Minimum Program Search

Created by

Haebom

Author

Shivam Duggal, Sanghyun Byun, William T. Freeman, Antonio Torralba, Phillip Isola

Outline

This paper proposes KARL, a single-pass adaptive tokenizer that performs variable-length tokenization according to the complexity of an image, based on the principles of Algorithmic Information Theory (AIT). KARL approximates the image's Kolmogorov complexity (KC) and stops generating tokens once the minimum description length is reached; its training process resembles the inverse reinforcement learning paradigm. Unlike conventional adaptive tokenizers, which require multiple encoding searches, KARL matches their performance in a single pass. The paper also analyzes scaling laws for factors such as encoder/decoder size and continuous versus discrete tokenization. Finally, through a conceptual study connecting adaptive image tokenization and AIT, it examines how image complexity (KC) relates to structure versus noise and to in-distribution versus out-of-distribution familiarity, and shows that the results are consistent with human intuition.
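To make the contrast in the outline concrete, here is a minimal sketch of the search-based baseline: try increasing token budgets and keep the smallest one whose reconstruction error falls below a target. Everything in it is an assumption for illustration (the chunk-mean "tokenizer", the names `encode`, `decode`, `min_tokens_by_search`, and the threshold `eps`); it is not KARL or any tokenizer from the paper. A single-pass method would replace the loop with one learned prediction of when to stop.

```python
def encode(signal, k):
    """Toy tokenizer (hypothetical): split into k equal chunks, one token (the chunk mean) each."""
    size = len(signal) // k
    return [sum(signal[i * size:(i + 1) * size]) / size for i in range(k)]


def decode(tokens, length):
    """Reconstruct by repeating each token over its chunk."""
    size = length // len(tokens)
    return [t for t in tokens for _ in range(size)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def min_tokens_by_search(signal, eps, budgets=(1, 2, 4, 8, 16)):
    """Multi-pass baseline: encode once per budget, stop at the first budget within eps.

    The returned token count acts as a rough stand-in for description length.
    Assumes len(signal) is divisible by every budget.
    """
    for k in budgets:
        reconstruction = decode(encode(signal, k), len(signal))
        if mse(signal, reconstruction) <= eps:
            return k
    return budgets[-1]


flat = [0.5] * 16                # no structure to describe
step = [0.0] * 8 + [1.0] * 8     # one edge
alternating = [0.0, 1.0] * 8     # changes at every sample

print(min_tokens_by_search(flat, eps=0.01))         # 1
print(min_tokens_by_search(step, eps=0.01))         # 2
print(min_tokens_by_search(alternating, eps=0.01))  # 16
```

The simple inputs need few tokens and the rapidly varying one needs the full budget, but the search pays for one encode/decode pass per candidate budget. That repeated cost is what a single-pass tokenizer aims to avoid.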

Takeaways, Limitations

• Takeaways:
◦ Shows that a single-pass adaptive tokenizer can make image tokenization more efficient than existing methods.
◦ Offers a new perspective on image understanding by measuring and analyzing image complexity using Kolmogorov complexity.
◦ Provides insight into optimizing model performance by presenting scaling laws for factors such as encoder/decoder size and tokenization method.
◦ Confirms consistency with human intuition by analyzing how image complexity relates to structure/noise and to in-distribution versus out-of-distribution familiarity.

• Limitations:
◦ Because an approximation of Kolmogorov complexity is used, the estimates may differ from the actual KC.
◦ Further validation is needed to see how well KARL's performance generalizes to various image datasets and tasks.
◦ Further analysis is needed on the complexity and stability of the learning process based on inverse reinforcement learning.
◦ Specific experimental results and comparison models are not described in detail.
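The point about approximating Kolmogorov complexity can be illustrated with a standard, computable stand-in: the size of an input after general-purpose compression is an upper bound on its description length. The sketch below uses Python's `zlib` on three synthetic 64x64 grayscale "images"; the images and the helper name `compressed_size` are assumptions made for illustration and are unrelated to the estimator used in the paper.

```python
import random
import zlib


def compressed_size(pixels):
    """Bytes after zlib compression: a crude, computable upper-bound proxy for KC."""
    return len(zlib.compress(bytes(pixels), 9))


rng = random.Random(0)  # fixed seed so the noise image is reproducible
n = 64 * 64

flat = [128] * n                                # constant image
gradient = [(i % 64) * 4 for i in range(n)]     # same horizontal ramp on every row
noise = [rng.randrange(256) for _ in range(n)]  # uniformly random pixel values

for name, image in [("flat", flat), ("gradient", gradient), ("noise", noise)]:
    print(name, compressed_size(image))
```

The structured images compress to a small fraction of the size of the noise image, which is essentially incompressible. A proxy like this only bounds KC from above, so any gap between the proxy and the true KC remains unknown, which is the concern raised in the first limitation.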
