Daily Arxiv

This page collects and organizes artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs

Created by
  • Haebom

Author

Dongsheng Ding, Kaiqing Zhang, Jiali Duan, Tamer Başar, Mihailo R. Jovanović

Outline

This paper studies sequential decision-making in constrained Markov decision processes (MDPs), where the goal is to maximize the expected total reward subject to constraints on the expected total utility. To solve this infinite-horizon discounted optimal control problem, the authors propose the Natural Policy Gradient Primal-Dual (NPG-PD) method, which updates the primal variable via natural policy gradient ascent and the dual variable via projected subgradient descent. Although the underlying maximization problem has a nonconcave objective and a nonconvex constraint set, the method is shown to converge globally at a sublinear rate under softmax policy parameterization, with rates that are independent of the size of the state-action space. For log-linear and general smooth policy parameterizations, sublinear convergence rates are established up to a function approximation error caused by the restricted policy class. In addition, the paper provides convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms and demonstrates the effectiveness of the approach through computational experiments.
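
For intuition, below is a minimal tabular sketch of the primal-dual iteration described above, assuming exact policy evaluation on a small finite MDP. The array names (P, R, U), the constraint threshold b, the initial distribution rho, the step size eta, and the dual cap lam_max are illustrative placeholders rather than notation taken from the paper.

```python
import numpy as np

def evaluate(P, R, pi, gamma):
    """Exact policy evaluation on a tabular MDP: returns Q^pi (shape S x A)."""
    S, A = R.shape
    P_pi = np.einsum('sa,saz->sz', pi, P)          # transition matrix under pi
    r_pi = np.sum(pi * R, axis=1)                  # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return R + gamma * np.einsum('saz,z->sa', P, V)

def npg_pd(P, R, U, b, rho, gamma=0.9, eta=0.1, iters=200, lam_max=10.0):
    """Sketch of an NPG-PD loop: natural policy gradient ascent on the primal
    (softmax policy), projected sub-gradient descent on the dual variable."""
    S, A = R.shape
    pi = np.full((S, A), 1.0 / A)                  # uniform initial policy
    lam = 0.0                                      # dual variable for V_u >= b
    for _ in range(iters):
        # Primal step: for softmax policies the natural gradient update
        # becomes a multiplicative-weights update with the Lagrangian Q-values.
        Q_L = evaluate(P, R + lam * U, pi, gamma)
        pi = pi * np.exp(eta * Q_L / (1.0 - gamma))
        pi /= pi.sum(axis=1, keepdims=True)
        # Dual step: projected sub-gradient descent on the constraint violation.
        Q_u = evaluate(P, U, pi, gamma)
        V_u = rho @ np.sum(pi * Q_u, axis=1)       # expected total utility from rho
        lam = np.clip(lam - eta * (V_u - b), 0.0, lam_max)
    return pi, lam

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 4, 3
    P = rng.dirichlet(np.ones(S), size=(S, A))     # random transition kernel
    R, U = rng.random((S, A)), rng.random((S, A))  # random reward and utility
    pi, lam = npg_pd(P, R, U, b=0.5 / (1 - 0.9), rho=np.full(S, 1 / S))
    print(pi.round(3), lam)
```

Under softmax parameterization the natural gradient step reduces to this closed-form multiplicative update on the policy itself, which is why the sketch never materializes explicit parameter vectors; the dual variable is simply clipped back onto a bounded interval after each sub-gradient step.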

Takeaways, Limitations

Takeaways:
Effective solution to constrained MDPs: solves constrained sequential decision-making problems by leveraging natural policy gradient methods.
Global convergence guarantee: sublinear convergence of both the optimality gap and the constraint violation under softmax policy parameterization.
Dimension-free rates: convergence rates that are independent of the size of the state-action space.
Support for various policy parameterizations: convergence rates for log-linear and general smooth policy parameterizations.
Sample-based algorithm analysis: convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms.
Experimental validation: the effectiveness of the proposed method is demonstrated through computational experiments.
Limitations:
Policy parameterization dependence: the convergence rate depends on the specific policy parameterization (softmax, log-linear, general smooth).
Function approximation error: for log-linear and general smooth parameterizations, the guarantees include a function approximation error term caused by the restricted policy class.