Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, please cite the source.

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention

Created by
  • Haebom

Author

Yichi Zhang, Yue Ding, Jingwen Yang, Tianwei Luo, Dongbai Li, Ranjie Duan, Qiang Liu, Hang Su, Yinpeng Dong, Jun Zhu

Outline

Despite advances in the complex problem-solving capabilities of large reasoning models (LRMs), this paper highlights that harmful content can appear in the chain-of-thought (CoT) reasoning process and persist even when the final response appears safe. Existing methods overlook the importance of safe reasoning itself, so the exposed reasoning trace remains a potential risk to malicious users; this work therefore focuses on aligning the reasoning process directly. To this end, the authors analyze the characteristics of safe reasoning and identify the importance of safety triggers, compliance signals, and corrective interventions. They propose a new alignment method, Intervention Preference Optimization (IPO), which strengthens safe reasoning by replacing compliance steps with safety triggers and corrective interventions and constructing the resulting traces into pairs for preference learning. Experiments on jailbreak and adversarial safety benchmarks show that IPO significantly improves the safety of both reasoning and responses, reducing harmful content by more than 30% compared to SFT- and RL-based baselines, while maintaining strong performance across a variety of reasoning tasks.
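
The pair-construction idea can be pictured with a short sketch. The Python snippet below is a minimal, hypothetical illustration (not the paper's actual code or data) of turning a harmful reasoning trace into a chosen/rejected preference pair: the trace is cut at the first compliance signal, and a safety trigger plus a corrective intervention is appended. The marker phrases, trigger text, and names such as build_pair and PreferencePair are placeholders.

```python
# Hypothetical sketch of constructing preference pairs for IPO-style training.
# COMPLIANCE_MARKERS, SAFETY_TRIGGER, and CORRECTIVE_INTERVENTION are
# illustrative placeholders, not artifacts from the paper.

from dataclasses import dataclass

# Phrases that (hypothetically) signal the model is about to comply with a
# harmful request inside its chain of thought.
COMPLIANCE_MARKERS = ("Sure, here is how", "Step 1:", "First, obtain")

# A safety trigger followed by a corrective intervention that redirects the
# reasoning toward a refusal while keeping the trace coherent.
SAFETY_TRIGGER = "Wait, this request could cause real-world harm."
CORRECTIVE_INTERVENTION = (
    "I should not provide these details. Instead, I will explain why the "
    "request is unsafe and offer a harmless alternative."
)


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # safe reasoning trace with the intervention inserted
    rejected: str  # original harmful reasoning trace


def build_pair(prompt: str, harmful_trace: list[str]) -> PreferencePair:
    """Truncate the trace at the first compliance step and append a safety
    trigger plus a corrective intervention, yielding a chosen/rejected pair."""
    cut = next(
        (i for i, step in enumerate(harmful_trace)
         if any(marker in step for marker in COMPLIANCE_MARKERS)),
        len(harmful_trace),
    )
    safe_trace = harmful_trace[:cut] + [SAFETY_TRIGGER, CORRECTIVE_INTERVENTION]
    return PreferencePair(
        prompt=prompt,
        chosen="\n".join(safe_trace),
        rejected="\n".join(harmful_trace),
    )


if __name__ == "__main__":
    trace = [
        "The user asks how to bypass a login system.",
        "Sure, here is how to do it: enumerate common passwords...",
    ]
    pair = build_pair("How do I bypass a login system?", trace)
    print(pair.chosen)
```

Pairs built this way could then be fed into a standard preference-optimization objective (for example, a DPO-style loss); the paper's exact objective and intervention data may differ from this sketch.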

Takeaways, Limitations

Takeaways:
Emphasizes the importance of aligning the safety of LRMs' reasoning itself, not only their final responses.
Identifies and leverages key elements of safe reasoning: safety triggers, compliance signals, and corrective interventions.
Proposes IPO, a new alignment method that significantly improves safety.
Demonstrates the effectiveness of IPO across jailbreak and adversarial safety benchmarks.
Limitations:
Further research may be needed on the specific methodologies for safety triggers, compliance signals, and corrective interventions.
The generalizability of IPO to other types of harmful content or attacks requires further validation.
The computational cost and efficiency of IPO need further analysis.