Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification

Created by
  • Haebom

Authors

Zhenglin Lai, Mengyao Liao, Bingzhe Wu, Dong Xu, Zebin Zhao, Zhihang Yuan, Chao Fan, Jianqiang Li

Outline

We address safety alignment for large language models built on Mixture-of-Experts (MoE) architectures. Specifically, we formalize and analyze "positional vulnerability": the phenomenon that an MoE model's safety-related behavior depends on specific expert modules. We present SAFEx, an analytical framework that identifies, characterizes, and validates safety-critical experts, classifying them into a Harmful Content Detection Group (HCDG) and a Harmful Response Control Group (HRCG). Using expert-level interventions, we investigate causality and test mitigation strategies: on Qwen3-30B-A3B, blocking the SAFEx-selected experts significantly degrades safety behavior. Furthermore, we apply LoRA for lightweight adaptation targeting the HRCG and, through negative weight merging, improve the refusal rate on adversarial prompts without full model retraining.
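To make the expert-blocking intervention concrete, here is a minimal sketch (not the authors' code) of how SAFEx-selected experts could be excluded from routing: their router logits are masked to negative infinity before top-k selection, so tokens are never dispatched to them. The function and parameter names (route_with_blocked_experts, blocked_expert_ids) are hypothetical, and the 128-expert / top-8 layout only approximates Qwen3-30B-A3B's routing setup.

```python
import torch
import torch.nn.functional as F

def route_with_blocked_experts(router_logits: torch.Tensor,
                               blocked_expert_ids: list[int],
                               top_k: int = 8):
    """router_logits: [num_tokens, num_experts] raw scores from the gating network."""
    masked = router_logits.clone()
    # Setting blocked experts' logits to -inf excludes them from top-k routing,
    # emulating the expert-level "blocking" intervention described above.
    masked[:, blocked_expert_ids] = float("-inf")
    topk_scores, topk_ids = masked.topk(top_k, dim=-1)
    # Renormalize gate weights over the surviving experts.
    topk_weights = F.softmax(topk_scores, dim=-1)
    return topk_weights, topk_ids

# Example: block two hypothetical experts flagged by SAFEx in a 128-expert layer.
logits = torch.randn(4, 128)  # 4 tokens, 128 experts
weights, ids = route_with_blocked_experts(logits, blocked_expert_ids=[17, 93], top_k=8)
```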

Takeaways, Limitations

Takeaways:
We define and analyze a concrete issue for MoE model safety (positional vulnerability).
The SAFEx framework provides a method for effectively identifying and classifying safety-critical experts.
Expert-level interventions (masking, LoRA-based adaptation) offer a practical approach to improving MoE model safety; see the negative-merge sketch after the Limitations list.
We present a computationally efficient, expert-level safety intervention pathway.
Limitations:
Only experimental results for a specific model (Qwen3-30B-A3B) and setup (top-8 routing) are presented, so generalization may be limited.
SAFEx performance may vary depending on model architecture and data.
LoRA-based adaptation is evaluated only on the HRCG; its effect on other safety-related aspects requires further study.
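The negative weight merging mentioned above can be sketched under standard LoRA conventions: a trained LoRA delta is merged into the base weights with a negative coefficient, so the adapted update is subtracted rather than added, without any full retraining. This is an assumed illustration, not the authors' implementation; the coefficient, rank, alpha, and weight shapes are placeholders.

```python
import torch

def merge_lora_negative(base_weight: torch.Tensor,
                        lora_A: torch.Tensor,
                        lora_B: torch.Tensor,
                        alpha: float,
                        rank: int,
                        merge_coeff: float = -1.0) -> torch.Tensor:
    """base_weight: [out, in]; lora_A: [rank, in]; lora_B: [out, rank]."""
    delta = (lora_B @ lora_A) * (alpha / rank)   # standard LoRA update
    return base_weight + merge_coeff * delta     # negative coefficient subtracts the update

# Toy example: one projection matrix with a rank-8 adapter.
W = torch.randn(1024, 512)
A = torch.randn(8, 512) * 0.01
B = torch.randn(1024, 8) * 0.01
W_merged = merge_lora_negative(W, A, B, alpha=16.0, rank=8, merge_coeff=-1.0)
```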