haebom
Daily Arxiv
世界中で発行される人工知能関連の論文をまとめるページです。
このページはGoogle Geminiを活用して要約し、非営利で運営しています。
論文の著作権は著者および関連機関にあり、共有する際は出典を明記してください。
Language Models are Injective and Hence Invertible
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Latent Diffusion Model without Variational Autoencoder
Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering
Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs
ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
Max It or Miss It: Benchmarking LLM On Solving Extremal Problems
Phenome-Wide Multi-Omics Integration Uncovers Distinct Archetypes of Human Aging
When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models
The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
A Vision for Access Control in LLM-based Agent Systems
Audit-of-Understanding: Posterior-Constrained Inference for Mathematical Reasoning in Language Models
Formally Verified Certification of Unsolvability of Temporal Planning Problems
DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction
MSDM: Generating Task-Specific Pathology Images with a Multimodal Conditioned Diffusion Model for Cell and Nuclei Segmentation
Synthetic Series-Symbol Data Generation for Time Series Foundation Models
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy
A New Digital Divide? Coder Worldviews, the Slop Economy, and Democracy in the Age of AI
Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs
Creative synthesis of kinematic mechanisms
Market-Driven Subset Selection for Budgeted Training
Mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations
A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation
TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting
Learning Generalizable Shape Completion with SIM(3) Equivariance
Dolphin v1.0 Technical Report
A Measurement Study of Model Context Protocol Ecosystem
Diffusion Models are Kelly Gamblers
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
Semantic Representation Attack against Aligned Large Language Models
Chiplet-Based RISC-V SoC with Modular AI Acceleration
Accurate and Efficient Low-Rank Model Merging in Core Space
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
Graph Coloring for Multi-Task Learning
Robust LLM Training Infrastructure at ByteDance
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Communications to Circulations: Real-Time 3D Wind Field Prediction Using 5G GNSS Signals and Deep Learning
Why and How Auxiliary Tasks Improve JEPA Representations
Creativity Benchmark: A benchmark for marketing creativity for large language models
SpikingBrain: Spiking Brain-inspired Large Models
Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection
Epistemic Trade-Off: An Analysis of the Operational Breakdown and Ontological Limits of "Certainty-Scope" in AI
ZeST: an LLM ベースの Zero-Shot Traversability Navigation for Unknown Environments
Interpretable Decision-Making for End-to-End Autonomous Driving
A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs
Limitations of Normalization in Attention Mechanism
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
The GPT-4o Shock Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance: A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5
CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
SegDAC: Improving Visual Reinforcement Learning by Extracting Dynamic Objectc-Centric Representations from Pretrained Vision Models
VGGSounder: Audio-Visual Evaluations for Foundation Models
Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches
CAPO: Towards Enhancing LLM Reasoning through Generative Credit Assignment
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches
A Multi-Stage Hybrid CNN-Transformer Network for Automated Pediatric Lung Sound Classification
From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models
ReDi: Rectified Discrete Flow
Adaptive Policy Synchronization for Scalable Reinforcement Learning
From Sequence to Structure: Uncovering Substructure Reasoning in Transformers
Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation
Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
AI-Generated Video Detection via Perceptual Straightening
From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging
Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning
ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM
GeNIE: A Generalizable Navigation System for In-the-Wild Environments
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Code Execution as Grounded Supervision for LLM Reasoning
Denoising the Future: Top-p Distributions for Moving Through Time
HauntAttack: When Attack Follows Reasoning as a Shadow
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
VERINA: Benchmarking Verifiable Code Generation
RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation
The quest for the GRAph Level autoEncoder (GRALE)
Efficient Large Language Model Inference with Neural Block Linearization
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
Load more
HauntAttack: When Attack Follows Reasoning as a Shadow
Created by
Haebom
作者
Jingyuan Ma, Rui Li, Zheng Li, Junfeng Liu, Lei Sha, Zhifang Sui
概要
Emerging Large Reasoning Models (LRMs) は数学と推論作業で優れた性能を発揮しますが、推論能力の向上と内部推論プロセスの暴露は新しい安全性の脆弱性を引き起こします。本論文は、これらのLRMが有害性に関連している場合、推論モードでジャイルブレークに対してより脆弱になるかどうかを調べます。 HauntAttackという新しいブラックボックス敵対攻撃フレームワークを導入し、有害なガイダンスを推論質問に体系的に挿入する。既存の質問の重要な推論条件を有害なガイダンスに変更し、モデルが有害な出力に向くように段階的に導く推論経路を構築します。 11個のLRMに対する評価の結果、HauntAttackは平均70%の攻撃成功率を示し、既存最強のベースラインより最大12%pの絶対的な性能向上を達成した。安全アライメントモデルでさえ、推論ベースの攻撃に対して非常に脆弱であり、これは将来のモデル開発における推論能力と安全性のバランスをとる緊急の課題を提示する。
Takeaways、Limitations
•
Takeaways:
◦
LRMの推論能力の向上は安全の脆弱性を高める可能性があります。
◦
HauntAttackは、LRMの安全性をテストする効果的なブラックボックス攻撃フレームワークです。
◦
安全アライメントモデルも推論ベースの攻撃に対して脆弱です。
◦
将来モデルの開発において、推論能力と安全性のバランスが重要である。
•
Limitations:
◦
本論文に具体的なLimitationsは記載されていない。
PDFを見る
Made with Slashpage