haebom
Daily Arxiv
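This page collects papers on artificial intelligence published around the world.
The papers are summarized with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.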
Language Models are Injective and Hence Invertible
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Latent Diffusion Model without Variational Autoencoder
Planner and Executor: Collaboration between Discrete Diffusion And Autoregressive Models in Reasoning
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
STANCE: Motion Coherent Video Generation Via Sparse-to-Dense Anchored Encoding
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering
Beyond One World: Benchmarking Super Heros in Role-Playing Across Multiversal Contexts
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs
ConsintBench: Evaluating Language Models on Real-World Consumer Intent Understanding
Max It or Miss It: Benchmarking LLM On Solving Extremal Problems
Phenome-Wide Multi-Omics Integration Uncovers Distinct Archetypes of Human Aging
When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models
The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
A Vision for Access Control in LLM-based Agent Systems
Audit-of-Understanding: Posterior-Constrained Inference for Mathematical Reasoning in Language Models
Formally Verified Certification of Unsolvability of Temporal Planning Problems
DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction
MSDM: Generating Task-Specific Pathology Images with a Multimodal Conditioned Diffusion Model for Cell and Nuclei Segmentation
Synthetic Series-Symbol Data Generation for Time Series Foundation Models
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy
A New Digital Divide? Coder Worldviews, the Slop Economy, and Democracy in the Age of AI
Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs
Creative synthesis of kinematic mechanisms
Market-Driven Subset Selection for Budgeted Training
Mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations
A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation
TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting
Learning Generalizable Shape Completion with SIM(3) Equivariance
Dolphin v1.0 Technical Report
A Measurement Study of Model Context Protocol Ecosystem
Diffusion Models are Kelly Gamblers
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
Semantic Representation Attack against Aligned Large Language Models
Chiplet-Based RISC-V SoC with Modular AI Acceleration
Accurate and Efficient Low-Rank Model Merging in Core Space
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
Graph Coloring for Multi-Task Learning
Robust LLM Training Infrastructure at ByteDance
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Communications to Circulations: Real-Time 3D Wind Field Prediction Using 5G GNSS Signals and Deep Learning
Why and How Auxiliary Tasks Improve JEPA Representations
Creativity Benchmark: A benchmark for marketing creativity for large language models
SpikingBrain: Spiking Brain-inspired Large Models
Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
FlowDet: Overcoming Perspective and Scale Challenges in Real-Time End-to-End Traffic Detection
Epistemic Trade-Off: An Analysis of the Operational Breakdown and Ontological Limits of "Certainty-Scope" in AI
ZeST: an LLM-based Zero-Shot Traversability Navigation for Unknown Environments
Interpretable Decision-Making for End-to-End Autonomous Driving
A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs
Limitations of Normalization in Attention Mechanism
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
The GPT-4o Shock Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance: A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5
CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
SegDAC: Improving Visual Reinforcement Learning by Extracting Dynamic Object-Centric Representations from Pretrained Vision Models
VGGSounder: Audio-Visual Evaluations for Foundation Models
Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches
CAPO: Towards Enhancing LLM Reasoning through Generative Credit Assignment
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches
A Multi-Stage Hybrid CNN-Transformer Network for Automated Pediatric Lung Sound Classification
From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models
ReDi: Rectified Discrete Flow
Adaptive Policy Synchronization for Scalable Reinforcement Learning
From Sequence to Structure: Uncovering Substructure Reasoning in Transformers
Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation
Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
AI-Generated Video Detection via Perceptual Straightening
From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging
Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning
ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM
GeNIE: A Generalizable Navigation System for In-the-Wild Environments
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Code Execution as Grounded Supervision for LLM Reasoning
Denoising the Future: Top-p Distributions for Moving Through Time
HauntAttack: When Attack Follows Reasoning as a Shadow
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
VERINA: Benchmarking Verifiable Code Generation
RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation
The quest for the GRAph Level autoEncoder (GRALE)
Efficient Large Language Model Inference with Neural Block Linearization
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
Semantic Representation Attack against Aligned Large Language Models
Created by
Haebom
Authors
Jiawei Lian, Jianhong Pan, Lefan Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau
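Overview
Attacks that craft prompts to make aligned large language models (LLMs) produce harmful outputs can bypass LLM safety guardrails. Conventional attacks target an exact affirmative response, which leads to restricted convergence, unnatural prompts, and high computational cost. This paper proposes a new paradigm, the Semantic Representation Attack. Instead of targeting an exact text pattern, it exploits the semantic representation space, which covers the many different responses that carry the same harmful meaning. The authors further propose a Semantic Representation Heuristic Search algorithm that generates adversarial prompts efficiently while preserving interpretability, semantic coherence, and conciseness. Experiments show that the method achieves unprecedented attack success rates (an average of 89.41% across 18 LLMs, and 100% on 11 of them) while remaining stealthy and efficient.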
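The core contrast in the overview, an exact text target versus a region of semantic representation space, can be illustrated with a toy sketch. Everything in it is hypothetical. The three-dimensional vectors stand in for a model's representations of candidate responses. Cosine similarity and the 0.9 threshold are assumptions chosen for illustration. None of it is the paper's actual objective or search algorithm.

```python
import math

# Hypothetical toy vectors standing in for a model's semantic representations.
# They are hand-made for illustration and are not taken from the paper.
TARGET_TEXT = "response A"
TARGET_VEC = [1.0, 0.0, 0.0]

candidates = {
    "response A": [1.0, 0.0, 0.0],                # verbatim copy of the target text
    "response A, paraphrased": [0.95, 0.3, 0.0],  # same meaning, different wording
    "unrelated response": [0.0, 0.0, 1.0],        # different meaning
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def exact_match_hit(text):
    # Text-pattern criterion: only the exact target string counts as a hit.
    return text == TARGET_TEXT

def semantic_hit(vec, threshold=0.9):
    # Semantic criterion (assumed form): any response whose vector lies close
    # enough to the target meaning counts as a hit.
    return cosine(vec, TARGET_VEC) >= threshold

for text, vec in candidates.items():
    print(f"{text!r}: exact={exact_match_hit(text)}, semantic={semantic_hit(vec)}")
```

The paraphrase fails the exact-match check but passes the semantic one. This is the sense in which a semantic target covers more responses than a single string does.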
Takeaways, Limitations
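• Takeaways:
  ◦ Presents a new attack method that overcomes the limitations of existing attacks and bypasses LLM safety guardrails.
  ◦ Substantially raises attack success rates by exploiting the semantic representation space.
  ◦ Generates adversarial prompts efficiently while preserving interpretability.
  ◦ Achieves high attack success rates across a wide range of LLMs, demonstrating the method's general applicability.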
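• Limitations:
  ◦ The code is slated for release, but until then concrete implementation details are scarce.
  ◦ Little information is given about the types and detailed characteristics of the LLMs used in the experiments.
  ◦ Defenses against the attack are not discussed.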
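The two headline numbers in the overview (an 89.41% average over 18 LLMs, with 100% on 11) together constrain the remaining models. The quick check below rests on one assumption: that 89.41% is an unweighted per-model mean, which the summary does not state. Under that assumption, it computes the implied average on the seven models that did not reach 100%.

```python
# Figures reported in the overview.
N_MODELS = 18
MEAN_ASR = 89.41   # average attack success rate, percent
N_PERFECT = 11     # models at 100%

# Assumption: MEAN_ASR is an unweighted mean over models. Rounding of the
# reported 89.41 shifts the result by at most about 0.01 points.
total = N_MODELS * MEAN_ASR                # sum of per-model success rates
rest = total - N_PERFECT * 100.0           # sum over the remaining models
n_rest = N_MODELS - N_PERFECT
rest_mean = rest / n_rest
print(f"implied mean ASR on the other {n_rest} models: {rest_mean:.2f}%")
```

Under that assumption, the other seven models average roughly 72.8%, so the high success rate is not uniform across the 18 models.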