/
/
Daily Arxiv
Daily Arxiv
世界中で発行される人工知能関連の論文をまとめるページです。
このページはGoogle Geminiを活用して要約し、非営利で運営しています。
論文の著作権は著者および関連機関にあり、共有する際は出典を明記してください。
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
Ada-TransGNN: An Air Quality Prediction Model Based On Adaptive Graph Convolutional Networks
Unlearning as Ablation: Toward a Falsifiable Benchmark for Generative Scientific Discovery
Consistent Opponent Modeling of Static Opponents in Imperfect-Information Games
Finding Outliers in a Haystack: Anomaly Detection for Large Pointcloud Scenes
Agentic AI for Software: thoughts from Software Engineering community
Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
Dream to Chat: Model-based Reinforcement Learning on Dialogues with User Belief Modeling
A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems
Generative Artificial Intelligence and Agents in Research and Teaching
CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression
Comparative Analysis of UAV Path Planning Algorithms for Efficient Navigation in Urban 3D Environments
Retrieval Enhanced Feedback via In-context Neural Error-book
From Confidence to Collapse in LLM Factual Robustness
On Task Vectors and Gradients
Learning in Repeated Multi-Objective Stackelberg Games with Payoff Manipulation
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
DLLMQuant: Quantizing Diffusion-based Large Language Models
LLM-Enhanced Linear Autoencoders for Recommendation
Leveraging GNN to Enhance MEF Method in Predicting ENSO
Uncertainty-Guided Face Matting for Occlusion-Aware Face Transformation
New Kid in the Classroom: Exploring Student Perceptions of AI Coding Assistants
Large Language Model-Based Framework for Explainable Cyberattack Detection in Automatic Generation Control Systems
SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs
Apple Intelligence Foundation Language Models: Tech Report 2025
SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models
Demographic-aware fine-grained classification of pediatric wrist fractures
Krul: Efficient State Restoration for Multi-turn Conversations with Dynamic Cross-layer KV Sharing
Solar Altitude Guided Scene Illumination
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
Spectra-to-Structure and Structure-to-Spectra Inference Across the Periodic Table
UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation
Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models
EVM-Fusion: An Explainable Vision Mamba Architecture with Neural Algorithmic Fusion
RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection
Revisiting SSL for sound event detection: complementary fusion and adaptive post-processing
Concept-Guided Interpretability via Neural Chunking
Unveiling the Landscape of LLM Deployment in the Wild: An Empirical Study
An Ontology-Driven Graph RAG for Legal Norms: A Hierarchical, Temporal, and Deterministic Approach
Prefill-level Jailbreak: A Black-Box Risk Analysis of Large Language Models
Video CLIP Model for Multi-View Echocardiography Interpretation
A Hybrid Fully Convolutional CNN-Transformer Model for Inherently Interpretable Disease Detection from Retinal Fundus Images
M$^2$IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering
Noise-based reward-modulated learning
Faster Parameter-Efficient Tuning with Token Redundancy Reduction
UniGenX: a unified generative foundation model that couples sequence, structure and function to accelerate scientific design across proteins, molecules and materials
Collaborative Evaluation of Deepfake Text with Deliberation-Enhancing Dialogue Systems
Large Language Models Badly Generalize across Option Length, Problem Types, and Irrelevant Noun Replacements
TableTalk: Scaffolding Spreadsheet Development with a Language Agent
StagFormer: Time Staggering Transformer Decoding for RunningLayers In Parallel
Provably-Safe Neural Network Training Using Hybrid Zonotope Reachability Analysis
Generative Artificial Intelligence-Supported Pentesting: A Comparison between Claude Opus, GPT-4, and Copilot
Safe Multiagent Coordination via Entropic Exploration
TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use
Cultural Dimensions of AI Perception: Charting Expectations, Risks, Benefits, Tradeoffs, and Value in Germany and China
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Perception Gaps in Risk, Benefit, and Value Between Experts and Public Challenge Socially Accepted AI
Hierarchical Object-Oriented POMDP Planning for Object Rearrangement
From Intents to Conversations: Generating Intent-Driven Dialogues with Contrastive Learning for Multi-Turn Classification
Secure Reinforcement Learning via Shuffle Privacy Model
Overcoming label shift with target-aware federated learning
Benchmarking XAI Explanations with Human-Aligned Evaluations
HonestCyberEval: An AI Cyber Risk Benchmark for Automated Software Exploitation
Leveraging Multi-facet Paths for Heterogeneous Graph Representation Learning
GeNet: A Multimodal LLM-Based Co-Pilot for Network Topology and Configuration
ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL
Exploring the Robustness of Language Models for Tabular Question Answering via Attention Analysis
Learning county from pixels: corn yield prediction with attention-weighted multiple instance learning
Memory augment is All You Need for image restoration
Rethinking Distribution Shifts: Empirical Analysis and Inductive Modeling for Tabular Data
DiffBlender: Composable and Versatile Multimodal Text-to-Image Diffusion Models
Beyond Discriminant Patterns: On the Robustness of Decision Rule Ensembles
Bayesian Deep Learning for Segmentation for Autonomous Safe Planetary Landing
ST-Raptor: LLM-Powered Semi-Structured Table Question Answering
Route-and-Execute: Auditable Model-Card Matching and Specialty-Level Deployment
LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence
Response and Prompt Evaluation to Prevent Parasocial Relationships with Chatbots
Profile-Aware Maneuvering: A Dynamic Multi-Agent System for Robust GAIA Problem Solving by AWorld
Multi-Agent LLMs as Ethics Advocates for AI-Based Systems
Feature-Guided Neighbor Selection for Non-Expert Evaluation of Model Predictions
Architecting Clinical Collaboration: Multi-Agent Reasoning Systems for Multimodal Medical VQA
MRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models
The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners
YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models
Consensus in Motion: A Case of Dynamic Rationality of Sequential Learning in Probability Aggregation
Can Large Language Models Act as Ensembler for Multi-GNNs?
Pessimistic Iterative Planning with RNNs for Robust POMDPs
Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding
Integrating Large Language Model for Improved Causal Discovery
A Survey on Causal Discovery: Theory and Practice
Generative Interfaces for Language Models
Interpolating Speaker Identities in Embedding Space for Data Expansion
VibeVoice Technical Report
LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding
Understanding Tool-Integrated Reasoning
Emotions as Ambiguity-aware Ordinal Representations
Real-Time Model Checking for Closed-Loop Robot Reactive Planning
Load more
Prefill-level Jailbreak: A Black-Box Risk Analysis of Large Language Models
Created by
Haebom
作者
Yakai Li, Jiekang Hu, Weiduan Sang, Luping Ma, Dongsheng Nie, Weijuan Zhang, Aimin Yu, Yi Su, Qingjia Huang, Qihang Zhou
概要
本論文は、大規模言語モデル(LLM)のセキュリティ脅威の1つである在日ブレイク攻撃について、既存の研究で主に取り上げられていたプロンプトレベルの攻撃ではなく、ユーザー制御応答の事前充填機能を悪用する攻撃に焦点を当てて研究した結果を提示します。プリフィルにより、攻撃者はモデル出力の先頭を操作して、説得ベースの攻撃からモデル状態直接操作に攻撃パラダイムを切り替えます。 14のLLMを対象にブラックボックスセキュリティ分析を行い、事前充填レベルの在日ブレーキ攻撃を分類し、その効果を評価しました。実験の結果、適応方法を使用した攻撃は、複数のモデルで99%を超える成功率を達成し、トークンレベルの確率分析により、初期状態操作によって拒否から協力への最初のトークン確率の変化が発生することが確認されました。また、事前充填レベルの在日ブレーキ攻撃は、既存のプロンプトレベル攻撃の成功率を10~15%p向上させる効果的な増強剤として機能することを示しています。いくつかの防御戦略評価の結果、既存のコンテンツフィルタは制限的な保護効果のみを提供し、プロンプトとプリフィルとの間の操作的関係に焦点を当てた検出方法がより効果的であることがわかりました。結論として、現在のLLM安全アライメントの脆弱性を明らかにし、将来の安全訓練で事前充填攻撃領域を解決する必要性を強調する。
Takeaways、Limitations
•
Takeaways:
◦
ユーザー制御応答プレフィル機能を使用した新しいタイプの在日ブレーキ攻撃の存在と重大性を明らかにします。
◦
プリフィル攻撃が既存のプロンプトベースの攻撃を増幅できることを示しています。
◦
既存のコンテンツフィルタの限界を明らかにし、プロンプトとプレフィルの間の関係に基づく新しい検出方法の必要性を提示する。
◦
LLMの安全性を向上させるための研究方向の提示(prefill攻撃対応)。
•
Limitations:
◦
分析対象モデルの種類と数の制限(14モデル)。
◦
提案された検出方法の一般化の可能性と実際の環境適用に関するさらなる研究の必要性
◦
さまざまな種類のプレフィル攻撃の包括的な分析が不足する可能性があります。
PDFを見る
Made with Slashpage