haebom
Daily Arxiv
전 세계에서 발간되는 인공지능 관련 논문을 정리하는 페이지 입니다.
본 페이지는 Google Gemini를 활용해 요약 정리하며, 비영리로 운영 됩니다.
논문에 대한 저작권은 저자 및 해당 기관에 있으며, 공유 시 출처만 명기하면 됩니다.
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios
SASG-DA: Sparse-Aware Semantic-Guided Diffusion Augmentation For Myoelectric Gesture Recognition
DiffRegCD: Integrated Registration and Change Detection with Diffusion Features
Advancing mathematics research with generative AI
TiS-TSL: Image-Label Supervised Surgical Video Stereo Matching via Time-Switchable Teacher-Student Learning
Quantifying Edits Decay in Fine-tuned LLMs
Self-adaptive weighting and sampling for physics-informed neural networks
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages
Prompt-Based Safety Guidance Is Ineffective for Unlearned Text-to-Image Diffusion Models
Reasoning Up the Instruction Ladder for Controllable Language Models
Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models
PISA-Bench: The PISA Index as a Multilingual and Multimodal Metric for the Evaluation of Vision-Language Models
Aligning Diffusion Language Models via Unpaired Preference Optimization
TaoSR-AGRL: Adaptive Guided Reinforcement Learning Framework for E-commerce Search Relevance
Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with multiplicative noise
SecInfer: Preventing Prompt Injection via Inference-time Scaling
Towards Adapting Federated & Quantum Machine Learning for Network Intrusion Detection: A Survey
Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts
Machine Unlearning for Responsible and Adaptive AI in Education
KoopMotion: Learning Almost Divergence Free Koopman Flow Fields for Motion Planning
Where Should I Study? Biased Language Models Decide! Evaluating Fairness in LMs for Academic Recommendations
Radio Astronomy in the Era of Vision-Language Models: Prompt Sensitivity and Adaptation
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
Survey of Vision-Language-Action Models for Embodied Manipulation
Mitigating Hallucinations in Large Language Models via Causal Reasoning
ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
M3-Net: A Cost-Effective Graph-Free MLP-Based Model for Traffic Prediction
The Power of Many: Synergistic Unification of Diverse Augmentations for Efficient Adversarial Robustness
Veli: Unsupervised Method and Unified Benchmark for Low-Cost Air Quality Sensor Correction
Asking the Right Questions: Benchmarking Large Language Models in the Development of Clinical Consultation Templates
M^2VAE: Multi-Modal Multi-View Variational Autoencoder for Cold-start Item Recommendation
IndoPref: A Multi-Domain Pairwise Preference Dataset for Indonesian
Bridging Synthetic and Real-World Domains: A Human-in-the-Loop Weakly-Supervised Framework for Industrial Toxic Emission Segmentation
Bayesian preference elicitation for decision support in multiobjective optimization
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale Code Understanding
Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery
A Personalised Formal Verification Framework for Monitoring Activities of Daily Living of Older Adults Living Independently in Their Homes
Knowledge-Guided Brain Tumor Segmentation via Synchronized Visual-Semantic-Topological Prior Fusion
Geo-Registration of Terrestrial LiDAR Point Clouds with Satellite Images without GNSS
TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving
What Do Latent Action Models Actually Learn?
Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
Flat Channels to Infinity in Neural Loss Landscapes
Edit Flows: Flow Matching with Edit Operations
MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes
Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems
An empirical study of task and feature correlations in the reuse of pre-trained models
anyECG-chat: A Generalist ECG-MLLM for Flexible ECG Input and Multi-Task Understanding
Solver-Free Decision-Focused Learning for Linear Optimization Problems
The Feasibility of Topic-Based Watermarking on Academic Peer Reviews
Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications
FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
RefiDiff: Progressive Refinement Diffusion for Efficient Missing Data Imputation
Rainbow Delay Compensation: A Multi-Agent Reinforcement Learning Framework for Mitigating Delayed Observation
ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
MOSAIC: A Skill-Centric Algorithmic Framework for Long-Horizon Manipulation Planning
DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
Repetitive Contrastive Learning Enhances Mamba's Selectivity in Time Series Prediction
Generalization Bounds in Hybrid Quantum-Classical Machine Learning Models
Beyond the Hype: Embeddings vs. Prompting for Multiclass Classification Tasks
Large Language Models Are Unreliable for Cyber Threat Intelligence
A Causal Framework to Measure and Mitigate Non-binary Treatment Discrimination
Evolutionary Policy Optimization
What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models
MARS: Multi-Agent Adaptive Reasoning with Socratic Guidance for Automated Prompt Optimization
Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation
UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning
Contextual Thompson Sampling via Generation of Missing Data
FedP$^2$EFT: Federated Learning to Personalize PEFT for Multilingual LLMs
Privacy-Preserving Retrieval-Augmented Generation with Differential Privacy
Certified Training with Branch-and-Bound for Lyapunov-stable Neural Control
Training and Evaluating Language Models with Template-based Data Generation
The Visual Counter Turing Test (VCT2): A Benchmark for Evaluating AI-Generated Image Detection and the Visual AI Index (VAI)
Conditional Distribution Learning for Graph Classification
Large Language Model Benchmarks in Medical Tasks
LLM4AD: Large Language Models for Autonomous Driving - Concept, Review, Benchmark, Experiments, and Future Trends
TeVAE: A Variational Autoencoder Approach for Discrete Online Anomaly Detection in Variable-state Multivariate Time-series Data
Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships
An Information Theoretic Evaluation Metric For Strong Unlearning
CSAI: Conditional Self-Attention Imputation for Healthcare Time-series
Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Hyperdimensional Decoding of Spiking Neural Networks
Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression
DigiData: Training and Evaluating General-Purpose Mobile Control Agents
A Theoretical Analysis of Detecting Large Model-Generated Time Series
Green AI: A systematic review and meta-analysis of its definitions, lifecycle models, hardware and measurement attempts
Do LLMs Feel? Teaching Emotion Recognition with Prompts, Retrieval, and Curriculum Learning
Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis
Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
e1: Learning Adaptive Control of Reasoning Effort
Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier
Mixed-Density Diffuser: Efficient Planning with Non-Uniform Temporal Resolution
Load more
CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization
Created by
Haebom
저자
Debeshee Das, Luca Beurer-Kellner, Marc Fischer, Maximilian Baader
개요
LLM 에이전트의 툴 접근 및 민감 데이터 접근 증가로 인한 간접 프롬프트 인젝션 공격 표면 확대에 대응하기 위해, 실행 가능한 명령어를 포함하지 않아야 한다는 보안 원칙에 기반한 새로운 접근 방식을 제시합니다.
이 연구는 토큰 레벨의 정화 과정을 통해 툴 출력에서 AI 시스템을 겨냥한 명령어를 제거합니다. 이는 기존의 안전 분류기와 달리 비차단적이며, 보정 필요 없고, 툴 출력의 컨텍스트에 독립적입니다.
또한, 지시 튜닝 데이터만으로 토큰 레벨 예측기를 훈련할 수 있으며,
본 연구는 AgentDojo, BIPIA, InjecAgent, ASB, SEP 등의 벤치마크에서 7~10배의 공격 성공률(ASR) 감소(AgentDojo에서 34%에서 3%로 감소)를 달성하면서, 에이전트 유틸리티를 저해하지 않음을 보였습니다.
시사점, 한계점
•
시사점:
◦
토큰 레벨의 명령어 제거를 통한 안전성 확보.
◦
비차단적이며, 보정 불필요, 컨텍스트 독립적인 접근 방식.
◦
실제 데이터 기반의 훈련 가능성.
◦
다양한 공격 및 벤치마크에서 높은 방어 효과 입증 (ASR 감소).
◦
에이전트 유틸리티 유지.
•
한계점:
◦
구체적인 공격 유형에 대한 세부적인 분석 및 방어 메커니즘에 대한 설명 부족.
◦
훈련 데이터의 품질과 다양성에 대한 영향에 대한 논의 부족.
◦
잠재적인 오탐 및 성능 저하 가능성에 대한 추가적인 연구 필요.
PDF 보기
Made with Slashpage