Daily Arxiv
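This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.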
Teaching the Teacher: Improving Neural Network Distillability for Symbolic Regression via Jacobian Regularization
Bayesian Optimization of Process Parameters of a Sensor-Based Sorting System using Gaussian Processes as Surrogate Models
Multi-modal Relational Item Representation Learning for Inferring Substitutable and Complementary Items
SourceSplice: Source Selection for Machine Learning Tasks
OneShield - the Next Generation of LLM Guardrails
RecPS: Privacy Risk Scoring for Recommender Systems
HuiduRep: A Robust Self-Supervised Framework for Learning Neural Representations from Extracellular Recordings
Leveraging Synthetic Data for Question Answering with Multilingual LLMs in the Agricultural Domain
Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback
A Segmented Robot Grasping Perception Neural Network for Edge AI
Binarizing Physics-Inspired GNNs for Combinatorial Optimization
Disentangling Neural Disjunctive Normal Form Models
The Second Machine Turn: From Checking Proofs to Creating Concepts
EmissionNet: Air Quality Pollution Forecasting for Agriculture
Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
Evaluating LLMs on Real-World Forecasting Against Human Superforecasters
Sign Spotting Disambiguation using Large Language Models
RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
Discovering the underlying analytic structure within Standard Model constants using artificial intelligence
MR-CLIP: Efficient Metadata-Guided Learning of MRI Contrast Representations
Curious Causality-Seeking Agents Learn Meta Causal World
Theoretically Unmasking Inference Attacks Against LDP-Protected Clients in Federated Vision Models
Private GPTs for LLM-driven testing in software development and machine learning
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora
Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation
Are Sparse Autoencoders Useful for Java Function Bug Detection?
Credible Plan-Driven RAG Method for Multi-Hop Question Answering
Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories
E2E Parking Dataset: An Open Benchmark for End-to-End Autonomous Parking
Dominated Actions in Imperfect-Information Games
FakeIDet: Exploring Patches for Privacy-Preserving Fake ID Detection
Simultaneous Motion And Noise Estimation with Event Cameras
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Novice Developers' Perspectives on Adopting LLMs for Software Development: A Systematic Literature Review
ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning
A Survey on Post-training of Large Language Models
Do Large Language Models Know How Much They Know?
Better Embeddings with Coupled Adam
Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks
An Investigation into Value Misalignment in LLM-Generated Texts for Cultural Heritage
Embracing Large Language Models in Traffic Flow Forecasting
A Large Sensor Foundation Model Pretrained on Continuous Glucose Monitor Data for Diabetes Management
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Un-mixing Test-time Adaptation under Heterogeneous Data Streams
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
Cobblestone: Iterative Automation for Formal Verification
Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots
Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors
AttnMod: Attention-Based New Art Styles
Loss Landscape Degeneracy and Stagewise Development in Transformers
Tackling Size Generalization of Graph Neural Networks on Biological Data from a Spectral Perspective
Gradient Leakage Defense with Key-Lock Module for Federated Learning
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
Semantic Chain-of-Trust: Autonomous Trust Orchestration for Collaborator Selection via Hypergraph-Aided Agentic AI
How Far Are AI Scientists from Changing the World?
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
EARTH: Structuring Creative Evolution through Model Error in Generative AI
On Gradual Semantics for Assumption-Based Argumentation
Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations
Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team
ORFS-agent: Tool-Using Agents for Chip Design Optimization
World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks
The Urban Impact of AI: Modeling Feedback Loops in Next-Venue Recommendation
BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking
OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problems with Reasoning LLM
Causal Explanations for Image Classifiers
BCR-DRL: Behavior- and Context-aware Reward for Deep Reinforcement Learning in Human-AI Coordination
Federated Cross-Training Learners for Robust Generalization under Data Heterogeneity
Identifying Unique Spatial-Temporal Bayesian Network without Markov Equivalence
Do They Understand Them? An Updated Evaluation on Nonbinary Pronoun Handling in Large Language Models
SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation
Sample-Aware Test-Time Adaptation for Medical Image-to-Image Translation
MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech Detection under Cloaking Perturbations
A Simple and Effective Method for Uncertainty Quantification and OOD Detection
Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking
Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos
Agentic large language models improve retrieval-based radiology question answering
Out-of-Context Abduction: LLMs Make Inferences About Procedural Data Leveraging Declarative Facts in Earlier Training Data
How LLMs are Shaping the Future of Virtual Reality
Adaptive Machine Learning-Driven Multi-Fidelity Stratified Sampling for Failure Analysis of Nonlinear Stochastic Systems
Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA
Nested Graph Pseudo-Label Refinement for Noisy Label Domain Adaptation Learning
JSON-Bag: A generic game trajectory representation
NyayaRAG: Realistic Legal Judgment Prediction with RAG under the Indian Common Law System
Efficient Solution and Learning of Robust Factored MDPs
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
Segment First, Retrieve Better: Realistic Legal Search via Rhetorical Role-Based Queries
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
Advancing Quantum Information Science Pre-College Education: The Case for Learning Sciences Collaboration
Backdoor Attacks on Deep Learning Face Detection
Similarity-Based Self-Construct Graph Model for Predicting Patient Criticalness Using Graph Neural Networks and EHR Data
Prompting Science Report 3: I'll pay you or I'll kill you -- but will you care?
Composable OS Kernel Architectures for Autonomous Intelligence
LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks
Wukong Framework for Not Safe For Work Detection in Text-to-Image systems
OmniUnet: A Multimodal Network for Unstructured Terrain Segmentation on Planetary Rovers Using RGB, Depth, and Thermal Imagery
Agent Safety Alignment via Reinforcement Learning
Created by
Haebom
Authors
Zeyang Sha, Hanling Tian, Zhuoer Xu, Shiwen Cui, Changhua Meng, Weiqiang Wang
Abstract
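This paper explains that autonomous large language model (LLM) agents capable of using tools introduce new safety risks that go beyond traditional conversational misuse. Because these agents can execute external functions, they are vulnerable to both user-initiated threats (such as adversarial prompts) and tool-initiated threats (such as malicious outputs from compromised tools). The paper proposes the first unified safety-alignment framework for tool-using agents, which handles both threat channels through structured reasoning and sandboxed reinforcement learning. It introduces a tri-modal taxonomy (benign, malicious, sensitive) that applies to both user prompts and tool responses, and defines a policy-driven decision model. The framework uses a custom sandbox environment that simulates real-world tool execution and allows fine-grained reward shaping. Extensive evaluations on public and self-built benchmarks, including Agent SafetyBench, InjecAgent, and BFCL, show that the safety-aligned agents significantly improve resistance to security threats while preserving strong utility on benign tasks. The results indicate that safety and effectiveness can be optimized jointly, laying the groundwork for trustworthy deployment of autonomous LLM agents.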
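To make the tri-modal taxonomy and the policy-driven decision model more concrete, here is a minimal Python sketch. It is an illustration only: the summary does not say which action the paper's policy assigns to each mode, so the `Action` values, the `POLICY` table, and the "most restrictive channel wins" rule in `decide` are all assumptions, not the authors' method.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    """The three modes named in the abstract, applied to prompts and tool responses."""
    BENIGN = "benign"
    MALICIOUS = "malicious"
    SENSITIVE = "sensitive"


class Action(Enum):
    """Hypothetical agent actions; the summary does not list the paper's actions."""
    PROCEED = "proceed"
    CONFIRM = "confirm"   # e.g. ask the user before continuing
    REFUSE = "refuse"


# Assumed mapping from mode to action (illustrative, not taken from the paper).
POLICY = {
    Mode.BENIGN: Action.PROCEED,
    Mode.SENSITIVE: Action.CONFIRM,
    Mode.MALICIOUS: Action.REFUSE,
}

# Actions ordered from least to most restrictive.
SEVERITY = [Action.PROCEED, Action.CONFIRM, Action.REFUSE]


def decide(prompt_mode: Mode, tool_mode: Optional[Mode] = None) -> Action:
    """Return the most restrictive action across the user and tool channels."""
    actions = [POLICY[prompt_mode]]
    if tool_mode is not None:
        actions.append(POLICY[tool_mode])
    return max(actions, key=SEVERITY.index)


if __name__ == "__main__":
    # A benign request whose tool output was classified as malicious.
    print(decide(Mode.BENIGN, Mode.MALICIOUS).value)
```

In this sketch the mode labels are passed in directly; in the framework described above, classifying a prompt or tool response is part of what the agent learns through structured reasoning and sandboxed reinforcement learning.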
Takeaways, Limitations
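• Takeaways:
◦ Presents the first unified safety-alignment framework addressing safety threats to tool-using LLM agents.
◦ Offers effective countermeasures against both user-initiated and tool-initiated threats.
◦ Demonstrates that reinforcement learning in a sandbox environment can optimize safety and effectiveness at the same time.
◦ Lays the groundwork for trustworthy deployment of autonomous LLM agents.
• Limitations:
◦ Further research is needed on the problems and constraints that may arise when the proposed framework is applied in real-world environments.
◦ Generalization across a wider range of tools and threat scenarios still needs to be verified.
◦ A sandbox environment cannot perfectly reflect reality.
◦ Further research is needed on how well the framework adapts to new types of threats.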