/
/
Daily Arxiv
Daily Arxiv
世界中で発行される人工知能関連の論文をまとめるページです。
このページはGoogle Geminiを活用して要約し、非営利で運営しています。
論文の著作権は著者および関連機関にあり、共有する際は出典を明記してください。
Benchmarking LLM Causal Reasoning with Scientifically Validated Relationships
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Mining the Mind: What 100M Beliefs Reveal About Frontier LLM Knowledge
Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
Emotionally Vulnerable Subtype of Internet Gaming Disorder: Measuring and Exploring the Pathology of Problematic Generative AI Use
Explaining raw data complexity to improve satellite onboard processing
Foundations of LLM Knowledge Materialization: Termination, Reproducibility, Robustness
Incremental Summarization for Customer Support via Progressive Note-Taking and Agent Feedback
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants
Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models
DACP: Domain-Adaptive Continual Pre-Training of Large Language Models for Phone Conversation Summarization
High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training
Provable Speech Attributes Conversion via Latent Independence
Chronological Thinking in Full-Duplex Spoken Dialogue Language Models
Paper2Video: Automatic Video Generation from Scientific Papers
MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models
Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models
Generalized Orders of Magnitude for Scalable, Parallel, High-Dynamic-Range Computation
LogAction: Consistent Cross-system Anomaly Detection through Logs via Active Domain Adaptation
Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models
More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration
Learning to Reason for Hallucination Span Detection
Panorama: Fast-Track Nearest Neighbors
Feature Identification via the Empirical NTK
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning
Real-time Noise Detection and Classification in Single-Channel EEG: A Lightweight Machine Learning Approach for EMG, White Noise, and EOG Artifacts
The Sandbox Configurator: A Framework to Support Technical Assessment in AI Regulatory Sandboxes
CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D
PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness
Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models
MORPH: Shape-agnostic PDE Foundation Models
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification
From Correction to Mastery: Reinforced Distillation of Large Language Model Agents
Reproducible workflow for online AI in digital health
HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking
FireGNN: Neuro-Symbolic Graph Neural Networks with Trainable Fuzzy Rules for Interpretable Medical Image Classification
TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation
A Survey of Reinforcement Learning for Large Reasoning Models
X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates
Barycentric Neural Networks and Length-Weighted Persistent Entropy Loss: A Green Geometric and Topological Framework for Function Approximation
Scaling Performance of Large Language Model Pretraining
Towards Methane Detection Onboard Satellites
AEGIS: Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
Safe-Control: A Safety Patch for Mitigating Unsafe Content in Text-to-Image Generation Models
Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation
Long Chain-of-Thought Reasoning Across Languages
MAHL: Multi-Agent LLM-Guided Hierarchical Chiplet Design with Adaptive Debugging
MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
MInDI-3D: Iterative Deep Learning in 3D for Sparse-view Cone Beam Computed Tomography
MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
CoCoA: Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy
Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?
Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation
From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes
Controllable Hybrid Captioner for Improved Long-form Video Understanding
Leveraging Personalized PageRank and Higher-Order Topological Structures for Heterophily Mitigation in Graph Neural Networks
Understanding Teen Overreliance on AI Companion Chatbots Through Self-Reported Reddit Narratives
ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations
Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
Truth, Trust, and Trouble: Medical AI on the Edge
LLMs on a Budget? Say HOLA
The Role of Model Confidence on Bias Effects in Measured Uncertainties for Vision-Language Models
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks
Not All Clients Are Equal: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients
Rethinking Losses for Diffusion Bridge Samplers
Think With Videos For Agentic Long-Video Understanding
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Intention-Conditioned Flow Occupancy Models
Product of Experts for Visual Generation
Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining
Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
Tug-of-war between idioms' figurative and literal interpretations in LLMs
MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
GL-PGENet: A Parameterized Generation Framework for Robust Document Image Enhancement
CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases
Inference-time Alignment in Continuous Space
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty
Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization
Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression
FairSHAP: Preprocessing for Fairness Through Attribution-Based Data Augmentation
Hakim: Farsi Text Embedding Model
Understanding In-context Learning of Addition via Activation Subspaces
Evaluating Evaluation Metrics - The Mirage of Hallucination Detection
T-VEC: A Telecom-Specific Vectorization Model with Enhanced Semantic Understanding via Deep Triplet Loss Fine-Tuning
Hallucination Detection in LLMs with Topological Divergence on Attention Graphs
Load more
Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression
Created by
Haebom
作者
Jingyu Peng, Maolin Wang, Nan Wang, Jiatong Li, Yuchen Li, Yuyang Ye, Wanyu Wang, Pengyue Jia, Kai Zhang, Xiangyu Zhao
概要
大規模言語モデル(LLM)を人間の価値に合わせて調整することにかなりの進歩があったにもかかわらず、現在の安全メカニズムはジャイルブレイク攻撃に対して脆弱です。本論文は、この脆弱性が、アライメント指向のプロンプトと悪意のあるプロンプトとの間の分布的な不一致に起因すると仮定する。これを調べるために、論文では、論理表現変換を活用してLLM安全システムを迂回する新しい、普遍的なブラックボックスジャイルブレイク法であるLogiBreakを紹介する。 LogiBreakは、有害な自然言語プロンプトを形式論理表現に変換し、ソートデータと論理ベースの入力との間の分布的ギャップを悪用し、基本的な意味的意図と読みやすさを維持しながら安全上の制約を回避します。 3つの言語をカバーする多言語jailbreakデータセットでLogiBreakを評価し、さまざまな評価設定と言語コンテキストでその効果を実証します。
Takeaways、Limitations
•
Takeaways:
◦
LogiBreakはLLMの安全システムを迂回する新しいjailbreak方法論を提案する。
◦
論理表現変換によるジャイルブレイクは、言語的障壁を克服し、さまざまな言語環境で効果的です。
◦
LLMソートデータと悪意のある入力との間の分布の違いを悪用して、安全システムの脆弱性を明らかにします。
•
Limitations:
◦
本論文の具体的なLimitationsは要約には記載されていない。 (例:特定のモデルのパフォーマンス、ジャイルブレークの成功率など)
◦
LogiBreakの実際の環境適用可能性と防御戦略の詳細な分析が必要です。
◦
論文全体の内容を確認しなければ、より詳細なLimitationsを把握することができる。
PDFを見る
Made with Slashpage