/
/
Daily Arxiv
Daily Arxiv
世界中で発行される人工知能関連の論文をまとめるページです。
このページはGoogle Geminiを活用して要約し、非営利で運営しています。
論文の著作権は著者および関連機関にあり、共有する際は出典を明記してください。
Benchmarking LLM Causal Reasoning with Scientifically Validated Relationships
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Mining the Mind: What 100M Beliefs Reveal About Frontier LLM Knowledge
Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
Emotionally Vulnerable Subtype of Internet Gaming Disorder: Measuring and Exploring the Pathology of Problematic Generative AI Use
Explaining raw data complexity to improve satellite onboard processing
Foundations of LLM Knowledge Materialization: Termination, Reproducibility, Robustness
Incremental Summarization for Customer Support via Progressive Note-Taking and Agent Feedback
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants
Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models
DACP: Domain-Adaptive Continual Pre-Training of Large Language Models for Phone Conversation Summarization
High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training
Provable Speech Attributes Conversion via Latent Independence
Chronological Thinking in Full-Duplex Spoken Dialogue Language Models
Paper2Video: Automatic Video Generation from Scientific Papers
MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models
Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models
Generalized Orders of Magnitude for Scalable, Parallel, High-Dynamic-Range Computation
LogAction: Consistent Cross-system Anomaly Detection through Logs via Active Domain Adaptation
Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models
More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration
Learning to Reason for Hallucination Span Detection
Panorama: Fast-Track Nearest Neighbors
Feature Identification via the Empirical NTK
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning
Real-time Noise Detection and Classification in Single-Channel EEG: A Lightweight Machine Learning Approach for EMG, White Noise, and EOG Artifacts
The Sandbox Configurator: A Framework to Support Technical Assessment in AI Regulatory Sandboxes
CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D
PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness
Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models
MORPH: Shape-agnostic PDE Foundation Models
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification
From Correction to Mastery: Reinforced Distillation of Large Language Model Agents
Reproducible workflow for online AI in digital health
HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking
FireGNN: Neuro-Symbolic Graph Neural Networks with Trainable Fuzzy Rules for Interpretable Medical Image Classification
TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation
A Survey of Reinforcement Learning for Large Reasoning Models
X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates
Barycentric Neural Networks and Length-Weighted Persistent Entropy Loss: A Green Geometric and Topological Framework for Function Approximation
Scaling Performance of Large Language Model Pretraining
Towards Methane Detection Onboard Satellites
AEGIS: Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
Safe-Control: A Safety Patch for Mitigating Unsafe Content in Text-to-Image Generation Models
Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation
Long Chain-of-Thought Reasoning Across Languages
MAHL: Multi-Agent LLM-Guided Hierarchical Chiplet Design with Adaptive Debugging
MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
MInDI-3D: Iterative Deep Learning in 3D for Sparse-view Cone Beam Computed Tomography
MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
CoCoA: Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy
Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?
Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation
From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes
Controllable Hybrid Captioner for Improved Long-form Video Understanding
Leveraging Personalized PageRank and Higher-Order Topological Structures for Heterophily Mitigation in Graph Neural Networks
Understanding Teen Overreliance on AI Companion Chatbots Through Self-Reported Reddit Narratives
ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations
Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
Truth, Trust, and Trouble: Medical AI on the Edge
LLMs on a Budget? Say HOLA
The Role of Model Confidence on Bias Effects in Measured Uncertainties for Vision-Language Models
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks
Not All Clients Are Equal: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients
Rethinking Losses for Diffusion Bridge Samplers
Think With Videos For Agentic Long-Video Understanding
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Intention-Conditioned Flow Occupancy Models
Product of Experts for Visual Generation
Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining
Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
Tug-of-war between idioms' figurative and literal interpretations in LLMs
MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
GL-PGENet: A Parameterized Generation Framework for Robust Document Image Enhancement
CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases
Inference-time Alignment in Continuous Space
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty
Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization
Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression
FairSHAP: Preprocessing for Fairness Through Attribution-Based Data Augmentation
Hakim: Farsi Text Embedding Model
Understanding In-context Learning of Addition via Activation Subspaces
Evaluating Evaluation Metrics - The Mirage of Hallucination Detection
T-VEC: A Telecom-Specific Vectorization Model with Enhanced Semantic Understanding via Deep Triplet Loss Fine-Tuning
Hallucination Detection in LLMs with Topological Divergence on Attention Graphs
Load more
Hallucination Detection in LLMs with Topological Divergence on Attention Graphs
Created by
Haebom
作者
Alexandra Bazarova, Aleksandr Yugay, Andrey Shulga, Alina Ermilova, Andrei Volodichev, Konstantin Polev, Julia Belikova, Rauf Parchiev, Dmitry Simakov, Maxim Savchenko, Andrey Savchenko, Serguei Barannikov, Alexey Zaytsev
概要
大規模言語モデル(LLM)における事実誤差発生問題であるサイケデリック現象を解決するために,RAG環境における注意行列に導かれたグラフの位相学的発散を測定するTOPO(TOpology-based HAllucination detector)を提案する。プロンプトと応答サブグラフ間の位相学的発散を分析し、特定の注意ヘッドでより高い発散値が幻覚出力と相関関係があることを発見しました。質問の回答や要約作業を含む幅広い実験を通じて、TOHAは最小限の注釈データと計算リソースを使用して複数のベンチマークで最先端または競争力のある結果を達成しました。注意行列の位相構造解析はLLMの事実的信頼性を効率的かつ強力に表現できることを示唆した。
Takeaways、Limitations
•
Takeaways:
◦
注意行列の位相構造を分析することで、LLMの幻覚現象を効果的に検出できます。
◦
最小限の注釈データと計算リソースを使用しながら、優れたパフォーマンスを示します。
◦
質問の回答と要約作業で、State-of-the-artまたは競争力のある結果を達成しました。
•
Limitations:
◦
本論文の具体的なLimitationsは論文に記載されていない。
PDFを見る
Made with Slashpage