Daily Arxiv
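This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.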
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Power Stabilization for AI Training Datacenters
A Systematic Study of Deep Learning Models and xAI Methods for Region-of-Interest Detection in MRI Scans
Documenting Deployment with Fabric: A Repository of Real-World AI Governance
Surya: Foundation Model for Heliophysics
Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets
MCLPD:Multi-view Contrastive Learning for EEG-based PD Detection Across Datasets
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
VerilogLAVD: LLM-Aided Rule Generation for Vulnerability Detection in Verilog
Kourkoutas-Beta: A Sunspike-Driven Adam Optimizer with Desert Flair
SecFSM: Knowledge Graph-Guided Verilog Code Generation for Secure Finite State Machines in Systems-on-Chip
Fortifying the Agentic Web: A Unified Zero-Trust Architecture Against Logic-layer Threats
LATTE: Learning Aligned Transactions and Textual Embeddings for Bank Clients
Preacher: Paper-to-Video Agentic System
Agoran: An Agentic Open Marketplace for 6G RAN Automation
Architectural Co-Design for Zero-Shot Anomaly Detection: Decoupling Representation and Dynamically Fusing Features in CLIP
IBPS: Indian Bail Prediction System
Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time
TS-Insight: Visualizing Thompson Sampling for Verification and XAI
When Better Eyes Lead to Blindness: A Diagnostic Study of the Information Bottleneck in CNN-LSTM Image Captioning Models
Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters
Generation of structure-guided pMHC-I libraries using Diffusion Models
Cross-Modality Masked Learning for Survival Prediction in ICI Treated NSCLC Patients
MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation
KEA Explain: Explanations of Hallucinations using Graph Kernel Analysis
Empirical Evidence for Alignment Faking in a Small LLM and Prompt-Based Mitigation Techniques
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Deep regularization networks for inverse problems with noisy operators
LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles
On the Fundamental Impossibility of Hallucination Control in Large Language Models
Lossless Token Sequence Compression via Meta-Tokens
Versatile Cardiovascular Signal Generation with a Unified Diffusion Transformer
Flexible Tool Selection through Low-dimensional Attribute Alignment of Vision and Language
Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model
MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey
Sadeed: Advancing Arabic Diacritization Through Small Language Model
Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs
CaRL: Learning Scalable Planning Policies with Simple Rewards
On the Consistency of GNN Explanations for Malware Detection
Cequel: Cost-Effective Querying of Large Language Models for Text Clustering
Kuwain 1.5B: An Arabic SLM via Language Injection
MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos
TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting
VerifiAgent: a Unified Verification Agent in Language Model Reasoning
Embodied Long Horizon Manipulation with Closed-loop Code Generation and Incremental Few-shot Adaptation
Revisiting Out-of-Distribution Detection in Real-time Object Detection: From Benchmark Pitfalls to a New Mitigation Paradigm
A Case for Specialisation in Non-Human Entities
Pragmatic Inference Chain (PIC) Improving LLMs' Reasoning of Authentic Implicit Toxic Language
Synthetic vs. Gold: The Role of LLM Generated Labels and Data in Cyberbullying Detection
Innamark: A Whitespace Replacement Information-Hiding Method
Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question Answering
RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation
Setup Once, Secure Always: A Single-Setup Secure Federated Learning Aggregation Protocol with Forward and Backward Secrecy for Dynamic Users
Self-Supervised Prompt Optimization
Learning to Generate Unit Tests for Automated Debugging
Modeling Discrimination with Causal Abstraction
Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
Knowledge-Guided Prompt Learning for Request Quality Assurance in Public Code Review
Fine-tuning foundational models to code diagnoses from veterinary health records
Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
Continual Learning for Multimodal Data Fusion of a Soft Gripper
BoostTrack++: using tracklet information to detect more objects in multiple object tracking
OPDR: Order-Preserving Dimension Reduction for Semantic Embedding of Multimodal Scientific Data
CREMA: A Contrastive Regularized Masked Autoencoder for Robust ECG Diagnostics across Clinical Domains
Generating 3D Terrain with 2D Cellular Automata
Unplug and Play Language Models: Decomposing Experts in Language Models at Inference Time
Using a cognitive architecture to consider antiBlackness in design and development of AI systems
ITL-LIME: Instance-Based Transfer Learning for Enhancing Local Explanations in Low-Resource Data Settings
ThinkTuning: Instilling Cognitive Reflections without Distillation
A "good regulator theorem" for embodied agents
Prescriptive Agents based on RAG for Automated Maintenance (PARAM)
One Subgoal at a Time: Zero-Shot Generalization to Arbitrary Linear Temporal Logic Requirements in Multi-Task Reinforcement Learning
Opus: A Prompt Intention Framework for Complex Workflow Generation
Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues
It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics
GATES: Cost-aware Dynamic Workflow Scheduling via Graph Attention Networks and Evolution Strategy
Automatic Curriculum Design for Zero-Shot Human-AI Coordination
PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data
SycEval: Evaluating LLM Sycophancy
CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks
VLASCD: A Visual Language Action Model for Simultaneous Chatting and Decision Making
Exploring the Effect of Explanation Content and Format on User Comprehension and Trust in Healthcare
On Learning Action Costs from Input Plans
Human-Object Interaction from Human-Level Instructions
Non-linear Welfare-Aware Strategic Learning
CRISPR-GPT for Agentic Automation of Gene-editing Experiments
SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
Neural Robot Dynamics
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis
"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Numerical models outperform AI weather forecasts of record-breaking extremes
EcomMMMU: Strategic Utilization of Visuals for Robust Multimodal E-Commerce Models
Tutorial on the Probabilistic Unification of Estimation Theory, Machine Learning, and Generative AI
StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding
End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning
Created by
Haebom
Authors
Qiaoyu Zheng, Yuze Sun, Chaoyi Wu, Weike Zhao, Pengcheng Qiu, Yongguo Yu, Kun Sun, Yanfeng Wang, Ya Zhang, Weidi Xie
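Abstract
Accurate diagnosis with medical large language models is hindered by knowledge gaps and hallucinations. Retrieval-augmented and tool-augmented methods help, but their impact is limited by weak use of external knowledge and poor traceability between feedback and reasoning. To address these problems, this work presents Deep-DxSearch, an agentic RAG system trained end-to-end with reinforcement learning (RL) that applies traceable retrieval-augmented reasoning to medical diagnosis. Deep-DxSearch first builds a large-scale medical retrieval corpus, comprising patient records and reliable medical knowledge sources, to support retrieval-aware reasoning across diagnostic scenarios. The key step is to treat the LLM as the core agent and the retrieval corpus as its environment, and to evolve the agentic RAG policy through RL on large-scale data using tailored rewards for format, retrieval, reasoning structure, and diagnostic accuracy. Experiments show that the end-to-end agentic RL training framework consistently outperforms prompt-engineering and training-free RAG approaches across multiple data centers. After training, Deep-DxSearch achieves substantial gains in diagnostic accuracy, surpassing strong diagnostic baselines such as GPT-4o, DeepSeek-R1, and other medical-specific frameworks on both common and rare disease diagnosis in in-distribution and out-of-distribution settings. Ablation studies on the reward design and the retrieval corpus components confirm that both play a critical role, underscoring the uniqueness and effectiveness of the approach compared with traditional implementations. Finally, case studies and interpretability analyses highlight improvements in Deep-DxSearch's diagnostic policy, offer deeper insight into its performance gains, and help clinicians deliver more reliable and accurate preliminary diagnoses.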
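Takeaways, Limitations
• Takeaways:
◦ An agentic RAG system trained end-to-end with reinforcement learning substantially improves medical diagnostic accuracy.
◦ It outperforms strong existing models such as GPT-4o and DeepSeek-R1.
◦ It performs well in both in-distribution and out-of-distribution settings and is shown to be effective for diagnosing both common and rare diseases.
◦ The work confirms the importance of reward design and the retrieval corpus and points to directions for future research.
◦ Case studies and interpretability analyses help explain the model's decision-making process.
• Limitations:
◦ The publicly available information lacks specifics such as Deep-DxSearch's training data size, training time, and compute consumption.
◦ Further performance evaluation and validation in real clinical settings is needed.
◦ Hallucination may not be fully resolved, and there may be room for further improvement.
◦ Access to large-scale medical data and the associated privacy issues need to be considered.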