Daily Arxiv
This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the original source when sharing.
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning
Crafting Hanzi as Narrative Bridges: An AI Co-Creation Workshop for Elderly Migrants
Distributional Soft Actor-Critic with Diffusion Policy
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
Fast AI Model Splitting over Edge Networks
From Sentences to Sequences: Rethinking Languages in Biological System
MTCNet: Motion and Topology Consistency Guided Learning for Mitral Valve Segmentation in 4D Ultrasound
Horus: A Protocol for Trustless Delegation Under Uncertainty
Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies
Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop
Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center
AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration
Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability
Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach
Distinguishing Predictive and Generative AI in Regulation
AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation
Text-Aware Image Restoration with Diffusion Models
How Good LLM-Generated Password Policies Are?
Towards an Explainable Comparison and Alignment of Feature Embeddings
Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification
Empowering Intelligent Low-altitude Economy with Large AI Model Deployment
Incorporating LLMs for Large-Scale Urban Complex Mobility Simulation
Generating Hypotheses of Dynamic Causal Graphs in Neuroscience: Leveraging Generative Factor Models of Observed Time Series
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs
Threat Modeling for AI: The Case for an Asset-Centric Approach
SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings
PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification
Significativity Indices for Agreement Values
Transferrable Surrogates in Expressive Neural Architecture Search Spaces
Privacy-Preserving Operating Room Workflow Analysis using Digital Twins
Uncertainty-Guided Coarse-to-Fine Tumor Segmentation with Anatomy-Aware Post-Processing
CMD-HAR: Cross-Modal Disentanglement for Wearable Human Activity Recognition
Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models
Understanding-informed Bias Mitigation for Fair CMR Segmentation
HAPI: A Model for Learning Robot Facial Expressions from Human Preferences
MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel
Illuminant and light direction estimation using Wasserstein distance method
Fundamental Limits of Hierarchical Secure Aggregation with Cyclic User Association
LLM-Powered Prediction of Hyperglycemia and Discovery of Behavioral Treatment Pathways from Wearables and Diet
Interleaved Gibbs Diffusion: Generating Discrete-Continuous Data with Implicit Constraints
EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks
Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks
EigenLoRAx: Recycling Adapters to Find Principal Subspaces for Resource-Efficient Adaptation and Inference
Learning Traffic Anomalies from Generative Models on Real-Time Observations
Enabling Population-Level Parallelism in Tree-Based Genetic Programming for Comprehensive GPU Acceleration
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Quantifying the Importance of Data Alignment in Downstream Model Performance
Quantum-enhanced causal discovery for a small number of samples
On Characterizations for Language Generation: Interplay of Hallucinations, Breadth, and Stability
Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs
COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework
GeMID: Generalizable Models for IoT Device Identification
Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation
Is Complex Query Answering Really Complex?
Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning
Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling
Reconsidering the energy efficiency of spiking neural networks
Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes
Sequence-aware Pre-training for Echocardiography Probe Movement Guidance
Anatomical Foundation Models for Brain MRIs
Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective
Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness
Delving into LLM-assisted writing in biomedical publications through excess vocabulary
Towards a Novel Measure of User Trust in XAI Systems
Avoiding Catastrophe in Online Learning by Asking for Help
Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning
Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data
Kernel Density Bayesian Inverse Reinforcement Learning
Embodied AI Agents: Modeling the World
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
AI Flow: Perspectives, Scenarios, and Approaches
A framework for Conditional Reasoning in Answer Set Programming
Autoformalization in the Era of Large Language Models: A Survey
Agentic AI Process Observability: Discovering Behavioral Variability
Artificial Intelligence Index Report 2025
MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science
XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation
Direct Preference Optimization Using Sparse Feature-Level Constraints
Unsupervised Cognition
Urban Region Pre-training and Prompting: A Graph-based Approach
Road Graph Generator: Mapping roads at construction sites from GPS data
Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans
Answer Matching Outperforms Multiple Choice for Language Model Evaluation
Subtyping in DHOL - Extended preprint
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
USAD: An Unsupervised Data Augmentation Spatio-Temporal Attention Diffusion Network
DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift
SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs
Multi-agent Auditory Scene Analysis
Fast and Simplex: 2-Simplicial Attention in Triton
Synthesizable by Design: A Retrosynthesis-Guided Framework for Molecular Analog Generation
Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics
Early Signs of Steganographic Capabilities in Frontier LLMs
Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks
FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
APT: Adaptive Personalized Training for Diffusion Models with Limited Data
ASDA: Audio Spectrogram Differential Attention Mechanism for Self-Supervised Representation Learning
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
Created by Haebom
Authors
Peng Xia, Jinglu Wang, Yibo Peng, Kaide Zeng, Xian Wu, Xiangru Tang, Hongtu Zhu, Yun Li, Shujie Liu, Yan Lu, Huaxiu Yao
Overview
This paper proposes MMedAgent-RL, a reinforcement learning (RL)-based multi-agent framework that addresses the limitations of existing single-agent medical large vision-language models (Med-LVLMs), which struggle to generalize across diverse medical specialties. MMedAgent-RL consists of two Qwen2.5-VL-based general practitioner (GP) agents: a triage doctor that assigns each patient to the appropriate specialty, and an attending physician that integrates the judgments of multiple specialists with its own knowledge to make the final decision. To resolve inconsistencies among specialist outputs, a curriculum learning (CL)-guided RL strategy is introduced that gradually teaches the attending physician to balance imitating specialists with correcting their mistakes. Experiments on five medical VQA benchmarks show that MMedAgent-RL outperforms both open-source and proprietary Med-LVLMs and exhibits human-like reasoning patterns. In particular, it achieves an average performance improvement of 20.7% over supervised fine-tuning baselines.
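To make the workflow described above easier to picture, here is a minimal Python sketch of a triage-then-aggregate pipeline with a curriculum-style reward that shifts from imitating specialists to rewarding correct final answers. All function names, the placeholder policies, the specialty list, and the linear reward schedule are illustrative assumptions, not the authors' implementation; the actual MMedAgent-RL agents are Qwen2.5-VL models trained with RL.

```python
# Hypothetical sketch of the MMedAgent-RL workflow as summarized above.
# Placeholder policies stand in for the Qwen2.5-VL-based agents.

from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    image: str        # identifier of the medical image
    question: str     # VQA question about the image
    gold_answer: str  # reference answer (available during RL training)


def triage_doctor(case: Case, specialties: List[str]) -> str:
    """GP agent 1: route the case to one medical specialty (dummy policy)."""
    return specialties[hash(case.question) % len(specialties)]


def specialist_opinions(case: Case, specialty: str) -> List[str]:
    """Collect candidate answers from specialist models (placeholders)."""
    return [f"{specialty} opinion A", f"{specialty} opinion B"]


def attending_physician(case: Case, opinions: List[str]) -> str:
    """GP agent 2: fuse specialist opinions with its own knowledge (placeholder)."""
    return opinions[0]


def curriculum_reward(answer: str, opinions: List[str], gold: str,
                      step: int, total_steps: int) -> float:
    """Curriculum-style reward: early training favors agreeing with specialists
    (imitation); later training favors correctness even when that means
    overriding them. The linear schedule is an assumption for illustration."""
    progress = step / max(total_steps, 1)
    imitation = 1.0 if answer in opinions else 0.0
    correctness = 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0
    return (1.0 - progress) * imitation + progress * correctness


if __name__ == "__main__":
    case = Case(image="chest_xray_001.png",
                question="Is there a pleural effusion?",
                gold_answer="yes")
    specialty = triage_doctor(case, ["radiology", "pathology", "dermatology"])
    opinions = specialist_opinions(case, specialty)
    answer = attending_physician(case, opinions)
    print(specialty, answer,
          curriculum_reward(answer, opinions, case.gold_answer, step=10, total_steps=100))
```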
Takeaways and Limitations
• Takeaways:
◦ Presents an RL-based multi-agent collaboration framework that overcomes the limitations of existing single-agent Med-LVLMs.
◦ Improves medical image analysis and diagnostic performance through dynamic, optimized collaboration among multiple specialists.
◦ Resolves disagreements among specialists via curriculum learning and realizes human-like reasoning patterns.
◦ Achieves a significant performance gain over previous models (20.7% on average).
• Limitations:
◦ Further validation of the proposed model's generalization performance is needed.
◦ Experimental results on a wider range of medical datasets are not presented.
◦ Additional research is needed before the approach can be applied in real clinical settings.
◦ Dependence on the Qwen2.5-VL model may make it difficult or restrictive to apply the framework to other language models.
View PDF