Daily Arxiv

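This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.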

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Dynaword: From One-shot to Continuously Developed Datasets

Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor

Proof2Hybrid: Automatic Mathematical Benchmark Synthesis for Proof-Centric Problems

Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy

BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability

SpectrumWorld: Artificial Intelligence Foundation for Spectroscopy

Managing Escalation in Off-the-Shelf Large Language Models

FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models

A Foundational Schema.org Mapping for a Legal Knowledge Graph: Representing Brazilian Legal Norms as FRBR Works

D3: Training-Free AI-Generated Video Detection Using Second-Order Features

SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity

Vision-Language Fusion for Real-Time Autonomous Driving: Goal-Centered Cross-Attention of Camera, HD-Map, & Waypoints

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation

Memorization in Fine-Tuned Large Language Models

From Entanglement to Alignment: Representation Space Decomposition for Unsupervised Time Series Domain Adaptation

The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?

Post-Completion Learning for Language Models

Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content

Equivariant Volumetric Grasping

SemiSegECG: A Multi-Dataset Benchmark for Semi-Supervised Semantic Segmentation in ECG Delineation

FedSA-GCL: A Semi-Asynchronous Federated Graph Learning Framework with Personalized Aggregation and Cluster-Aware Broadcasting

Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility

R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices

Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark

Scalable Attribute-Missing Graph Clustering via Neighborhood Differentiation

TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models

Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs

$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Principled Foundations for Preference Optimization

Evaluating LLMs on Real-World Forecasting Against Expert Forecasters

STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking

S2FGL: Spatial Spectral Federated Graph Learning

AI4Research: A Survey of Artificial Intelligence for Scientific Research

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation

Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints

Causally Steered Diffusion for Automated Video Counterfactual Generation

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study

ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

ProRefine: Inference-Time Prompt Refinement with Textual Feedback

SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design

MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering

Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning

LightRetriever: A LLM-based Hybrid Retrieval Architecture with 1000x Faster Query Inference

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI

All-optical temporal integration mediated by subwavelength heat antennas

GRILL: Gradient Signal Restoration in Ill-Conditioned Layers to Enhance Adversarial Attacks on Autoencoders

JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers

FFCBA: Feature-based Full-target Clean-label Backdoor Attacks

Multilingual Performance Biases of Large Language Models in Education

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis

Efficient Generative Model Training via Embedded Representation Warmup

Graph Attention-Driven Bayesian Deep Unrolling for Dual-Peak Single-Photon Lidar Imaging

Spectral Architecture Search for Neural Network Models

Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

Potential Score Matching: Debiasing Molecular Structure Sampling with Potential Energy Guidance

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

Augmented Adversarial Trigger Learning

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness

M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs

A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness

PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset

DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping

Entropy-Lens: The Information Signature of Transformer Computations

CAMEF: Causal-Augmented Multi-Modality Event-Driven Financial Forecasting by Integrating Time Series Patterns and Salient Macroeconomic Announcements

Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach

AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Multilingual Chain-of-Thought

AI-driven Wireless Positioning: Fundamentals, Standards, State-of-the-art, and Challenges

CHIRP: A Fine-Grained Benchmark for Open-Ended Response Evaluation in Vision-Language Models

Average-Reward Soft Actor-Critic

Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation

From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

SANDWICH: Towards an Offline, Differentiable, Fully-Trainable Wireless Neural Ray-Tracing Surrogate

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

Effective AGM Belief Contraction: A Journey beyond the Finitary Realm (Technical Report)

Beyond Images: Adaptive Fusion of Visual and Textual Data for Food Classification

TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks

KCR: Resolving Long-Context Knowledge Conflicts via Reasoning in LLMs

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

CADDesigner: Conceptual Design of CAD Models Based on General-Purpose Agent

Mind the Gap: The Divergence Between Human and LLM-Generated Tasks

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power

Tiny-BioMoE: a Lightweight Embedding Model for Biosignal Analysis

The AlphaPhysics Term Rewriting System for Marking Algebraic Expressions in Physics Exams

Modeling Deontic Modal Logic in the s(CASP) Goal-directed Predicate Answer Set Programming System

Automatic Prompt Optimization for Knowledge Graph Construction: Insights from an Empirical Study

The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning

Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments

Enhancing AI System Resiliency: Formulation and Guarantee for LSTM Resilience Based on Control Theory

UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization

Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications

Created by

Haebom

Authors

Wenxuan Wang, Zizhan Ma, Meidan Ding, Shiyi Zheng, Shengyuan Liu, Jie Liu, Jiaming Ji, Wenting Chen, Xiang Li, Linlin Shen, Yixuan Yuan

Overview

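This paper is the first systematic review to address the progress and limitations of large language models (LLMs) in medicine. It points out that LLMs still fall short in systematic, transparent, and verifiable reasoning, a cornerstone of clinical practice, and analyzes the shift from single-step answer generation to LLMs designed specifically for medical reasoning. It proposes a taxonomy of reasoning-enhancement techniques, divided into training-time strategies (e.g., supervised fine-tuning, reinforcement learning) and test-time mechanisms (e.g., prompt engineering, multi-agent systems), and analyzes how these techniques are applied across data modalities (text, image, code) and key clinical applications such as diagnosis, education, and treatment planning. It also surveys how evaluation benchmarks have evolved from simple accuracy metrics to more sophisticated assessments of reasoning quality and visual interpretability. Based on an analysis of 60 key studies published between 2022 and 2025, it identifies critical challenges, including the faithfulness-plausibility gap and the need for native multimodal reasoning, and outlines future directions for building efficient, robust, and sociotechnically responsible medical AI.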

Takeaways, Limitations

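• Takeaways: Provides a systematic understanding of the techniques for improving LLM reasoning in medicine and how they are applied. Shows how evaluation benchmarks are evolving and suggests directions for future research.

• Limitations: The studies analyzed are limited to papers from a specific period (2022-2025). The review raises critical challenges, such as the faithfulness-plausibility gap and the need for native multimodal reasoning, but does not offer concrete solutions. Its examination of the quality and bias of the analyzed studies may be insufficient.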
