Daily Arxiv
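This page collects artificial intelligence papers published around the world.
The summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; please cite the source when sharing.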
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks
Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants
A Roadmap for Climate-Relevant Robotics Research
Fairness Is Not Enough: Auditing Competence and Intersectional Bias in AI-powered Resume Screening
MMOne: Representing Multiple Modalities in One Scene
SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks
CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance
(Almost) Free Modality Stitching of Foundation Models
A Brain Tumor Segmentation Method Based on CLIP and 3D U-Net with Cross-Modal Semantic Guidance and Multi-Level Feature Fusion
KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection
THOR: Transformer Heuristics for On-Demand Retrieval
SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems
KeyRe-ID: Keypoint-Guided Person Re-Identification using Part-Aware Representation in Videos
Prompt Perturbations Reveal Human-Like Biases in LLM Survey Responses
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model
Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling
VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents
ReCode: Updating Code API Knowledge with Reinforcement Learning
Cross-Layer Discrete Concept Discovery for Interpreting Language Models
Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
Multiple-Frequencies Population-Based Training
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows
ContextQFormer: A New Context Modeling Method for Multi-Turn Multi-Modal Conversations
GPU Performance Portability needs Autotuning
Generating Synthetic Data via Augmentations for Improved Facial Resemblance in DreamBooth and InstantID
Coral Protocol: Open Infrastructure Connecting The Internet of Agents
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness
Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
KP Quantum Neural Networks
VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models
Data-Efficient Deep Operator Network for Unsteady Flow: A Multi-Fidelity Approach with Physics-Guided Subsampling
Learning Universal Human Mobility Patterns with a Foundation Model for Cross-domain Data Fusion
GeoFlow-SLAM: A Robust Tightly-Coupled RGBD-Inertial and Legged Odometry Fusion SLAM for Dynamic Legged Robotics
A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models
Multi-View Node Pruning for Accurate Graph Representation
V-Max: A Reinforcement Learning Framework for Autonomous Driving
Interpretable Transformation and Analysis of Timelines through Learning via Surprisability
AI Governance InternationaL Evaluation Index (AGILE Index) 2024
UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
Improving Transformer World Models for Data-Efficient RL
LLM-RecG: A Semantic Bias-Aware Framework for Zero-Shot Sequential Recommendation
SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks
Determination of galaxy photometric redshifts using Conditional Generative Adversarial Networks (CGANs)
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
MRGen: Segmentation Data Engine for Underrepresented MRI Modalities
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy for Visuomotor Imitation Learning
Dataset resulting from the user study on comprehensibility of explainable AI algorithms
Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information
DeFine: Decision-Making with Analogical Reasoning over Factor Profiles
Benchmarking Sub-Genre Classification For Mainstage Dance Music
Risks of ignoring uncertainty propagation in AI-augmented security pipelines
MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs
Leveraging Quantum Superposition to Infer the Dynamic Behavior of a Spatial-Temporal Neural Network Signaling Model
Bounding the Worst-class Error: A Boosting Approach
TBDetector:Transformer-Based Detector for Advanced Persistent Threats with Provenance Graph
Machine Learning Systems: A Survey from a Data-Oriented Perspective
Aime: Towards Fully-Autonomous Multi-Agent Framework
SmartThinker: Learning to Compress and Preserve Reasoning by Step-Level Length Control
Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments
NTRL: Encounter Generation via Reinforcement Learning for Dynamic Difficulty Adjustment in Dungeons and Dragons
Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplifications and Resistance in Multi-Agent Based LLM-as-Judge
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models
BEARCUBS: A benchmark for computer-using web agents
Demystifying MuZero Planning: Interpreting the Learned Model
LLM-Enhanced User-Item Interactions: Leveraging Edge Information for Optimized Recommendations
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Imbalance in Balance: Online Concept Balancing in Generation Models
Latent Policy Steering with Embodiment-Agnostic Pretrained World Models
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts
Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
Voxtral
Merge Kernel for Bayesian Optimization on Permutation Space
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
Automating Steering for Safe Multimodal Large Language Models
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models
VITA: Vision-to-Action Flow Matching Policy
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback
SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks
Prompt Injection 2.0: Hybrid AI Threats
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs
Created by
Haebom
Authors
Irene Siragusa, Salvatore Contino, Massimo La Ciura, Rosario Alicata, Roberto Pirrone
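Abstract
This paper addresses a persistent difficulty: despite growing interest in developing AI applications for medicine, privacy concerns have left the field short of high-quality datasets. Advances in vision-language models (VLMs) have increased the need for multimodal medical datasets in which medical scans are paired with the corresponding clinical reports and findings. The paper presents the complete workflow for building the MedPix 2.0 dataset, starting from MedPix®, a multimodal dataset used mainly for the continuing medical education of physicians, nurses, and medical students. A semi-automatic pipeline extracts the visual and textual data, a manual curation step removes noisy samples, and the result is stored in a MongoDB database. The authors also developed a graphical user interface (GUI) for navigating the MongoDB instance efficiently and obtaining raw data that can be readily used to train and/or fine-tune VLMs. They introduce DR-Minerva, a retrieval-augmented generation (RAG) based VLM trained on MedPix 2.0, and propose an extension that augments DR-Minerva with a knowledge graph using Llama 3.1 Instruct 8B. The resulting architecture can be queried end to end as a medical decision support system. MedPix 2.0 is available on GitHub.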
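To make the flow described in the abstract more concrete, here is a minimal sketch of its three stages: dropping noisy records, retrieving the most relevant case for a query, and assembling a retrieval-augmented prompt. Everything in it is an illustrative assumption. The field names (`case_id`, `modality`, `findings`), the sample records, and the bag-of-words cosine scoring are hypothetical and are not taken from MedPix 2.0 or DR-Minerva. In-memory dictionaries stand in for MongoDB documents so the example runs without a database.

```python
import math
from collections import Counter

# Hypothetical records: the field names and texts are assumptions made for
# illustration, not the actual MedPix 2.0 schema or data.
CASES = [
    {"case_id": "c1", "modality": "CT",
     "findings": "large mass in the left frontal lobe with surrounding edema"},
    {"case_id": "c2", "modality": "MRI",
     "findings": "fracture of the distal radius with dorsal displacement"},
    {"case_id": "c3", "modality": "CT", "findings": ""},  # noisy: no text
]


def drop_noisy(cases):
    """Keep only records whose findings text is non-empty."""
    return [c for c in cases if c["findings"].strip()]


def tokenize(text):
    """Lowercase whitespace tokenization (deliberately simplistic)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, cases, k=1):
    """Return the k cases whose findings are most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        cases,
        key=lambda c: cosine(q, Counter(tokenize(c["findings"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, retrieved):
    """Prepend the retrieved cases as context for a downstream model."""
    context = "\n".join(
        f"[{c['case_id']} | {c['modality']}] {c['findings']}" for c in retrieved
    )
    return f"Context:\n{context}\n\nQuestion: {query}"


query = "edema around a frontal lobe mass"
clean = drop_noisy(CASES)
top = retrieve(query, clean, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

A real system would replace the token-overlap scoring with learned text and image embeddings and would read the records from the database rather than from a list. The sketch only shows where retrieval sits between the curated data and the generative model.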
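Takeaways, Limitations
• Takeaways:
◦ Provides MedPix 2.0, a high-quality multimodal medical dataset of the kind VLM development in medicine depends on.
◦ Provides a GUI for navigating and using the MedPix 2.0 dataset efficiently.
◦ Presents the DR-Minerva model and its knowledge-graph extension, which can serve as a medical decision support system built on MedPix 2.0.
◦ Improves access to the dataset by releasing it on GitHub.
• Limitations:
◦ Lacks specific information on the size and diversity of the dataset.
◦ Lacks an analysis of biases that may arise during data collection and processing.
◦ Gives no details on the performance evaluation of the DR-Minerva model.
◦ Presents no concrete solution to the privacy problem (how privacy was handled during dataset construction is not stated explicitly).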