Daily Arxiv
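This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.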
CTA: Cross-Task Alignment for Better Test Time Training
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization
Domain Generalizable Portrait Style Transfer
StreamDiT: Real-Time Streaming Text-to-Video Generation
From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
Neural-Network solver of ideal MHD equilibria
RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
Evaluating AI Counseling in Japanese: Counselor, Client, and Evaluator Roles Assessed by Motivational Interviewing Criteria
Hita: Holistic Tokenizer for Autoregressive Image Generation
Empirical Analysis Of Heuristic and Approximation Algorithms for the The Mutual-Visibility Problem
Horus: A Protocol for Trustless Delegation Under Uncertainty
Geological Everything Model 3D: A Promptable Foundation Model for Unified and Zero-shot Subsurface Understanding
SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures
WATS: Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling
IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes
Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager
Enhancing Generalization of Spiking Neural Networks Through Temporal Regularization
Instruction Following by Boosting Attention of Large Language Models
Evaluating Logit-Based GOP Scores for Mispronunciation Detection
LLMs on support of privacy and security of mobile apps: state of the art and research directions
On the Fundamental Impossibility of Hallucination Control in Large Language Models
Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting
CuVSLAM: CUDA accelerated visual odometry and mapping
Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge
An empirical study of task and feature correlations in the reuse of pre-trained models
EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG
Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Towards General Continuous Memory for Vision-Language Models
Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)
Bayesian Hierarchical Invariant Prediction
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps
Enhancing Satellite Object Localization with Dilated Convolutions and Attention-aided Spatial Pooling
Overcoming Data Scarcity in Generative Language Modelling for Low-Resource Languages: A Systematic Review
The GenAI Generation: Student Views of Awareness, Preparedness, and Concern
Variational OOD State Correction for Offline Reinforcement Learning
Heat Diffusion Models - Interpixel Attention Mechanism
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Offline Learning and Forgetting for Reasoning with Large Language Models
Redefining Evaluation Standards: A Unified Framework for Evaluating the Korean Capabilities of Language Models
PVChat: Personalized Video Chat with One-Shot Learning
Challenges and Trends in Egocentric Vision: A Survey
Eyes on the Environment: AI-Driven Analysis for Fire and Smoke Classification, Segmentation, and Detection
Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model
A Survey on Transformer Context Extension: Approaches and Evaluation
Ethical AI for Young Digital Citizens: A Call to Action on Privacy Governance
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
The Algorithmic State Architecture (ASA): An Integrated Framework for AI-Enabled Government
A Cascading Cooperative Multi-agent Framework for On-ramp Merging Control Integrating Large Language Models
Zero-shot Medical Event Prediction Using a Generative Pre-trained Transformer on Electronic Health Records
GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification
Fundamental Limits of Hierarchical Secure Aggregation with Cyclic User Association
Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling
RSPO: Regularized Self-Play Alignment of Large Language Models
Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
Efficient Risk-sensitive Planning via Entropic Risk Measures
Bayesian Optimization for Controlled Image Editing via LLMs
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Composable Strategy Framework with Integrated Video-Text based Large Language Models for Heart Failure Assessment
Safe Beyond the Horizon: Efficient Sampling-based MPC with Neural Control Barrier Functions
A Theory for Conditional Generative Modeling on Multiple Data Sources
Unsupervised Anomaly Detection through Mass Repulsing Optimal Transport
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics
DeepCell: Self-Supervised Multiview Fusion for Circuit Representation Learning
VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding
Holistic Construction Automation with Modular Robots: From High-Level Task Specification to Execution
Aria-UI: Visual Grounding for GUI Instructions
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Pre-Training Graph Contrastive Masked Autoencoders are Strong Distillers for EEG
Random Walks with Tweedie: A Unified View of Score-Based Diffusion Models
Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning
Advancing Stroke Risk Prediction Using a Multi-modal Foundation Model
An AI Theory of Mind Will Enhance Our Collective Intelligence
Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle
Longitudinal Ensemble Integration for sequential classification with multimodal data
Improving Trust Estimation in Human-Robot Collaboration Using Beta Reputation at Fine-grained Timescales
Feint and Attack: Attention-Based Strategies for Jailbreaking and Protecting LLMs
The Nexus of AR/VR, AI, UI/UX, and Robotics Technologies in Enhancing Learning and Social Interaction for Children with Autism Spectrum Disorders: A Systematic Review
What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning
Liability and Insurance for Catastrophic Losses: the Nuclear Power Precedent and Lessons for AI
Insuring Uninsurable Risks from AI: The State as Insurer of Last Resort
Empirical evidence of Large Language Model's influence on human spoken communication
The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control
Curvature-Aligned Federated Learning (CAFe): Harmonizing Loss Landscapes for Fairness Without Demographics
CoDy: Counterfactual Explainers for Dynamic Graphs
Optimal Transport for Domain Adaptation through Gaussian Mixture Models
Learning Federated Neural Graph Databases for Answering Complex Queries from Distributed Knowledge Graphs
Detecting value-expressive text posts in Russian social media
Deep neural networks have an inbuilt Occam's razor
TT-TFHE: a Torus Fully Homomorphic Encryption-Friendly Neural Network Architecture
SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam?
MedGemma Technical Report
Rule Learning for Knowledge Graph Reasoning under Agnostic Distribution Shift
Activation Steering for Chain-of-Thought Compression
Advancing Stroke Risk Prediction Using a Multi-modal Foundation Model
Created by
Haebom
Authors
Camille Delgrange, Olga Demler, Samia Mora, Bjoern Menze, Ezequiel de la Rosa, Neda Davoudi
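Overview
This paper presents a self-supervised multimodal framework that integrates diverse clinical data modalities to improve stroke risk prediction. It combines 3D brain imaging, clinical data, and image-derived features to improve risk prediction before stroke onset. Using an unlabeled dataset (UK Biobank), it captures complementary and synergistic information across the image and tabular modalities. Built on a contrastive learning framework, it couples contrastive language-image pretraining with an image-tabular matching module to align the multimodal representations in a shared latent space. Benchmarked against state-of-the-art unimodal and multimodal methods under frozen and trainable model settings, the proposed model outperforms self-supervised tabular (image) methods by 2.6% (2.6%) in ROC-AUC and by 3.3% (5.6%) in balanced accuracy, and it also achieves higher balanced accuracy than the best-performing multimodal supervised model. Interpretability tools show that the approach integrates tabular and image data more effectively, yielding richer and better-aligned embeddings, and Gradient-weighted Class Activation Mapping heatmaps highlight activated brain regions associated with brain aging, stroke risk, and clinical outcomes.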
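The contrastive alignment step mentioned in the overview can be illustrated with a small sketch. The code below is not the authors' implementation: it shows only a generic CLIP-style symmetric contrastive (InfoNCE) loss between paired image and tabular embeddings, and it omits the image-tabular matching module entirely. The function names, the toy embeddings, and the temperature of 0.1 are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (all-zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_emb, tabular_emb, temperature=0.1):
    """CLIP-style loss: row i of each input is assumed to be the same subject.

    The temperature value is an illustrative assumption, not a reported setting.
    """
    img = [l2_normalize(v) for v in image_emb]
    tab = [l2_normalize(v) for v in tabular_emb]
    # Cosine similarity between every image and every tabular embedding.
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in tab]
        for i_vec in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # tabular-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch of two subjects with 2-D embeddings (hypothetical values).
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

When each subject's image and tabular embeddings point the same way, the loss is close to zero; when the pairs are swapped, it is large. Minimizing it therefore pulls matched pairs together in the shared latent space and pushes mismatched pairs apart.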
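Takeaways, Limitations
• Takeaways:
◦ The self-supervised multimodal framework improved stroke risk prediction performance.
◦ It outperformed previous state-of-the-art unimodal and multimodal methods.
◦ Interpretability tools provide insight into the model's predictions.
◦ It provides a strong foundation for integrating diverse data modalities.
• Limitations:
◦ The study relies on the UK Biobank dataset, so generalization to other datasets needs further study.
◦ Because the method is self-supervised, performance may depend on the quality of the unlabeled data.
◦ Further research on the interpretability of the model is needed.
◦ Generalization to specific population groups may not have been evaluated sufficiently.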