Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Multimodal Masked Autoencoder Pre-training for 3D MRI-Based Brain Tumor Analysis with Missing Modalities

Created by
  • Haebom

Author

Lucas Robinet, Ahmad Berjaoui, Elizabeth Cohen-Jonathan Moyal

Outline

This paper presents BM-MAE, a pre-training strategy tailored to multimodal magnetic resonance imaging (MRI) data. Existing multimodal MRI analysis methods are typically designed under the assumption that all modalities are always available, which makes them vulnerable to the missing modalities frequently encountered in real clinical settings. BM-MAE is based on masked image modeling (MIM) and is designed so that a single pre-trained model can adapt to whichever combination of modalities is available. As a result, fine-tuning on a subset of modalities still benefits from pre-training across all of them. Experiments show that BM-MAE matches or surpasses baselines that require separate pre-training for each modality combination, and substantially outperforms training from scratch on several downstream tasks. It can also efficiently reconstruct missing modalities.
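The idea described above, masked-autoencoder pre-training over whatever subset of MRI sequences is present, can be sketched in code. The following is a minimal, illustrative PyTorch sketch and not the authors' implementation: the class name BMMAESketch, the modality list, patch size, volume side length, and all other hyperparameters are assumptions made only to show the mechanism of per-modality patch tokens, random masking, and reconstruction of masked patches.

```python
# Minimal sketch (assumptions, not the authors' code) of masked-image-modeling
# pre-training over whichever MRI modalities happen to be available.
import torch
import torch.nn as nn

MODALITIES = ["t1", "t1ce", "t2", "flair"]   # typical brain-MRI sequences
PATCH, DIM = 16, 256                          # 3D patch side, token width
SIDE = 64                                     # illustrative volume side length
N_PATCH = (SIDE // PATCH) ** 3                # patches per modality

class BMMAESketch(nn.Module):
    def __init__(self):
        super().__init__()
        # One 3D patch embedder per modality; learned modality embeddings tell
        # the shared encoder which sequence each token came from.
        self.patchify = nn.ModuleDict(
            {m: nn.Conv3d(1, DIM, kernel_size=PATCH, stride=PATCH) for m in MODALITIES})
        self.mod_embed = nn.ParameterDict(
            {m: nn.Parameter(torch.zeros(1, 1, DIM)) for m in MODALITIES})
        self.pos_embed = nn.Parameter(torch.zeros(1, N_PATCH, DIM))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True), num_layers=4)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True), num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, DIM))
        self.head = nn.Linear(DIM, PATCH ** 3)   # reconstruct one patch's voxels

    def forward(self, volumes: dict, mask_ratio: float = 0.75):
        """volumes maps modality name -> (B, 1, D, H, W); any subset may be present."""
        tokens, targets, pos = [], [], []
        for m, vol in volumes.items():
            t = self.patchify[m](vol).flatten(2).transpose(1, 2)          # (B, N, DIM)
            p = self.pos_embed + self.mod_embed[m]                        # (1, N, DIM)
            tokens.append(t + p)
            pos.append(p)
            # Voxel patches used as reconstruction targets.
            targets.append(
                vol.unfold(2, PATCH, PATCH).unfold(3, PATCH, PATCH).unfold(4, PATCH, PATCH)
                   .reshape(vol.size(0), -1, PATCH ** 3))
        tokens = torch.cat(tokens, dim=1)                                 # (B, L, DIM)
        targets = torch.cat(targets, dim=1)
        B, L, _ = tokens.shape
        pos = torch.cat(pos, dim=1).expand(B, -1, -1)                     # (B, L, DIM)

        # Randomly hide a fraction of tokens across all available modalities.
        n_keep = int(L * (1 - mask_ratio))
        order = torch.rand(B, L, device=tokens.device).argsort(dim=1)
        keep, drop = order[:, :n_keep], order[:, n_keep:]
        gather = lambda x, idx: torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))

        latent = self.encoder(gather(tokens, keep))                       # visible tokens only
        masked = self.mask_token + gather(pos, drop)                      # mask tokens + position
        recon = self.head(self.decoder(torch.cat([latent, masked], dim=1)))[:, n_keep:]
        return ((recon - gather(targets, drop)) ** 2).mean()              # loss on masked patches

# Per batch, a random subset of modalities is fed in, so one pre-trained model
# learns to operate with whatever combination is available at fine-tuning time.
model = BMMAESketch()
batch = {m: torch.randn(2, 1, SIDE, SIDE, SIDE) for m in ("t1", "flair")}  # only two present
loss = model(batch)
loss.backward()
```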

Takeaways, Limitations

Takeaways:
  • Presents a pre-training strategy that effectively addresses missing modalities in multimodal MRI data.
  • A single model adapts to any combination of available modalities, improving resource efficiency and clinical applicability.
  • Demonstrates that missing modalities can be reconstructed efficiently.
  • Achieves performance equal to or better than existing methods across various downstream tasks.
Limitations:
  • The reported performance of BM-MAE may be specific to the downstream tasks and datasets evaluated; generalization to other types of medical image data and tasks still needs to be assessed.
  • Performance may depend on the size and quality of the pre-training dataset; scalability to larger and more diverse datasets requires further study.
  • A more detailed analysis of how much performance degrades under missing-modality scenarios is needed, including whether the method is vulnerable to specific missing-modality patterns.