Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

TerraMind: Large-Scale Generative Multimodality for Earth Observation

Created by
  • Haebom

Author

Johannes Jakubik, Felix Yang, Benedikt Blumenstiel, Erik Scheurer, Rocco Sedona, Stefano Maurogiovanni, Jente Bosmans, Nikolaos Dionelis, Valerio Marsocci, Niklas Kopp, Rahul Ramachandran, Paolo Fraccaro, Thomas Brunschwiler, Gabriele Cavallaro, Juan Bernabe-Moreno, Nicolas Longépé

Outline

TerraMind is the first any-to-any generative multimodal model for Earth observation. Unlike other multimodal models, TerraMind is pretrained on a dual-scale representation that combines token-level and pixel-level data across modalities. At the token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while at the pixel level it leverages fine-grained representations to capture important spatial nuances. TerraMind is pretrained on nine geospatial modalities from large-scale global datasets. This paper demonstrates that (i) TerraMind's dual-scale early-fusion approach enables a variety of zero-shot and few-shot applications for Earth observation; (ii) TerraMind introduces a "Thinking in Modalities" (TiM) feature that improves model output by generating additional artificial data during fine-tuning and inference; and (iii) TerraMind achieves state-of-the-art performance on community-standard benchmarks for EO, such as PANGAEA. The pretraining dataset, model weights, and code are open-sourced under a permissive license.
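The TiM idea described above can be sketched as a simple two-step control flow: first generate an artificial intermediate modality from the inputs, then condition the downstream task on the original inputs plus the generated data. The sketch below is a minimal illustration of that flow only; the function names, the `"LULC"` intermediate modality, and the stub bodies are assumptions for illustration, not the actual TerraMind API.

```python
# Minimal sketch of "Thinking in Modalities" (TiM) inference.
# All names here (generate_modality, predict, tim_inference, "LULC")
# are hypothetical stand-ins, not the real TerraMind interface.

def generate_modality(model, inputs, target_modality):
    """Stand-in for the generative step: synthesize artificial data
    for an extra modality (e.g. land-use/land-cover tokens)."""
    return {target_modality: f"synthetic-{target_modality}"}

def predict(model, inputs):
    """Stand-in for the downstream task head (e.g. segmentation),
    conditioned on whatever modalities are present in `inputs`."""
    return f"prediction-from-{'+'.join(sorted(inputs))}"

def tim_inference(model, inputs, intermediate="LULC"):
    # Step 1: "think" by generating an intermediate modality.
    generated = generate_modality(model, inputs, intermediate)
    # Step 2: run the task on the original inputs enriched
    # with the generated modality.
    enriched = {**inputs, **generated}
    return predict(model, enriched)

# Example: a Sentinel-2 input enriched with a generated LULC modality.
result = tim_inference(model=None, inputs={"S2": "sentinel2_tile"})
print(result)
```

The point of the sketch is the enrichment step: the same task head sees both the observed and the generated modalities, which is how TiM improves outputs without requiring extra real labels at inference time.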

Takeaways, Limitations

Takeaways:
Presents the first any-to-any generative multimodal model for Earth observation.
Dual-scale early fusion enables zero-shot and few-shot applications.
The "Thinking in Modalities" (TiM) feature improves model output.
Achieves state-of-the-art performance on benchmarks such as PANGAEA.
Model weights, pretraining data, and code are released under a permissive open-source license.
Limitations:
The paper does not explicitly discuss limitations. Further experiments and evaluations may reveal limitations in generalization performance, performance on specific types of geospatial data, computational cost, and similar concerns.