Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

The Next Layer: Augmenting Foundation Models with Structure-Preserving and Attention-Guided Learning for Local Patches to Global Context Awareness in Computational Pathology

Created by
  • Haebom

Author

Muhammad Waqas, Rukhmini Bandyopadhyay, Eman Showkatian, Amgad Muneer, Anas Zafar, Frank Rojas Alvarez, Maricel Corredor Marin, Wentao Li, David Jaffray, Cara Haymaker, John Heymach, Natalie I Vokes, Luisa Maren Solis Soto, Jianjun Zhang, Jia Wu

Outline

EAGLE-Net is a structure-preserving, attention-guided architecture built on multiple-instance learning (MIL) that addresses a limitation of current foundation models by incorporating both the global spatial structure of the tissue and local contextual relationships among diagnostically relevant regions. It captures global tissue structure with multi-scale absolute spatial encoding, focuses attention on the local microenvironment with a top-K neighbor-aware loss, and reduces false positives with a background suppression loss. Evaluated on classification tasks across three cancer types (10,260 slides) and survival prediction across seven cancer types (4,172 slides) using three histology foundation backbones (REMEDIES, Uni-V1, and Uni2-h), EAGLE-Net improved classification accuracy by up to 3% and achieved the highest concordance index in six of the seven cancer types. It also produces smooth, biologically consistent attention maps that align with expert annotations and highlight invasive fronts, necrosis, and immune infiltrates.
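The mechanisms described above can be sketched in a toy form: encode each patch's absolute (x, y) position at several spatial scales, pool patch features with attention, and penalize attention that disagrees with its top-K spatial neighbors. This is a minimal illustration of the general ideas, not the paper's implementation; all function names, dimensions, and hyperparameters here are assumptions.

```python
import numpy as np

# Hypothetical sketch of the three ingredients summarized above; every
# name and hyperparameter is illustrative, not taken from EAGLE-Net.

def multiscale_spatial_encoding(coords, n_freq=4, scales=(1.0, 8.0, 64.0)):
    """Sinusoidal encodings of patch (x, y) coordinates at several scales."""
    feats = []
    for s in scales:
        for k in range(n_freq):
            freq = (2.0 ** k) / s
            feats.append(np.sin(coords * freq))
            feats.append(np.cos(coords * freq))
    return np.concatenate(feats, axis=-1)  # (num_patches, 2*2*n_freq*len(scales))

def attention_pool(patch_feats, w):
    """Softmax attention over patches -> one slide-level embedding."""
    scores = patch_feats @ w
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a @ patch_feats, a

def topk_neighbor_smoothness(attn, coords, k=8):
    """Penalty pushing each patch's attention toward that of its k nearest
    spatial neighbors -- a stand-in for a top-K neighbor-aware loss."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self from neighbors
    nbrs = np.argsort(d, axis=1)[:, :k]    # indices of k nearest patches
    return np.mean((attn[:, None] - attn[nbrs]) ** 2)

# Toy usage: 100 patches with frozen 512-dim backbone embeddings.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(100, 2))   # patch positions (pixels)
feats = rng.standard_normal((100, 512))        # foundation-model features
x = np.concatenate([feats, multiscale_spatial_encoding(coords)], axis=-1)
slide_feat, attn = attention_pool(x, rng.standard_normal(x.shape[-1]))
loss_smooth = topk_neighbor_smoothness(attn, coords)
```

In this sketch the spatial encoding is simply concatenated to the frozen backbone features, so the attention scorer can exploit position without fine-tuning the backbone; the smoothness penalty is what would encourage the contiguous, biologically coherent attention maps the paper reports.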

Takeaways, Limitations

Takeaways:
Presents a generalizable, interpretable MIL framework that complements foundation models and improves understanding of the tumor microenvironment.
Improves prediction accuracy and interpretability through multi-scale spatial encoding and a top-K neighbor-aware loss.
Shows strong performance across multiple cancer types and tasks (classification and survival prediction).
Contributes to biomarker discovery and prognostic modeling by generating biologically meaningful attention maps.
Limitations:
The results were obtained with specific foundation models and datasets, so further work is needed to assess generalization to other models and data.
No comparative performance analysis is provided for backbones beyond the three used.
Validation on a wider variety of cancer types and larger datasets is needed.