This paper proposes MUSE (Multi-LLM Uncertainty via Subset Ensembles), an uncertainty quantification method that leverages model diversity to address the inconsistency of large language models (LLMs). MUSE uses Jensen-Shannon Divergence to identify and aggregate well-calibrated subsets of LLMs, yielding more reliable uncertainty estimates. The approach rests on the assumption that LLMs make complementary predictions because of their different training procedures and the Zipfian distribution of language. On binary prediction tasks, MUSE achieves better calibration and prediction performance than single models and simple ensembles. The authors also explore combining MUSE with chain-of-thought distillation to improve LLM calibration. MUSE is available on GitHub.
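As a rough illustration of the idea described above, the following is a minimal sketch of a Jensen-Shannon-Divergence-based subset ensemble for a single binary query; it is not the paper's implementation. The function names, the fixed subset size, and the selection rule (choosing the subset with the lowest mean pairwise JSD) are assumptions made for illustration only.

```python
import itertools
import numpy as np
from scipy.spatial.distance import jensenshannon


def mean_pairwise_jsd(probs):
    """Mean pairwise Jensen-Shannon divergence among predictive
    distributions, given as an array of shape (n_models, 2)."""
    pairs = itertools.combinations(range(len(probs)), 2)
    # scipy returns the JS *distance* (the square root of the divergence),
    # so square it to recover the divergence itself.
    return np.mean([jensenshannon(probs[i], probs[j]) ** 2 for i, j in pairs])


def subset_ensemble_estimate(model_probs, subset_size=3):
    """Illustrative subset-ensemble uncertainty estimate for one query.

    model_probs: array of shape (n_models, 2), each row an LLM's
    (P(class=0), P(class=1)). Returns the averaged distribution of the
    selected subset and the subset indices.
    """
    best_subset, best_score = None, np.inf
    for subset in itertools.combinations(range(len(model_probs)), subset_size):
        score = mean_pairwise_jsd(model_probs[list(subset)])
        # Assumption: prefer the subset whose members agree most
        # (lowest mean pairwise JSD); the paper's actual criterion may differ.
        if score < best_score:
            best_subset, best_score = subset, score
    return model_probs[list(best_subset)].mean(axis=0), best_subset


# Example: five hypothetical LLMs answering one yes/no question.
probs = np.array([
    [0.20, 0.80],
    [0.25, 0.75],
    [0.30, 0.70],
    [0.90, 0.10],   # an outlier model
    [0.22, 0.78],
])
ensemble_prob, subset = subset_ensemble_estimate(probs)
print(f"chosen subset: {subset}, ensemble P(class=1) = {ensemble_prob[1]:.3f}")
```

The averaged subset distribution can then be read as a calibrated probability for the positive class; how the subset score trades off agreement against diversity is exactly the design choice the paper studies.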
Takeaways, Limitations
• Takeaways:
◦ Leveraging model diversity across LLMs improves the accuracy of uncertainty estimation.
◦ The Jensen-Shannon-Divergence-based MUSE method outperforms single-model baselines and simple ensembles.
◦ Combining MUSE with chain-of-thought distillation shows potential for improving LLM calibration.
◦ The open-source release of MUSE enables further research and practical use.
• Limitations:
◦ Experimental results are presented only for binary classification; further work is needed to establish generalizability to multi-class classification and other task types.
◦ MUSE's performance gains may be limited to specific datasets and models; its generalizability across diverse settings remains to be verified.
◦ There is no comparative analysis using information-theoretic metrics other than Jensen-Shannon Divergence.
◦ Further research is needed to optimize the LLM subset selection strategy.