Probabilistic predictions of machine learning classifiers are typically assessed with a proper loss function such as cross-entropy, which decomposes into two components: calibration error measures overall under- or overconfidence, while refinement error measures the ability to distinguish between classes. We introduce a new variational formulation of this calibration-refinement decomposition, which sheds new light on post-hoc calibration and enables fast estimation of both terms. Using it, we provide theoretical and empirical evidence that the calibration and refinement errors are not minimized simultaneously during training; consequently, selecting the best epoch based on validation loss yields a suboptimal trade-off for both terms. To address this, we propose Refine then Calibrate: minimize only the refinement error during training, then minimize the calibration error post hoc with standard recalibration techniques. The method integrates seamlessly with any classifier and consistently improves performance across a variety of classification tasks.
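To make the recipe concrete, the sketch below illustrates one plausible instantiation of the idea, not the paper's exact procedure: the refinement error of a model checkpoint is approximated as its validation cross-entropy after an optimal temperature rescaling (the part of the loss that recalibration cannot remove), epochs are selected by this quantity rather than by raw validation loss, and the chosen model is then calibrated post hoc with temperature scaling. The estimator choice, helper names, and use of temperature scaling are assumptions for illustration.

```python
# Illustrative sketch of a "refine, then calibrate" pipeline.
# Assumptions (not taken from the abstract): refinement error is estimated as
# validation cross-entropy after temperature scaling, and post-hoc calibration
# is done by temperature scaling. All function names here are hypothetical.
import numpy as np
from scipy.optimize import minimize_scalar


def cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer labels."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def fit_temperature(logits, labels):
    """Find the temperature T > 0 minimizing validation cross-entropy."""
    res = minimize_scalar(lambda t: cross_entropy(logits / t, labels),
                          bounds=(1e-2, 1e2), method="bounded")
    return res.x


def refinement_error(logits, labels):
    """Validation loss after the best temperature rescaling: an estimate of
    the refinement term, i.e. the loss that calibration alone cannot remove."""
    t = fit_temperature(logits, labels)
    return cross_entropy(logits / t, labels)


def select_epoch_and_calibrate(epoch_val_logits, val_labels):
    """Pick the checkpoint with the lowest estimated refinement error,
    then return its index and the post-hoc calibration temperature."""
    errors = [refinement_error(l, val_labels) for l in epoch_val_logits]
    best = int(np.argmin(errors))
    return best, fit_temperature(epoch_val_logits[best], val_labels)


# Usage example with synthetic per-epoch validation logits.
rng = np.random.default_rng(0)
val_labels = rng.integers(0, 3, size=200)
epoch_val_logits = [rng.normal(size=(200, 3)) for _ in range(5)]
best_epoch, temperature = select_epoch_and_calibrate(epoch_val_logits, val_labels)
print(f"selected epoch {best_epoch}, calibration temperature {temperature:.2f}")
```

The point of the sketch is the selection criterion: checkpoints are compared by their loss after an idealized recalibration step, so a model that is poorly calibrated but highly discriminative is not discarded early, since its calibration error can be removed afterwards.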