Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation

Created by
  • Haebom

Author

Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo, Shihao Liu, Daiting Shi, Dawei Yin, Xueqi Cheng

Outline

This paper explores leveraging large language models (LLMs) to annotate document utility, reducing the reliance on expensive manual annotation when training retrieval and retrieval-augmented generation (RAG) systems. To bridge the gap between retrieval relevance and generative utility, the authors use LLMs to annotate how useful each document is for answering a query. To effectively exploit multiple positive samples per query, they propose a novel loss function that maximizes the aggregated marginal likelihood of those positives. They use the Qwen-2.5-32B model to annotate the MS MARCO dataset for utility, then conduct retrieval experiments on MS MARCO and BEIR and RAG experiments on MS MARCO QA, NQ, and HotpotQA. The results show that LLM-generated annotations improve out-of-domain retrieval performance and RAG results compared to models trained solely on manual annotations or on subsets selected via QA metrics. Furthermore, combining LLM annotations with just 20% of the manual annotations achieves performance comparable to using fully manual annotations. The study presents a comprehensive approach for leveraging LLM annotations to bootstrap QA systems on new corpora.
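The "aggregated marginal likelihood" loss can be sketched as a multi-positive variant of a softmax contrastive loss: the probability mass of all positive documents for a query is summed inside the log, rather than averaging a separate loss per positive. The sketch below is an illustration under that assumption; the function name, temperature parameter, and exact formulation are not taken from the paper.

```python
import numpy as np

def multi_positive_nll(scores, positive_mask, temperature=1.0):
    """Negative log of the aggregated probability mass assigned to
    all positive documents for one query.

    scores        : (N,) similarity scores for N candidate documents
    positive_mask : (N,) boolean array, True where a document is positive

    Note: this is an illustrative sketch, not the paper's exact loss.
    """
    logits = np.asarray(scores, dtype=float) / temperature
    logits -= logits.max()                  # numerical stability
    exp = np.exp(logits)
    # Sum the positives' mass *inside* the log (marginal likelihood),
    # instead of averaging per-positive cross-entropy terms.
    p_positives = exp[positive_mask].sum() / exp.sum()
    return -np.log(p_positives)
```

With this formulation, the loss goes to zero as soon as the positives jointly absorb the probability mass, so the model is not pushed to rank one annotated-useful document above another.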

Takeaways, Limitations

Takeaways:
Utility annotation with LLMs reduces reliance on manual annotation and enables cost-effective construction of QA systems.
LLM annotations improve out-of-domain retrieval performance and RAG performance.
Combining a small amount of manual annotation with LLM annotations achieves high performance.
The paper presents an effective method for bootstrapping a QA system on a new corpus.
Limitations:
Further research is needed on the accuracy and reliability of LLM-generated utility annotations.
The generalizability of the results beyond the specific LLM (Qwen-2.5-32B) and datasets used remains to be validated.
Further experiments with different question types and datasets are needed.