Daily Arxiv

This page collects and organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.

SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?

Created by
  • Haebom

Authors

Senyu Li, Jiayi Wang, Felermino DMA Ali, Colin Cherry, Daniel Deutsch, Eleftheria Briakou, Rui Sousa-Silva, Henrique Lopes Cardoso, Pontus Stenetorp, David Ifeoluwa Adelani

Outline

To address the challenges of machine translation (MT) quality assessment for low-resource African languages, this study introduces SSA-MTE, a large-scale human-annotated MT evaluation dataset covering 14 African language pairs with over 73,000 sentence-level annotations from the news domain. Building on this dataset, the authors develop improved reference-based and reference-free evaluation metrics, SSA-COMET and SSA-COMET-QE, and also benchmark prompt-based approaches using state-of-the-art LLMs such as GPT-4o, Claude-3.7, and Gemini 2.5 Pro. Experimental results show that the SSA-COMET models significantly outperform AfriCOMET and are competitive with Gemini 2.5 Pro, particularly for low-resource languages such as Twi, Luo, and Yoruba. All resources used in this study are released under an open license.
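As a rough illustration of how COMET-style metrics like SSA-COMET and SSA-COMET-QE are typically scored, the sketch below uses the standard unbabel-comet toolkit. The checkpoint name shown is a public stand-in (Unbabel/wmt22-comet-da), not the SSA-COMET release, and the assumption that the released models follow this interface is mine, not the authors'.

```python
# Minimal sketch of COMET-style segment scoring with the unbabel-comet toolkit.
# Assumption: the released SSA-COMET / SSA-COMET-QE checkpoints follow the standard
# COMET interface; the checkpoint below is a public placeholder, not the SSA-COMET release.
from comet import download_model, load_from_checkpoint

# Reference-based scoring (SSA-COMET-style): needs source, hypothesis, and reference.
model_path = download_model("Unbabel/wmt22-comet-da")  # substitute the SSA-COMET checkpoint
model = load_from_checkpoint(model_path)

data = [
    {
        "src": "source sentence in the source language",
        "mt": "machine-translated hypothesis",
        "ref": "human reference translation",
    }
]
output = model.predict(data, batch_size=8, gpus=0)  # set gpus=1 if a GPU is available
print(output.scores)        # one quality score per segment
print(output.system_score)  # corpus-level average

# Reference-free scoring (SSA-COMET-QE-style): load a QE checkpoint instead and
# drop the "ref" field from each item.
qe_data = [{"src": d["src"], "mt": d["mt"]} for d in data]
```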

Takeaways, Limitations

Takeaways:
• Built SSA-MTE, a large-scale human-annotated dataset, advancing MT evaluation research for African languages.
• Developed improved evaluation metrics, SSA-COMET and SSA-COMET-QE.
• Benchmarked prompt-based LLM approaches (GPT-4o, Claude-3.7, Gemini 2.5 Pro) and compared them against SSA-COMET (a generic prompt sketch appears at the end of this section).
• Demonstrated strong SSA-COMET performance on low-resource languages such as Twi, Luo, and Yoruba.
• Released all resources under an open license, facilitating follow-up research.
Limitations:
• The dataset is limited to the news domain.
• The SSA-COMET models are only competitive with, not clearly superior to, the strongest LLM evaluated (Gemini 2.5 Pro).
• The prompt-based baselines depend on specific proprietary LLMs.
• Extension to more language pairs and domains is left for future work.
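Since the prompt-based LLM baselines are mentioned without their prompts being reproduced here, the following is only a generic direct-assessment sketch: the prompt wording, the 0-100 scale, and the use of the OpenAI Python client for GPT-4o are illustrative assumptions, not the paper's protocol.

```python
# Generic sketch of prompt-based MT quality assessment with an LLM.
# Assumptions: the prompt wording, 0-100 direct-assessment scale, and model choice
# below are illustrative only and are not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def llm_quality_score(src: str, mt: str, ref: str | None = None) -> str:
    """Ask an LLM for a 0-100 translation quality score (reference-free if ref is None)."""
    parts = [
        "Rate the quality of the machine translation on a scale of 0 (worst) to 100 (best).",
        f"Source: {src}",
        f"Translation: {mt}",
    ]
    if ref is not None:
        parts.append(f"Reference: {ref}")
    parts.append("Answer with the number only.")

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "\n".join(parts)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()


print(llm_quality_score("source sentence", "machine-translated hypothesis"))
```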