Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

Preacher: Paper-to-Video Agentic System

Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis

Decentralized Weather Forecasting via Distributed Machine Learning and Blockchain-Based Model Validation

Biased AI improves human decision-making but reduces trust

Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method

A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality

IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection

EvaDrive: Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving

To Theoretically Understand Transformer-Based In-Context Learning for Optimizing CSMA

ASPD: Unlocking Adaptive Serial-Parallel Decoding by Exploring Intrinsic Parallelism in LLMs

BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them

Yan: Foundational Interactive Video Generation

M3-Net: A Cost-Effective Graph-Free MLP-Based Model for Traffic Prediction

LLM-Driven Adaptive 6G-Ready Wireless Body Area Networks: Survey and Framework

The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs

On Understanding the Dynamics of Model Capacity in Continual Learning

WeChat-YATT: A Simple, Scalable and Balanced RLHF Trainer

Improved Personalized Headline Generation via Denoising Fake Interests from Implicit Feedback

Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities

Echoes of Automation: The Increasing Use of LLMs in Newsmaking

SIFThinker: Spatially-Aware Image Focus for Visual Reasoning

Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle

Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction

Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference

MSC: A Marine Wildlife Video Dataset with Grounded Segmentation and Clip-Level Captioning

Self-Questioning Language Models

Exploring the Application of Visual Question Answering (VQA) for Classroom Activity Monitoring

Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning

DeepWriter: A Fact-Grounded Multimodal Writing Assistant Based On Offline Knowledge Base

Class-Proportional Coreset Selection for Difficulty-Separable Data

Warehouse Spatial Question Answering with LLM Agent

CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks

AmpLyze: A Deep Learning Model for Predicting the Hemolytic Concentration

EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision

GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Discrepancy-Aware Graph Mask Auto-Encoder

Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability

Quantitative Comparison of Fine-Tuning Techniques for Pretrained Latent Diffusion Models in the Generation of Unseen SAR Images

PromptTSS: A Prompting-Based Approach for Interactive Multi-Granularity Time Series Segmentation

15,500 Seconds: Lean UAV Classification Using EfficientNet and Lightweight Fine-Tuning

Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods

Data Pruning by Information Maximization

CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting

Security Concerns for Large Language Models: A Survey

Is Quantum Optimization Ready? An Effort Towards Neural Network Compression using Adiabatic Quantum Computing

Unraveling the iterative CHAD

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference

LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation

Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free

Adaptive Budgeted Multi-Armed Bandits for IoT with Dynamic Resource Constraints

Vision Transformers in Precision Agriculture: A Comprehensive Survey

Goal-Oriented Time-Series Forecasting: Foundation Framework Design

CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

FinSage: A Multi-aspect RAG System for Financial Filings Question Answering

GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes

Hyperflux: Pruning Reveals the Importance of Weights

ToolACE-R: Model-aware Iterative Training and Adaptive Refinement for Tool Learning

UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving

VectorFit: Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models

BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache

Explainable Sentiment Analysis with DeepSeek-R1: Performance, Efficiency, and Few-Shot Learning

Continual Learning for Multiple Modalities

Advancing MAPF towards the Real World: A Scalable Multi-Agent Realistic Testbed (SMART)

LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

Boosting Cross-problem Generalization in Diffusion-Based Neural Combinatorial Solver via Inference Time Adaptation

Rhythmic sharing: A bio-inspired paradigm for zero-shot adaptive learning in neural networks

Measuring Diversity in Synthetic Datasets

Delayed Feedback Modeling with Influence Functions

Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization

Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding

Interpretable Neural ODEs for Gene Regulatory Network Discovery under Perturbations

A Lightweight Transformer with Phase-Only Cross-Attention for Illumination-Invariant Biometric Authentication

Understanding Transformer-based Vision Models through Inversion

INSIGHT: Explainable Weakly-Supervised Medical Image Analysis

Visual SLAMMOT Considering Multiple Motion Models

A Training-Free Approach for Music Style Transfer with Latent Diffusion Models

Multi-objective Optimization in CPU Design Space Exploration: Attention is All You Need

DiRW: Path-Aware Digraph Learning for Heterophily

Diversifying Policy Behaviors with Extrinsic Behavioral Curiosity

Episodic Memory Verbalization using Hierarchical Representations of Life-Long Robot Experience

Neural Networks Generalize on Low Complexity Data

Knowledge-based Consistency Testing of Large Language Models

Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning

An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach

Communication Cost Reduction for Subgraph Counting under Local Differential Privacy via Hash Functions

Mathematical Computation and Reasoning Errors by Large Language Models

OpenCUA: Open Foundations for Computer-Use Agents

Compass-Thinker-7B Technical Report

TextQuests: How Good are LLMs at Text-Based Video Games?

On the Definition of Intelligence

Beyond Accuracy: How AI Metacognitive Sensitivity improves AI-assisted Decision Making

LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization

FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory

MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models

A Random-Key Optimizer for Combinatorial Optimization

Federated Cross-Training Learners for Robust Generalization under Data Heterogeneity

Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval

IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection

Created by

Haebom

Authors

Yanhui Li, Yunkang Cao, Chengliang Liu, Yuan Xiong, Xinghui Dong, Chao Huang

Outline

This paper proposes IAD-R1, a post-training framework that adapts Vision-Language Models (VLMs) to industrial anomaly detection. To compensate for the scarcity of defect data, the authors use a two-stage training strategy. The first stage, Perception Activation Supervised Fine-Tuning (PA-SFT), trains on Expert-AD, a high-quality Chain-of-Thought dataset, to strengthen anomaly detection and establish the link between the reasoning and the final answer. The second stage, Structured Control Group Relative Policy Optimization (SC-GRPO), further improves anomaly detection through a reward function. Experiments show that IAD-R1 improves performance across seven VLMs. The gain is most pronounced on the DAGM dataset, where average accuracy rises by 43.3% over the baseline model. In addition, a 0.5B-parameter model trained with IAD-R1 outperforms commercial models such as GPT-4.1 and Claude-Sonnet-4 in zero-shot settings. The code, dataset, and model weights are publicly available.
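The second stage builds on Group Relative Policy Optimization, so a small sketch of the group-relative step may help readers unfamiliar with it. The summary does not describe the SC-GRPO reward or its structured-control component, so the code below shows only the generic idea: several responses are sampled for one prompt, each is scored, and each score is standardized against the rest of its group. The `toy_reward` function and its 1.0 / 0.1 weights are hypothetical placeholders, not the paper's reward.

```python
from statistics import mean, pstdev


def toy_reward(predicted: str, truth: str, well_formed: bool) -> float:
    """Hypothetical reward: 1.0 for a correct label, plus 0.1 for a well-formed answer.

    This is an illustrative stand-in, NOT the SC-GRPO reward from the paper.
    """
    return (1.0 if predicted == truth else 0.0) + (0.1 if well_formed else 0.0)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one group of responses sampled for the same prompt."""
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division defined when every response earns the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    truth = "anomalous"
    # (predicted label, whether the response followed the expected format)
    samples = [
        ("anomalous", True),
        ("normal", True),
        ("anomalous", False),
        ("normal", False),
    ]
    rewards = [toy_reward(pred, truth, ok) for pred, ok in samples]
    for (pred, ok), adv in zip(samples, group_relative_advantages(rewards)):
        print(f"{pred:10s} well_formed={ok!s:5s} advantage={adv:+.3f}")
```

Responses that score above their group's mean receive a positive advantage and are reinforced, while those below the mean are discouraged. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.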

Takeaways and Limitations

• Takeaways:
  ◦ Presents IAD-R1, a post-training framework that significantly improves VLM-based industrial anomaly detection.
  ◦ Applies across a range of VLM architectures and parameter sizes.
  ◦ Surpasses commercial models in zero-shot settings.
  ◦ Demonstrates the effectiveness of Expert-AD, a high-quality Chain-of-Thought dataset.
  ◦ Improves research reproducibility and scalability by releasing the code, dataset, and model weights.
• Limitations:
  ◦ The reported improvement may be biased toward a specific dataset (DAGM).
  ◦ Generalization to other industries and other types of anomalies still needs to be verified.
  ◦ The creation process and quality of the Expert-AD dataset may not be described in enough detail.
  ◦ The design of the SC-GRPO reward function needs further explanation.
