This paper presents the first dynamic benchmarking framework for assessing data-induced cognitive biases in general-purpose AI (GPAI) systems within software engineering workflows. Starting with 16 handcrafted, realistic tasks (each featuring one of eight cognitive biases), we test whether bias-inducing linguistic cues unrelated to the task logic can lead GPAI systems to incorrect conclusions. We develop an on-demand augmentation pipeline that alters superficial task details while preserving the bias-inducing cues, thereby scaling the benchmark while maintaining realism. The pipeline ensures correctness, promotes diversity, and controls inference complexity by leveraging Prolog-based inference and LLM-as-a-judge verification. Evaluating leading GPAI systems, including GPT, LLaMA, and DeepSeek, we find a consistent tendency to rely on shallow linguistic heuristics rather than deep reasoning. All systems exhibit cognitive bias (with rates ranging from 5.9% to 35%, depending on the bias type), and bias sensitivity rises sharply with task complexity (reaching up to 49%), highlighting a significant risk in real-world software engineering deployments.