Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PTQAT: A Hybrid Parameter-Efficient Quantization Algorithm for 3D Perception Tasks

An Explainable AI based approach for Monitoring Animal Health

Bridging AI Innovation and Healthcare Needs: Lessons Learned from Incorporating Modern NLP at The BC Cancer Registry

Explainable Attention-Guided Stacked Graph Neural Networks for Malware Detection

Preacher: Paper-to-Video Agentic System

TimeMKG: Knowledge-Infused Causal Reasoning for Multivariate Time Series Modeling

From Explainable to Explained AI: Ideas for Falsifying and Quantifying Explanations

RL-MoE: An Image-Based Privacy Preserving Approach In Intelligent Transportation System

E3-Rewrite: Learning to Rewrite SQL for Executability, Equivalence, and Efficiency

The Roots of International Perceptions: Simulating US Attitude Changes Towards China with LLM Agents

MultiAiTutor: Child-Friendly Educational Multilingual Speech Generation Tutor with LLMs

ShoulderShot: Generating Over-the-Shoulder Dialogue Videos

A Systematic Literature Review of Retrieval-Augmented Generation: Techniques, Metrics, and Challenges

On Approximate MMS Allocations on Restricted Graph Classes

Request-Only Optimization for Recommendation Systems

Exploring Superior Function Calls via Reinforcement Learning

Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS

Refine-IQA: Multi-Stage Reinforcement Finetuning for Perceptual Image Quality Assessment

StoryEnsemble: Enabling Dynamic Exploration & Iteration in the Design Process with AI and Forward-Backward Propagation

HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection

AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

A Segmented Robot Grasping Perception Neural Network for Edge AI

What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Text-to-Level Diffusion Models With Various Text Encoders for Super Mario Bros

ViFusionTST: Deep Fusion of Time-Series Image Representations from Load Signals for Early Bed-Exit Prediction

Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs

A Closer Look at Multimodal Representation Collapse

TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation

Blending 3D Geometry and Machine Learning for Multi-View Stereopsis

Convolutional Autoencoders for Data Compression and Anomaly Detection in Small Satellite Technologies

SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models

EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control

Once Upon an AI: Six Scaffolds for Child-AI Interaction Design, Inspired by Disney

L3AC: Towards a Lightweight and Lossless Audio Codec

EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing

Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models

Human-AI Experience in Integrated Development Environments: A Systematic Literature Review

Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis

Language-Based Bayesian Optimization Research Assistant (BORA)

Data Diversity as Implicit Regularization: How Does Diversity Shape the Weight Space of Deep Neural Networks?

SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression

Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding

Clean-Label Physical Backdoor Attacks with Data Distillation

TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendation

A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems

JMA: a General Algorithm to Craft Nearly Optimal Targeted Adversarial Example

Large-Scale Multi-Robot Assembly Planning for Autonomous Manufacturing

Recent Advances in Generative AI for Healthcare Applications

PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning

DSperse: A Framework for Targeted Verification in Zero-Knowledge Machine Learning

IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model

AirTrafficGen: Configurable Air Traffic Scenario Generation with Large Language Models

CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking

Learning to Be A Doctor: Searching for Effective Medical Agent Architectures

Sketch Decompositions for Classical Planning via Deep Reinforcement Learning

Tool-Planner: Task Planning with Clusters across Multiple Tools

MetaAgents: Large Language Model Based Agents for Decision-Making on Teaming

Sophisticated Learning: A novel algorithm for active learning during model-based planning

Is ChatGPT-5 Ready for Mammogram VQA?

Controlling Multimodal LLMs via Reward-guided Decoding

Pretrained Conformers for Audio Fingerprinting and Retrieval

CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection

Visual Perception Engine: Fast and Flexible Multi-Head Inference for Robotic Vision Tasks

Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models

ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization

A Comprehensive Perspective on Explainable AI across the Machine Learning Workflow

Weighted First Order Model Counting for Two-variable Logic with Axioms on Two Relations

Towards Faithful Class-level Self-explainability in Graph Neural Networks by Subgraph Dependencies

Sim2Dust: Mastering Dynamic Waypoint Tracking on Granular Media

Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models

RMSL: Weakly-Supervised Insider Threat Detection with Robust Multi-sphere Learning

Reference Points in LLM Sentiment Analysis: The Role of Structured Context

Inside Knowledge: Graph-based Path Generation with Explainable Data Augmentation and Curriculum Learning for Visual Indoor Navigation

Informative Post-Hoc Explanations Only Exist for Simple Functions

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Open, Reproducible and Trustworthy Robot-Based Experiments with Virtual Labs and Digital-Twin-Based Execution Tracing

An Exploratory Study on Crack Detection in Concrete through Human-Robot Collaboration

Trustworthy AI Psychotherapy: Multi-Agent LLM Workflow for Counseling and Explainable Mental Disorder Diagnosis

Retrieval-augmented reasoning with lean language models

When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs

G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration

Does the Skeleton-Recall Loss Really Work?

Minimizing Surrogate Losses for Decision-Focused Learning using Differentiable Optimization

PTSM: Physiology-aware and Task-invariant Spatio-temporal Modeling for Cross-Subject EEG Decoding

ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism

Leveraging the RETFound foundation model for optic disc segmentation in retinal images

NeMo: A Neuron-Level Modularizing-While-Training Approach for Decomposing DNN Models

RegimeNAS: Regime-Aware Differentiable Architecture Search With Theoretical Guarantees for Financial Trading

SGSimEval: A Comprehensive Multifaceted and Similarity-Enhanced Benchmark for Automatic Survey Generation Systems

Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks

CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems

Scene Graph-Guided Proactive Replanning for Failure-Resilient Embodied Agent

ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection

LETToT: Label-Free Evaluation of Large Language Models On Tourism Using Expert Tree-of-Thought

Is General-Purpose AI Reasoning Sensitive to Data-Induced Cognitive Biases? Dynamic Benchmarking on Typical Software Engineering Dilemmas

Enhancing Supervised Composed Image Retrieval via Reasoning-Augmented Representation Engineering

Vision-Language Models display a strong gender bias

Hallucination in LLM-Based Code Generation: An Automotive Case Study

Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception

EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control

Created by

Haebom

Authors

Hanwen Wan, Yifei Chen, Yixuan Deng, Zeyu Wei, Dongrui Li, Zexin Lin, Donghao Wu, Jiu Cheng, Xiaoqiang Ji

Outline

This paper introduces EmbodiedAgent, a hierarchical framework for heterogeneous multi-robot control. To address the hallucination problem arising from unrealistic tasks, EmbodiedAgent combines a next-action prediction paradigm with a structured memory system, decomposing tasks into executable robot actions and dynamically validating each action against environmental constraints. The paper also presents the MultiPlan+ dataset, which contains over 18,000 annotated planning instances across 100 scenarios, including a subset of unrealistic cases intended to mitigate hallucination. To evaluate performance, the authors propose the Robot Planning Assessment Schema (RPAS), which combines automated metrics with LLM-assisted expert evaluation. Experimental results show that EmbodiedAgent outperforms state-of-the-art models, achieving an RPAS score of 71.85%. Real-world validation on an office service task highlights EmbodiedAgent's ability to coordinate heterogeneous robots toward long-term goals.
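To make the idea of validating each proposed action against environmental constraints more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Robot`, `Memory`, `validate`, and `run_plan` names, the skill and object sets, and the fixed list of proposed steps (a stand-in for a model that predicts the next action) are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Robot:
    """Hypothetical robot description: a name and the skills it supports."""
    name: str
    skills: set[str]


@dataclass
class Memory:
    """Hypothetical structured memory: what was executed and what was rejected."""
    executed: list[tuple[str, str, str]] = field(default_factory=list)
    rejected: list[tuple[str, str]] = field(default_factory=list)


def validate(skill: str, target: str, robots: list[Robot], objects: set[str]) -> Robot | None:
    """Return the first robot able to perform the action, or None if it is infeasible."""
    if target not in objects:  # the target does not exist in the environment
        return None
    for robot in robots:
        if skill in robot.skills:
            return robot
    return None  # no robot in the team has this skill


def run_plan(steps: list[tuple[str, str]], robots: list[Robot], objects: set[str]) -> Memory:
    """Process proposed actions one at a time, validating each before recording it."""
    memory = Memory()
    for skill, target in steps:  # stand-in for a model predicting the next action
        robot = validate(skill, target, robots, objects)
        if robot is None:
            memory.rejected.append((skill, target))
        else:
            memory.executed.append((robot.name, skill, target))
    return memory


robots = [Robot("arm", {"pick", "place"}), Robot("rover", {"move"})]
objects = {"cup", "desk"}
steps = [("move", "desk"), ("pick", "cup"), ("fly", "cup"), ("pick", "unicorn")]

memory = run_plan(steps, robots, objects)
print("executed:", memory.executed)
print("rejected:", memory.rejected)
```

In this toy setup, the two feasible steps are assigned to a capable robot, while the step that needs a missing skill and the step that targets a nonexistent object are rejected rather than executed. A real system would replace the fixed `steps` list with model predictions conditioned on the memory contents.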

Takeaways, Limitations

• Takeaways:
  ◦ An effective hierarchical framework for heterogeneous multi-robot control.
  ◦ A novel approach to mitigating hallucination, combining next-action prediction with a structured memory system.
  ◦ Release of MultiPlan+, a large-scale dataset of over 18,000 annotated planning instances across 100 scenarios.
  ◦ A new evaluation schema, RPAS, that combines automated metrics with LLM-assisted expert evaluation.
  ◦ Practicality validated on a real-world office service task.

• Limitations:
  ◦ Further research is needed to determine the versatility and generalizability of the MultiPlan+ dataset.
  ◦ The objectivity and reliability of the RPAS evaluation criteria need further review.
  ◦ Performance should be evaluated in more complex and diverse environments.
  ◦ Further research is needed to address unexpected issues that may arise in real-world applications.
