Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation

Input Time Scaling

CRED-SQL: Enhancing Real-world Large Scale Database Text-to-SQL Parsing through Cluster Retrieval and Execution Description

STEM: Efficient Relative Capability Evaluation of LLMs through Structured Transition Samples

AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition

Biased AI improves human decision-making but reduces trust

MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling

ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model

MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving

ETA: Energy-based Test-time Adaptation for Depth Completion

Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens

When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs

CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Reinitializing weights vs units for maintaining plasticity in neural networks

Each to Their Own: Exploring the Optimal Embedding in RAG

Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

TolerantECG: A Foundation Model for Imperfect Electrocardiogram

DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning

LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization

Structure As Search: Unsupervised Permutation Learning for Combinatorial Optimization

Enhancing Temporal Sensitivity of Large Language Model for Recommendation with Counterfactual Tuning

Multi-agent Auditory Scene Analysis

MinD: Learning A Dual-System World Model for Real-Time Planning and Implicit Risk Analysis

AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale

Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting

Spore in the Wild: A Case Study of Spore.fun as an Open-Environment Evolution Experiment with Sovereign AI Agents on TEE-Secured Blockchains

Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback

Neural Restoration of Greening Defects in Historical Autochrome Photographs Based on Purely Synthetic Data

Security Concerns for Large Language Models: A Survey

Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)

One-Layer Transformers are Provably Optimal for In-context Reasoning and Distributional Association Learning in Next-Token Prediction Tasks

FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation

Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers

Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models

A Conceptual Framework for AI-based Decision Systems in Critical Infrastructures

Dominated Actions in Imperfect-Information Games

Hands-On: Segmenting Individual Signs from Continuous Sequences

PathGPT: Reframing Path Recommendation as a Natural Language Generation Task with Retrieval-Augmented Language Models

Boosting Chart-to-Code Generation in MLLM via Dual Preference-Guided Refinement

JudgeLRM: Large Reasoning Models as a Judge

Generative AI in K-12 Education: The CyberScholar Initiative

Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions

Hybrid Action Based Reinforcement Learning for Multi-Objective Compatible Autonomous Driving

Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?

Action Engine: Automatic Workflow Generation in FaaS

The importance of visual modeling languages in generative software engineering

Identity Preserving 3D Head Stylization with Multiview Score Distillation

SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models

Testing Components of the Attention Schema Theory in Artificial Neural Networks

A Little Human Data Goes A Long Way

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Social Debiasing for Fair Multi-modal LLMs

A Comprehensive Benchmark on Spectral GNNs: The Impact on Efficiency, Memory, and Effectiveness

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Enhancing Depression-Diagnosis-Oriented Chat with Psychological State Tracking

Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory

Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning

Towards the Use of Saliency Maps for Explaining Low-Quality Electrocardiograms to End Users

Nash Convergence of Mean-Based Learning Algorithms in First-Price Auctions

TASER: Table Agents for Schema-guided Extraction and Recommendation

Modeling Relational Logic Circuits for And-Inverter Graph Convolutional Network

EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making

KIRETT: Knowledge-Graph-Based Smart Treatment Assistant for Intelligent Rescue Operations

EoH-S: Evolution of Heuristic Set using LLMs for Automated Heuristic Design

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

Benchmarking Vector, Graph and Hybrid Retrieval Augmented Generation (RAG) Pipelines for Open Radio Access Networks (ORAN)

The NordDRG AI Benchmark for Large Language Models

Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

Unsupervised Learning for Quadratic Assignment

Reference-Aligned Retrieval-Augmented Question Answering over Heterogeneous Proprietary Documents

Benchmarking graph construction by large language models for coherence-driven inference

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Graph Structure Learning with Temporal Graph Information Bottleneck for Inductive Representation Learning

$TIME[t] \subseteq SPACE[O(\sqrt{t})]$ via Tree Height Compression

Long Chain-of-Thought Reasoning Across Languages

From Passive Tool to Socio-cognitive Teammate: A Conceptual Framework for Agentic AI in Human-AI Collaborative Learning

Evaluating Retrieval-Augmented Generation vs. Long-Context Input for Clinical Reasoning over EHRs

TransLight: Image-Guided Customized Lighting Control with Generative Decoupling

DINOv3 with Test-Time Training for Medical Image Registration

MF-LPR$^2$: Multi-Frame License Plate Image Restoration and Recognition using Optical Flow

TransLLM: A Unified Multi-Task Foundation Framework for Urban Transportation via Learnable Prompting

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

Reliable generation of isomorphic physics problems using ChatGPT with prompt-chaining and tool use

Cross-Modality Controlled Molecule Generation with Diffusion Language Model

Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference

AFABench: A Generic Framework for Benchmarking Active Feature Acquisition

Emerson-Lei and Manna-Pnueli Games for LTLf+ and PPLTL+ Synthesis

Transplant Then Regenerate: A New Paradigm for Text Data Augmentation

ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine

Learning in Repeated Multi-Objective Stackelberg Games with Payoff Manipulation

Foe for Fraud: Transferable Adversarial Attacks in Credit Card Fraud Detection

ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signal

ELATE: Evolutionary Language model for Automated Time-series Engineering

OneLoc: Geo-Aware Generative Recommender Systems for Local Life Service

Can LLM Agents Solve Collaborative Tasks? A Study on Urgency-Aware Planning and Coordination

A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References

UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling

An Open-Source HW-SW Co-Development Framework Enabling Efficient Multi-Accelerator Systems

Mamba2 Meets Silence: Robust Vocal Source Separation for Sparse Regions

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

Created by

Haebom

Author

Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Hongzhang Liu, Ronghao Chen, Yangfan He, Daxin Jiang, Binxing Jiao, Chen Hu, Huacan Wang

Outline

This paper proposes SE-Agent, a self-evolution (SE) framework that uses the interaction trajectories produced while a large language model (LLM)-based agent solves a problem to improve the agent's performance. Existing methods such as MCTS tend to reach suboptimal results because of interdependencies among trajectories and a lack of diversity in the search. SE-Agent instead iteratively optimizes the reasoning process through three operations on previous trajectories: modifying, recombining, and improving them. This lets the agent explore diverse solution paths and reduces the impact of inefficient paths. In experiments on SWE-bench Verified, SE-Agent achieves state-of-the-art performance, with up to a 55% relative improvement across five strong LLMs.
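The loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the task (make a tuple of integers sum to `TARGET`), the `score` function, and the bodies of `modify`, `recombine`, and `improve` are all hypothetical stand-ins for what would be LLM-driven operations on real agent trajectories. Only the overall shape (keep a pool of trajectories, derive new ones from old ones, retain the best) follows the summary.

```python
import random
from typing import List, Tuple

Trajectory = Tuple[int, ...]  # hypothetical stand-in for a sequence of agent steps
TARGET = 10                   # toy task (assumption): steps should sum to TARGET


def score(traj: Trajectory) -> float:
    """Toy reward: a sum closer to TARGET is better; shorter is slightly better."""
    return -abs(TARGET - sum(traj)) - 0.01 * len(traj)


def modify(traj: Trajectory, rng: random.Random) -> Trajectory:
    """Replace one step of a trajectory (placeholder for LLM-driven modification)."""
    if not traj:
        return (rng.randint(1, 5),)
    i = rng.randrange(len(traj))
    return traj[:i] + (rng.randint(0, 5),) + traj[i + 1:]


def recombine(a: Trajectory, b: Trajectory, rng: random.Random) -> Trajectory:
    """Splice a prefix of one trajectory onto a suffix of another."""
    return a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]


def improve(traj: Trajectory) -> Trajectory:
    """Drop steps that contribute nothing (zeros in this toy task)."""
    return tuple(step for step in traj if step != 0)


def self_evolve(initial: List[Trajectory], iterations: int = 5,
                keep: int = 4, seed: int = 0) -> List[Trajectory]:
    """Iteratively derive new trajectories from a pool and keep the best ones."""
    if not initial:
        raise ValueError("need at least one initial trajectory")
    rng = random.Random(seed)
    pool = sorted(set(initial), key=score, reverse=True)[:keep]
    for _ in range(iterations):
        candidates = []
        for traj in pool:
            candidates.append(modify(traj, rng))
            candidates.append(recombine(traj, rng.choice(pool), rng))
        candidates = [improve(c) for c in candidates]
        # Elitist selection: earlier trajectories survive unless beaten,
        # so the best score in the pool never decreases.
        pool = sorted(set(pool) | set(candidates), key=score, reverse=True)[:keep]
    return pool


if __name__ == "__main__":
    best = self_evolve([(1, 0, 2), (5, 5, 5), (0,)], keep=3)[0]
    print(best, score(best))
```

In this sketch the pool is shared across operations, so a new candidate can draw on several earlier trajectories at once rather than extending a single branch. That is the property the summary contrasts with MCTS-style search.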

Takeaways, Limitations

• Takeaways:

  ◦ Presents a novel approach to optimizing the problem-solving process of LLM-based agents.

  ◦ Addresses the trajectory interdependence and lack of diversity that limit existing MCTS-based methods.

  ◦ Improves performance efficiently and expands the search space by reusing previous trajectories.

  ◦ Demonstrates strong performance on real-world GitHub issue resolution tasks.

  ◦ Supports further research and practical use through an open-source release.

• Limitations:

  ◦ The effectiveness of SE-Agent may depend on the performance of the underlying LLM.

  ◦ The results come from a single domain (GitHub issue resolution), so further research is needed to determine generalizability.

  ◦ Further research is needed on optimization strategies for the three operations (modification, recombination, and improvement).

  ◦ The scalability of SE-Agent to highly complex problems still needs to be verified.
