Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ReCode: Updating Code API Knowledge with Reinforcement Learning

Created by
  • Haebom

Author

Haoze Wu, Yunzhi Yao, Wenhao Yu, Huajun Chen, Ningyu Zhang

Outline

This paper proposes the ReCode framework to address a limitation of large language models (LLMs) in code generation: their inability to adapt to frequent updates of external library APIs. ReCode mimics how human programmers adapt to API changes, training LLMs to perform version migration on roughly 2,000 training examples and using a modified string similarity measure as the reward for reinforcement learning. Experimental results show that ReCode significantly improves LLM code generation performance, especially on the unseen CodeUpdateArena task, while degrading general code generation ability less than supervised fine-tuning does. Applying ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO) yields consistent performance improvements, and Qwen2.5-Coder-7B outperforms a 32B-parameter code instruction-tuned model as well as a reasoning model with the same architecture. The source code is available on GitHub.
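The summary above mentions that ReCode uses a modified string similarity measure as the reinforcement learning reward. The paper's exact formula is not given here, so the following is only a minimal sketch of the idea, assuming a whitespace-normalized `difflib` ratio as a stand-in for the actual reward; the pandas API calls are illustrative examples of a version migration, not data from the paper.

```python
# Hypothetical sketch of a string-similarity reward for RL-based API
# migration training. The real ReCode reward is a "modified string
# similarity" measure; difflib's ratio is used here as an assumption.
from difflib import SequenceMatcher


def similarity_reward(generated: str, reference: str) -> float:
    """Return a reward in [0, 1] comparing generated code against the
    reference migrated code. Leading/trailing whitespace is stripped
    per line so pure formatting differences are not penalized."""
    def normalize(code: str) -> str:
        return "\n".join(line.strip() for line in code.splitlines())

    return SequenceMatcher(None, normalize(generated),
                           normalize(reference)).ratio()


# Illustrative migration: a deprecated call vs. its updated replacement.
old_call = "df.append(row, ignore_index=True)"
new_call = "pd.concat([df, row.to_frame().T], ignore_index=True)"

print(similarity_reward(new_call, new_call))  # exact match scores 1.0
print(similarity_reward(old_call, new_call))  # partial match scores below 1.0
```

In an RL loop such as GRPO, a graded reward like this gives the policy a smoother learning signal than a binary pass/fail check, since partially correct migrations still receive partial credit.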

Takeaways, Limitations

Takeaways:
Presents an effective framework (ReCode) for adapting LLMs to API updates
Improves LLM code generation performance through a reinforcement learning-based approach
Minimizes degradation of general code generation ability compared to supervised fine-tuning
Shows consistent performance improvements across a variety of LLMs and reinforcement learning algorithms
A relatively small model (Qwen2.5-Coder-7B) outperforms much larger models
Limitations:
Performance gains are demonstrated primarily on a single benchmark (CodeUpdateArena); further research is needed on how well they generalize beyond it.
Whether roughly 2,000 training examples is sufficient needs review; how performance changes with a larger dataset should be analyzed.
Further experiments are needed to assess generalizability across different APIs and programming languages.