Daily Arxiv

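This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.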

AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation

Thought Anchors: Which LLM Reasoning Steps Matter?

Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective

OmniGen2: Exploration to Advanced Multimodal Generation

Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning

Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models

Quantum-Classical Hybrid Quantized Neural Network

Non-equilibrium Annealed Adjoint Sampler

PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding

Mapping the Evolution of Research Contributions using KnoVo

MS-TVNet:A Long-Term Time Series Prediction Method Based on Multi-Scale Dynamic Convolution

No Free Lunch: Rethinking Internal Feedback for LLM Reasoning

TabArena: A Living Benchmark for Machine Learning on Tabular Data

VRAIL: Vectorized Reward-based Attribution for Interpretable Learning

CLAIM: Clinically-Guided LGE Augmentation for Realistic and Diverse Myocardial Scar Synthesis and Segmentation

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

IKDiffuser: A Generative Inverse Kinematics Solver for Multi-arm Robots via Diffusion Model

Fine-Grained Perturbation Guidance via Attention Head Selection

Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning

C3S3: Complementary Competition and Contrastive Selection for Semi-Supervised Medical Image Segmentation

SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications

Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift?

CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models

AIDRIN 2.0: A Framework to Assess Data Readiness for AI

TSPulse: Dual Space Tiny Pre-Trained Models for Rapid Time-Series Analysis

Teacher Motion Priors: Enhancing Robot Locomotion over Challenging Terrain

WoundAmbit: Bridging State-of-the-Art Semantic Segmentation and Real-World Wound Care

Computation Mechanism Behind LLM Position Generalization

Training Plug-n-Play Knowledge Modules with Deep Context Distillation

MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel

From $\mathcal{O}(n^{2})$ to $\mathcal{O}(n)$ Parameters: Quantum Self-Attention in Vision Transformers for Biomedical Image Classification

Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation

FGS-SLAM: Fourier-based Gaussian Splatting for Real-time SLAM with Sparse and Dense Map Fusion

Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners

Protein Structure Tokenization: Benchmarking and New Recipe

Chemical knowledge-informed framework for privacy-aware retrosynthesis learning

Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning

Diffusion Models Through a Global Lens: Are They Culturally Inclusive?

WyckoffDiff -- A Generative Diffusion Model for Crystal Symmetry

Solving Linear-Gaussian Bayesian Inverse Problems with Decoupled Diffusion Sequential Monte Carlo

Adversarial Reasoning at Jailbreaking Time

AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement

Rethinking Early Stopping: Refine, Then Calibrate

Unlocking In-Context Learning for Natural Datasets Beyond Language Modelling

Towards Backdoor Stealthiness in Model Parameter Space

Distributed satellite information networks: Architecture, enabling technologies, and trends

Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair

Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains

Understanding World or Predicting Future? A Comprehensive Survey of World Models

USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting

Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers

Toddlers' Active Gaze Behavior Supports Self-Supervised Object Learning

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models

Evaluating Long Range Dependency Handling in Code Generation LLMs

Physics-informed Imitative Reinforcement Learning for Real-world Driving

COBRA-PPM: A Causal Bayesian Reasoning Architecture Using Probabilistic Programming for Robot Manipulation Under Uncertainty

FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation

Do Concept Bottleneck Models Respect Localities?

When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour

Low-light Pedestrian Detection in Visible and Infrared Image Feeds: Issues and Challenges

A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges

PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models

Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

The Alignment Trap: Complexity Barriers

The State of Large Language Models for African Languages: Progress and Challenges

Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation

Turing Test 2.0: The General Intelligence Threshold

$C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking

RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models

Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging

Towards Better Benchmark Datasets for Inductive Knowledge Graph Completion

Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs

Disentangled representations of microscopy images

Define-ML: An Approach to Ideate Machine Learning-Enabled Systems

Weighted Mean Frequencies: a handcraft Fourier feature for 4D Flow MRI segmentation

Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings

AI in the Writing Process: How Purposeful AI Support Fosters Student Writing

Dense Video Captioning using Graph-based Sentence Summarization

Causal Representation Learning with Observational Grouping for CXR Classification

Vulnerability Disclosure through Adaptive Black-Box Adversarial Attacks on NIDS

Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization

DeepQuark: deep-neural-network approach to multiquark bound states

Large Language Model-Driven Code Compliance Checking in Building Information Modeling

Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks

When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs

WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads

Industrial Energy Disaggregation with Digital Twin-generated Dataset and Efficient Data Augmentation

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

ReCode: Updating Code API Knowledge with Reinforcement Learning

Counterfactual Influence as a Distributional Quantity

Automatic Demonstration Selection for LLM-based Tabular Data Classification

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

Off-Policy Evaluation and Learning for the Future under Non-Stationarity

SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models

Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning

CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition

ReCode: Updating Code API Knowledge with Reinforcement Learning

Created by

Haebom

Authors

Haoze Wu, Yunzhi Yao, Wenhao Yu, Huajun Chen, Ningyu Zhang

Overview

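This paper proposes the ReCode framework to address a limitation of large language models (LLMs): their code generation does not adapt to the frequent updates of external library APIs. ReCode mimics how human programmers adapt to API changes. It trains the LLM to perform version migration on roughly 2,000 data entries and uses a modified string-similarity metric as the reinforcement learning reward. Experiments show that ReCode substantially improves LLM code generation performance, particularly on the unseen CodeUpdateArena task, while affecting general code generation ability less than supervised fine-tuning does. Applying ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO) yields consistent improvements, and after training, Qwen2.5-Coder-7B outperforms a 32B-parameter code instruction-tuned model and a reasoning model of the same architecture. The source code is publicly available on GitHub.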
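To make the idea of a string-similarity reward concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary does not say how the metric is modified, so the function name `similarity_reward`, the use of `difflib.SequenceMatcher`, and the `syntax_penalty` factor for code that fails to parse are all assumptions made for illustration.

```python
import ast
import difflib


def similarity_reward(generated: str, reference: str,
                      syntax_penalty: float = 0.5) -> float:
    """Hypothetical reward in [0, 1] for a migrated code snippet.

    Assumption: the reward is a character-level similarity ratio between
    the generated code and the reference migration, scaled down when the
    generated code is not valid Python. The actual modification used in
    ReCode is not described in this summary.
    """
    gen = generated.strip()
    ref = reference.strip()

    # Similarity ratio in [0, 1]; 1.0 means the strings are identical.
    score = difflib.SequenceMatcher(None, gen, ref).ratio()

    # Illustrative modification: penalize output that does not parse.
    try:
        ast.parse(gen)
    except SyntaxError:
        score *= syntax_penalty

    return score


if __name__ == "__main__":
    # Reference uses the newer pandas-style call; the first candidate
    # still uses the older call and so earns only partial reward.
    reference = "df = pd.concat([df, row])"
    print(similarity_reward("df = df.append(row)", reference))
    print(similarity_reward("df = pd.concat([df, row])", reference))
```

In a GRPO- or DAPO-style training loop, a scalar like this would be computed for each sampled completion and passed to the policy update as its reward. A dense similarity score gives partial credit for near-correct migrations, which an exact-match check would not.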

Implications and Limitations

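• Implications:
◦ Presents an effective framework (ReCode) for adapting LLMs to API updates.
◦ Improves LLM code generation performance through a reinforcement learning approach.
◦ Degrades general code generation ability less than supervised fine-tuning does.
◦ Shows consistent gains across various LLMs and reinforcement learning algorithms.
◦ A relatively small model (Qwen2.5-Coder-7B) outperforms larger models.

• Limitations:
◦ Further study is needed on how well ReCode's gains generalize beyond the specific dataset evaluated (CodeUpdateArena).
◦ Whether roughly 2,000 data entries are sufficient needs review, including an analysis of how performance changes with a larger dataset.
◦ Additional experiments are needed to test generalization to a wider range of APIs and programming languages.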
