Daily Arxiv

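This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.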

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models

MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks

Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants

A Roadmap for Climate-Relevant Robotics Research

Fairness Is Not Enough: Auditing Competence and Intersectional Bias in AI-powered Resume Screening

MMOne: Representing Multiple Modalities in One Scene

SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks

CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance

(Almost) Free Modality Stitching of Foundation Models

A Brain Tumor Segmentation Method Based on CLIP and 3D U-Net with Cross-Modal Semantic Guidance and Multi-Level Feature Fusion

KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection

THOR: Transformer Heuristics for On-Demand Retrieval

SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems

KeyRe-ID: Keypoint-Guided Person Re-Identification using Part-Aware Representation in Videos

Prompt Perturbations Reveal Human-Like Biases in LLM Survey Responses

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model

Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling

VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents

ReCode: Updating Code API Knowledge with Reinforcement Learning

Cross-Layer Discrete Concept Discovery for Interpreting Language Models

Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability

MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

Multiple-Frequencies Population-Based Training

Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback

Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows

ContextQFormer: A New Context Modeling Method for Multi-Turn Multi-Modal Conversations

GPU Performance Portability needs Autotuning

Generating Synthetic Data via Augmentations for Improved Facial Resemblance in DreamBooth and InstantID

Coral Protocol: Open Infrastructure Connecting The Internet of Agents

MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness

Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence

ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs

Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression

JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

K-P Quantum Neural Networks

VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models

Data-Efficient Deep Operator Network for Unsteady Flow: A Multi-Fidelity Approach with Physics-Guided Subsampling

Learning Universal Human Mobility Patterns with a Foundation Model for Cross-domain Data Fusion

GeoFlow-SLAM: A Robust Tightly-Coupled RGBD-Inertial and Legged Odometry Fusion SLAM for Dynamic Legged Robotics

A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models

Multi-View Node Pruning for Accurate Graph Representation

V-Max: A Reinforcement Learning Framework for Autonomous Driving

Interpretable Transformation and Analysis of Timelines through Learning via Surprisability

AI Governance InternationaL Evaluation Index (AGILE Index) 2024

UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning

Improving Transformer World Models for Data-Efficient RL

LLM-RecG: A Semantic Bias-Aware Framework for Zero-Shot Sequential Recommendation

SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks

Determination of galaxy photometric redshifts using Conditional Generative Adversarial Networks (CGANs)

Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis

MRGen: Segmentation Data Engine for Underrepresented MRI Modalities

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy for Visuomotor Imitation Learning

Dataset resulting from the user study on comprehensibility of explainable AI algorithms

Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information

DeFine: Decision-Making with Analogical Reasoning over Factor Profiles

Benchmarking Sub-Genre Classification For Mainstage Dance Music

Risks of ignoring uncertainty propagation in AI-augmented security pipelines

MedPix 2.0: A Comprehensive Multimodal Biomedical Data set for Advanced AI Applications with Retrieval Augmented Generation and Knowledge Graphs

Leveraging Quantum Superposition to Infer the Dynamic Behavior of a Spatial-Temporal Neural Network Signaling Model

Bounding the Worst-class Error: A Boosting Approach

TBDetector:Transformer-Based Detector for Advanced Persistent Threats with Provenance Graph

Machine Learning Systems: A Survey from a Data-Oriented Perspective

Aime: Towards Fully-Autonomous Multi-Agent Framework

SmartThinker: Learning to Compress and Preserve Reasoning by Step-Level Length Control

Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments

NTRL: Encounter Generation via Reinforcement Learning for Dynamic Difficulty Adjustment in Dungeons and Dragons

Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplifications and Resistance in Multi-Agent Based LLM-as-Judge

ActionStudio: A Lightweight Framework for Data and Training of Large Action Models

BEARCUBS: A benchmark for computer-using web agents

Demystifying MuZero Planning: Interpreting the Learned Model

LLM-Enhanced User-Item Interactions: Leveraging Edge Information for Optimized Recommendations

VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Imbalance in Balance: Online Concept Balancing in Generation Models

Latent Policy Steering with Embodiment-Agnostic Pretrained World Models

Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It

Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark

AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research

Towards Formal Verification of LLM-Generated Code from Natural Language Prompts

Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour

Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management

QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation

Merge Kernel for Bayesian Optimization on Permutation Space

Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

Automating Steering for Safe Multimodal Large Language Models

HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models

VITA: Vision-to-Action Flow Matching Policy

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation

Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection

Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback

SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks

Prompt Injection 2.0: Hybrid AI Threats

Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models

Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities

ReCode: Updating Code API Knowledge with Reinforcement Learning

Created by

Haebom

Authors

Haoze Wu, Yunzhi Yao, Wenhao Yu, Huajun Chen, Ningyu Zhang

Overview

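This paper addresses the problem that the code generation ability of large language models (LLMs) struggles to adapt to frequent updates of external library APIs, because LLMs rely on outdated API information in their training data. To address this, the paper proposes ReCode (rule-based Reinforcement learning for Code Update), a new framework that mimics how human programmers adapt to API changes. ReCode uses a dataset of about 2,000 entries to train LLMs to perform version migration based on updated information. It also introduces a modified string similarity metric as the reward for reinforcement learning. Experiments show that ReCode substantially improves LLM code generation in dynamic API scenarios, especially on the unseen CodeUpdateArena task. Notably, compared with supervised fine-tuning, ReCode has less impact on the general code generation ability of LLMs. Applying ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO) yields consistent improvements. In particular, after training, Qwen2.5-Coder-7B outperformed the 32B-parameter code instruction-tuned model and the reasoning model with the same architecture. The code is available at https://github.com/zjunlp/ReCode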
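To make the idea of a string-similarity reward more concrete, here is a minimal Python sketch. It is not the metric used in ReCode, which this summary does not specify; it substitutes the standard library's `difflib.SequenceMatcher` ratio as a stand-in. The function names, the whitespace normalization step, and the `lib.load` / `lib.read` API in the usage example are all hypothetical. The second function shows the group-relative normalization commonly associated with GRPO, where each sampled completion's reward is standardized against the other samples for the same prompt.

```python
import difflib
import statistics


def similarity_reward(generated: str, reference: str) -> float:
    """Hypothetical reward in [0, 1]: character-level similarity between
    generated code and a reference solution written against the updated API.
    A stand-in for the paper's modified metric, not a reproduction of it."""
    def normalize(code: str) -> str:
        # Assumption: ignore indentation and trailing whitespace differences.
        return "\n".join(line.strip() for line in code.strip().splitlines())

    matcher = difflib.SequenceMatcher(
        None, normalize(generated), normalize(reference), autojunk=False
    )
    return matcher.ratio()


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Standardize rewards within one group of samples (GRPO-style):
    subtract the group mean and divide by the group standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All samples scored the same, so none is preferred over another.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


# Usage with a made-up library whose `read` call was replaced by `load`.
reference = "result = lib.load(path, strict=True)"
candidates = [
    "result = lib.load(path, strict=True)",  # uses the updated API
    "result = lib.read(path)",               # uses the outdated API
    "result = lib.load(path)",               # partially migrated
]
rewards = [similarity_reward(c, reference) for c in candidates]
advantages = group_relative_advantages(rewards)
```

A surface-level metric like this rewards textual closeness to the reference rather than functional correctness, so two behaviorally equivalent migrations can receive different scores.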

GitHub - zjunlp/ReCode: ReCode: Reinforced Code Knowledge Editing for API Updates

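Implications and Limitations

• Implications:
  ◦ Presents an effective method for improving LLM code generation performance in dynamic API environments.
  ◦ The reinforcement-learning-based ReCode framework has less negative impact on the general code generation ability of LLMs than supervised fine-tuning.
  ◦ Confirms applicability to various LLMs and reinforcement learning algorithms and validates strong performance (notably that of Qwen2.5-Coder-7B).
  ◦ Improves adaptability to real-world API updates.

• Limitations:
  ◦ Training currently uses about 2,000 data entries, so the dataset needs to be scaled up.
  ◦ Further research is needed on generalization to a wider range of APIs and programming languages.
  ◦ The long-term stability and maintainability of ReCode in real-world environments need to be examined.
  ◦ The string similarity metric used has limitations and room for improvement.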
