Daily Arxiv

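This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.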

GBPP: Grasp-Aware Base Placement Prediction for Robots via Two-Stage Learning

HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking

Evalet: Evaluating Large Language Models by Fragmenting Outputs into Functions

Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers

Physics-informed neural network solves minimal surfaces in curved spacetime

A funny companion: Distinct neural responses to perceived AI- versus human-generated humor

National Running Club Database: Assessing Collegiate Club Athletes' Cross Country Race Results

Online Learning Based Efficient Resource Allocation for LoRaWAN Network

MetaLLMix : An XAI Aided LLM-Meta-learning Based Approach for Hyper-parameters Optimization

Implicit Neural Representations of Intramyocardial Motion and Strain

MVPBench: A Benchmark and Fine-Tuning Framework for Aligning Large Language Models with Diverse Human Values

MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining

AI Governance in Higher Education: A course design exploring regulatory, ethical and practical considerations

Benchmarking Gender and Political Bias in Large Language Models

BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models

TinyDef-DETR: A DETR-based Framework for Defect Detection in Transmission Lines from UAV Imagery

TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition

Spiking Neural Networks for Continuous Control via End-to-End Model-Based Learning

ICR: Iterative Clarification and Rewriting for Conversational Search

ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions

Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs

Keypoint-based Diffusion for Robotic Motion Planning on the NICOL Robot

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems

OpenWHO: A Document-Level Parallel Corpus for Health Translation in Low-Resource Languages

SIFThinker: Spatially-Aware Image Focus for Visual Reasoning

Sample-Aware Test-Time Adaptation for Medical Image-to-Image Translation

GPT-4.1 Sets the Standard in Automated Experiment Design Using Novel Python Libraries

FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning

New Kid in the Classroom: Exploring Student Perceptions of AI Coding Assistants

Analysis of Fourier Neural Operators via Effective Field Theory

FCRF: Flexible Constructivism Reflection for Long-Horizon Robotic Task Planning with Large Language Models

PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training

Memorization Sinks: Isolating Memorization during LLM Training

Clue-RAG: Towards Accurate and Cost-Efficient Graph-based RAG via Multi-Partite Graph and Query-Driven Iterative Retrieval

OGF: An Online Gradient Flow Method for Optimizing the Statistical Steady-State Time Averages of Unsteady Turbulent Flows

AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models

Towards Bio-Inspired Robotic Trajectory Planning via Self-Supervised RNN

Evaluating the Robustness of Open-Source Vision-Language Models to Domain Shift in Object Captioning

Can Generalist Vision Language Models (VLMs) Rival Specialist Medical VLMs? Benchmarking and Strategic Insights

Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$

Worst-Case Symbolic Constraints Analysis and Generalisation with Large Language Models

MedEBench: Diagnosing Reliability in Text-Guided Medical Image Editing

Counterfactual Simulatability of LLM Explanations for Generation Tasks

PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims

HiLAB: A Hybrid Inverse-Design Framework

Tuning-Free LLM Can Build A Strong Recommender Under Sparse Connectivity And Knowledge Gap Via Extracting Intent

WaterFlow: Learning Fast & Robust Watermarks using Stable Diffusion

Is the Top Still Spinning? Evaluating Subjectivity in Narrative Understanding

Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Training-free Adjustable Polynomial Graph Filtering for Ultra-fast Multimodal Recommendation

Teaching Your Models to Understand Code via Focal Preference Alignment

Investigating the use of terrain-following coordinates in AI-driven precipitation forecasts

SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning

Safe Learning Under Irreversible Dynamics via Asking for Help

Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection

How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild

TokenSkip: Controllable Chain-of-Thought Compression in LLMs

Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs

Pitfalls of defacing whole-head MRI: re-identification risk with diffusion models and compromised research potential

AI/ML Based Detection and Categorization of Covert Communication in IPv6 Network

Learn from Global Correlations: Enhancing Evolutionary Algorithm via Spectral GNN

Enhancing Automated Loop Invariant Generation for Complex Programs with Large Language Models

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

Convex Regularization and Convergence of Policy Gradient Flows under Safety Constraints

Adversarial Prompt Distillation for Vision-Language Models

TrojanRobot: Physical-world Backdoor Attacks Against VLM-based Robotic Manipulation

The Belief State Transformer

A Statistical Analysis of Deep Federated Learning for Intrinsically Low-dimensional Data

Responsible AI in NLP: GUS-Net Span-Level Bias Detection Dataset and Benchmark for Generalizations, Unfairness, and Stereotypes

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation

Context-Aware Membership Inference Attacks against Pre-trained Large Language Models

RingMo-Aerial: An Aerial Remote Sensing Foundation Model With Affine Transformation Contrastive Learning

Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning

Informed Correctors for Discrete Diffusion Models

EMOE: A Framework for Out-of-distribution Uncertainty Based Rejection via Model-Agnostic Expansive Matching of Experts

Empowering Time Series Analysis with Foundation Models: A Comprehensive Survey

Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions

Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation

When Safe Unimodal Inputs Collide: Optimizing Reasoning Chains for Cross-Modal Safety in Multimodal Large Language Models

Agentic Lybic: Multi-Agent Execution System with Tiered Reasoning and Orchestration

Executable Ontologies: Synthesizing Event Semantics with Dataflow Architecture

Explaining Tournament Solutions with Minimal Supports

Neuromorphic Computing with Multi-Frequency Oscillations: A Bio-Inspired Approach to Artificial Intelligence

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

Small Language Models are the Future of Agentic AI

Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D

Random Rule Forest (RRF): Interpretable Ensembles of LLM-Generated Questions for Predicting Startup Success

Comprehend, Divide, and Conquer: Feature Subspace Exploration via Multi-Agent Hierarchical Reinforcement Learning

Robust Decision-Making Via Free Energy Minimization

CredID: Credible Multi-Bit Watermark for Large Language Models Identification

Probing LLM Hallucination from Within: Perturbation-Driven Approach via Internal Knowledge

Overcoming classic challenges for artificial neural networks by providing incentives and practice

Federated Cross-Training Learners for Robust Generalization under Data Heterogeneity

Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models

Contrastive timbre representations for musical instrument and synthesizer retrieval

HARMONIC: A Content-Centric Cognitive Robotic Architecture

RadGame: An AI-Powered Platform for Radiology Education

JANUS: A Dual-Constraint Generative Framework for Stealthy Node Injection Attacks

Teaching Your Models to Understand Code via Focal Preference Alignment

Created by

Haebom

Authors

Jie Wu, Haoling Li, Xin Zhang, Jianwen Luo, Yangyu Huang, Ruihang Chu, Yujiu Yang, Scarlett Li

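Overview

This paper proposes Target-DPO, a new preference alignment framework for improving code-generation large language models (LLMs). Rather than simply comparing pass rates, it mimics the iterative debugging process of human developers. Target-DPO explicitly identifies error regions and aligns the corresponding tokens with a tailored DPO algorithm, which lets the model learn finer-grained error-correction patterns. To support this, the authors introduce CodeFlow, a new dataset in which code is iteratively revised and the error-correction process is recorded. In experiments, applying Target-DPO to a range of code LLMs substantially improved code generation performance, including on challenging tasks such as BigCodeBench. Target-DPO also reduced the rate at which errors occur. The code, models, and dataset are released on GitHub.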
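The overview describes aligning only the tokens in the identified error region. As a rough illustration, and not the authors' implementation, the sketch below locates the differing tokens between a buggy and a fixed snippet with Python's `difflib`, then evaluates the standard DPO objective with log-probability ratios summed over those tokens only. The function names (`changed_token_mask`, `masked_logratio`, `masked_dpo_loss`), the whitespace tokenization, the use of a diff to find the error region, the choice to mask both the chosen and rejected sequences, and all log-probability values are assumptions made for illustration; the summary does not specify how Target-DPO computes its loss.

```python
import difflib
import math


def changed_token_mask(reference, candidate):
    """Return a 0/1 mask over `candidate`: 1 where a token differs from `reference`.

    Hypothetical helper: a token-level diff stands in for error-region detection.
    """
    mask = [0] * len(candidate)
    matcher = difflib.SequenceMatcher(a=reference, b=candidate, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            for j in range(j1, j2):
                mask[j] = 1
    return mask


def masked_logratio(policy_logps, ref_logps, mask):
    """Sum of (policy - reference) token log-probs over masked positions only."""
    return sum(m * (p - r) for p, r, m in zip(policy_logps, ref_logps, mask))


def masked_dpo_loss(chosen, rejected, beta=0.1):
    """Standard DPO loss, -log(sigmoid(margin)), restricted to masked tokens.

    `chosen` and `rejected` are dicts with keys policy_logps, ref_logps, mask.
    Masking both sides is an assumption, not a detail taken from the paper.
    """
    margin = beta * (masked_logratio(**chosen) - masked_logratio(**rejected))
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy example with made-up per-token log-probabilities.
fixed = "return a + b".split()   # corrected code (preferred)
buggy = "return a - b".split()   # earlier buggy revision (dispreferred)

fixed_mask = changed_token_mask(buggy, fixed)
buggy_mask = changed_token_mask(fixed, buggy)

chosen = dict(policy_logps=[-0.1, -0.2, -0.3, -0.1],
              ref_logps=[-0.1, -0.2, -1.0, -0.1],
              mask=fixed_mask)
rejected = dict(policy_logps=[-0.1, -0.2, -2.0, -0.1],
                ref_logps=[-0.1, -0.2, -1.0, -0.1],
                mask=buggy_mask)

print(fixed_mask, buggy_mask)
print(masked_dpo_loss(chosen, rejected))
```

In this sketch, tokens shared by the buggy and fixed versions contribute nothing to the loss, so the preference signal concentrates on the edited operator. A real training setup would compute the log-probabilities with the policy and reference models and backpropagate through the policy terms.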

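Implications and Limitations

• Implications:
  ◦ Shows that Target-DPO, a new preference alignment framework that mimics human debugging, can improve the performance of code LLMs.
  ◦ Explicitly identifying and aligning error regions enables more effective learning of error-correction patterns.
  ◦ The CodeFlow dataset makes finer-grained training possible.
  ◦ Demonstrates performance gains across a variety of code generation tasks.
  ◦ Confirms a reduction in the rate at which errors occur.

• Limitations:
  ◦ The scale and diversity of the CodeFlow dataset may need further study.
  ◦ Further validation is needed on whether Target-DPO's gains generalize to all kinds of code generation tasks.
  ◦ A more in-depth comparison with other preference learning methods is needed.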
