Daily Arxiv

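This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.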

VideoPDE: Unified Generative PDE Solving via Video Inpainting Diffusion Models

Prefix-Tuning+: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention

Graph-Convolutional-Beta-VAE for Synthetic Abdominal Aorta Aneurysm Generation

A Production Scheduling Framework for Reinforcement Learning Under Real-World Constraints

ROSAQ: Rotation-based Saliency-Aware Weight Quantization for Efficiently Compressing Large Language Models

Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models

SeqPE: Transformer with Sequential Position Encoding

No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!

AI-Facilitated Analysis of Abstracts and Conclusions: Flagging Unsubstantiated Claims and Ambiguous Pronouns

IKDiffuser: Fast and Diverse Inverse Kinematics Solution Generation for Multi-arm Robotic Systems

Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark

Deep Learning Model Acceleration and Optimization Strategies for Real-Time Recommendation Systems

Adaptive Composition of Machine Learning as a Service (MLaaS) for IoT Environments

Discrete Audio Tokens: More Than a Survey!

FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation

Inherently Faithful Attention Maps for Vision Transformers

Reparameterized LLM Training via Orthogonal Equivalence Transformation

MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification

SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms

CellCLIP -- Learning Perturbation Effects in Cell Painting via Text-Guided Contrastive Learning

EuroLLM-9B: Technical Report

Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG

MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning

CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs

KGMark: A Diffusion Watermark for Knowledge Graphs

Representing local protein environments with atomistic foundation models

Accelerating RLHF Training with Reward Variance Increase

FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing

PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation

REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion

Abacus: A Cost-Based Optimizer for Semantic Operator Systems

IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems

CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement

Chatting with Papers: A Hybrid Approach Using LLMs and Knowledge Graphs

Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner

LongCodeBench: Evaluating Coding LLMs at 1M Context Windows

H$^3$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning

Convert Language Model into a Value-based Strategic Planner

Graph RAG for Legal Norms: A Hierarchical, Temporal and Deterministic Approach

CAPO: Cost-Aware Prompt Optimization

Spatiotemporal Learning of Brain Dynamics from fMRI Using Frequency-Specific Multi-Band Attention for Cognitive and Psychiatric Applications

The Backfiring Effect of Weak AI Safety Regulation

Assessing Consistency and Reproducibility in the Outputs of Large Language Models: Evidence Across Diverse Finance and Accounting Tasks

Achieving Unbiased Multi-Instance Learning via Balanced Fine-Grained Positive-Unlabeled Learning

Conformal Prediction Sets for Deep Generative Models via Reduction to Conformal Regression

SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints

Effect of Selection Format on LLM Performance

Reward Shaping to Mitigate Reward Hacking in RLHF

SAE-V: Interpreting Multimodal Models for Enhanced Alignment

LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification

Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments

Hardware-Friendly Static Quantization Method for Video Diffusion Transformers

PredictaBoard: Benchmarking LLM Score Predictability

Towards Geo-Culturally Grounded LLM Generations

Diverse Topology Optimization using Modulated Neural Fields

Counterfactual-Consistency Prompting for Relative Temporal Understanding in Large Language Models

NAROCE: A Neural Algorithmic Reasoner Framework for Online Complex Event Detection

Bridging Voting and Deliberation with Algorithms: Field Insights from vTaiwan and Kultur Komitee

From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs

Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models

Agent Laboratory: Using LLM Agents as Research Assistants

LLMs Help Alleviate the Cross-Subject Variability in Brain Signal and Language Alignment

Uncertainty-Aware Critic Augmentation for Hierarchical Multi-Agent EV Charging Control

Parallel Greedy Best-First Search with a Bound on Expansions Relative to Sequential Search

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension

BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English

LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior

Geometric Signatures of Compositionality Across a Language Model's Lifetime

AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping

What is the Right Notion of Distance between Predict-then-Optimize Tasks?

QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors with Variance-bounded REINFORCE

MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization

Bridging Social Media and Search Engines: Dredge Words and the Detection of Unreliable Domains

Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write

Exploring news intent and its application: A theory-driven approach

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures

AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving

ConsistencyChecker: Tree-based Evaluation of LLM Generalization Capabilities

A Conjecture on a Fundamental Trade-Off between Certainty and Scope in Symbolic and Generative AI

Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

TIP-Search: Time-Predictable Inference Scheduling for Market Prediction under Uncertain Load

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models

OrgAccess: A Benchmark for Role Based Access Control in Organization Scale LLMs

SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Arbitrarily Applicable Same/Opposite Relational Responding with NARS

ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning

Behaviour Discovery and Attribution for Explainable Reinforcement Learning

Verification Learning: Make Unsupervised Neuro-Symbolic System Feasible

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

Activation Space Interventions Can Be Transferred Between Large Language Models

Planning of Heuristics: Strategic Planning on Large Language Models with Monte Carlo Tree Search for Automating Heuristic Optimization

MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces

Reward Shaping to Mitigate Reward Hacking in RLHF

Created by

Haebom

Authors

Jiayi Fu, Xuandong Zhao, Chengyuan Yao, Heng Wang, Qi Han, Yanghua Xiao

Overview

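This paper proposes Preference As Reward (PAR), a new method for addressing reward hacking in reinforcement learning from human feedback (RLHF). RLHF is essential for aligning large language models (LLMs) with human values, but it is vulnerable to reward hacking, in which the model exploits flaws in the reward function instead of learning the intended behavior. Noting that existing reward shaping techniques have not been studied systematically, the paper comprehensively analyzes reward shaping methods and derives two key design principles: (1) the RL reward should be bounded, and (2) the RL reward should grow rapidly at first and then converge gradually. Based on these principles, it proposes PAR, which uses the latent preferences embedded in the reward model as the reinforcement learning signal. In experiments with the Gemma2-2B and Llama3-8B models on the Ultrafeedback-Binarized and HH-RLHF datasets, PAR outperforms other reward shaping methods and achieves a win rate at least 5% higher than competing approaches on the AlpacaEval 2.0 benchmark. PAR is also highly data-efficient: it needs only a single reference reward for optimal performance, and it remains robust against reward hacking even after two full epochs of training.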
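The two design principles can be illustrated with a small sketch. The summary does not give the exact shaping formula, so the code below assumes one natural choice: the sigmoid of the reward-model score minus a reference reward, which can be read as a Bradley-Terry style preference probability of the policy response over a reference response. The function names `sigmoid` and `shaped_reward` are hypothetical and are not taken from the paper's code.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def shaped_reward(reward: float, reference_rewards: list[float]) -> float:
    """Hypothetical shaped RL reward (an assumption, not the paper's code).

    Averages sigmoid(reward - ref) over the reference rewards. The result
    always lies in [0, 1] (principle 1: bounded), and it rises fastest when
    the reward is near the reference and flattens as the reward grows
    (principle 2: rapid initial growth, then gradual convergence).
    """
    if not reference_rewards:
        raise ValueError("at least one reference reward is required")
    total = sum(sigmoid(reward - ref) for ref in reference_rewards)
    return total / len(reference_rewards)


if __name__ == "__main__":
    # A single reference reward, matching the data-efficiency claim above.
    for raw in (-4.0, 0.0, 1.0, 4.0, 20.0):
        print(f"raw={raw:6.1f}  shaped={shaped_reward(raw, [0.0]):.4f}")
```

Under this assumption, a policy that inflates the raw reward-model score far beyond the reference gains almost nothing in shaped reward, which is one intuition for why a bounded, saturating signal could discourage reward hacking.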

Implications and Limitations

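• Implications:
  ◦ Presents PAR, a new approach to the reward hacking problem in RLHF.
  ◦ Presents two key principles for designing reward shaping techniques (a bounded RL reward; rapid initial growth followed by gradual convergence).
  ◦ Experimentally verifies PAR's superior performance and data efficiency.
  ◦ Confirms strong robustness against reward hacking.

• Limitations:
  ◦ Further research is needed on the generality and scope of applicability of the proposed design principles.
  ◦ Additional experiments on a wider range of LLMs and datasets are needed.
  ◦ Further evaluation of PAR's long-term stability and scalability is needed.
  ◦ The performance evaluation may be limited to specific benchmarks.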
