Daily Arxiv

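This page organizes artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.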

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Accurate and scalable exchange-correlation with deep learning

AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation

Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models

Aligning Evaluation with Clinical Priorities: Calibration, Label Shift, and Error Costs

GRAM: A Generative Foundation Reward Model for Reward Generalization

VideoMAR: Autoregressive Video Generation with Continuous Tokens

FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation

Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models

No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!

Serving Large Language Models on Huawei CloudMatrix384

PLD: A Choice-Theoretic List-Wise Knowledge Distillation

TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy

Refactoring Codebases through Library Design

TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding

Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation

Multi-Task Reward Learning from Human Ratings

Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Vision Transformers Don't Need Trained Registers

BIS Reasoning 1.0: The First Large-Scale Japanese Benchmark for Belief-Inconsistent Syllogistic Reasoning

LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles

CORA: Coalitional Rational Advantage Decomposition for Multi-Agent Policy Gradients

Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications

ChemHAS: Hierarchical Agent Stacking for Enhancing Chemistry Tools

ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models

Think Twice before Adaptation: Improving Adaptability of DeepFake Detection via Online Test-Time Adaptation

Efficient Long CoT Reasoning in Small Language Models

Imagine Beyond! Distributionally Robust Auto-Encoding for State Space Coverage in Online Reinforcement Learning

MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion

J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization

Fractured Chain-of-Thought Reasoning

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

UD-English-CHILDES: A Collected Resource of Gold and Silver Universal Dependencies Trees for Child Language Interactions

Position Paper: Rethinking Privacy in RL for Sequential Decision-making in the Age of LLMs

Influential Bandits: Pulling an Arm May Change the Environment

SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models

Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning

Exploring Personalized Federated Learning Architectures for Violence Detection in Surveillance Videos

A Bird Song Detector for improving bird identification through Deep Learning: a case study from Doñana

KANITE: Kolmogorov-Arnold Networks for ITE estimation

Beyond Propagation of Chaos: A Stochastic Algorithm for Mean Field Optimization

Resolving UnderEdit & OverEdit with Iterative & Neighbor-Assisted Model Editing

Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation

EgoBlind: Towards Egocentric Visual Assistance for the Blind

PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice

Machine Learners Should Acknowledge the Legal Implications of Large Language Models as Personal Data

Supporting the development of Machine Learning for fundamental science in a federated Cloud with the AI_INFN platform

CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks

Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions

Perspective Transition of Large Language Models for Solving Subjective Tasks

Can LLMs Ask Good Questions?

Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review

SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation

Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition

Multiclass Post-Earthquake Building Assessment Integrating High-Resolution Optical and SAR Satellite Imagery, Ground Motion, and Soil Data with Transformers

REVOLVE: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

FLARE: Towards Universal Dataset Purification against Backdoor Attacks

Heterogeneous Relationships of Subjects and Shapelets for Semi-supervised Multivariate Series Classification

Contrast Similarity-Aware Dual-Pathway Mamba for Multivariate Time Series Node Classification

Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation

LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch

Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

The Epochal Sawtooth Phenomenon: Unveiling Training Loss Oscillations in Adam and Other Optimizers

Pap2Pat: Benchmarking Outline-Guided Long-Text Patent Generation with Patent-Paper Pairs

Deep Graph Anomaly Detection: A Survey and New Perspectives

A Novel Perturb-ability Score to Mitigate Evasion Adversarial Attacks on Flow-Based ML-NIDS

Style-Preserving Lip Sync via Audio-Aware Style Reference

Advancing oncology with federated learning: transcending boundaries in breast, lung, and prostate cancer. A systematic review

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Informed Correctors for Discrete Diffusion Models

RadioRAG: Online Retrieval-augmented Generation for Radiology Question Answering

A Systematic Survey of Natural Language Processing for the Greek Language

Predicting the Understandability of Computational Notebooks through Code Metrics Analysis

An Effective Incorporating Heterogeneous Knowledge Curriculum Learning for Sequence Labeling

HiURE: Hierarchical Exemplar Contrastive Learning for Unsupervised Relation Extraction

The NordDRG AI Benchmark for Large Language Models

From Data-Driven to Purpose-Driven Artificial Intelligence: Systems Thinking for Data-Analytic Automation of Patient Care

Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

Entropy-based Exploration Conduction for Multi-step Reasoning

Solving Satisfiability Modulo Counting Exactly with Probabilistic Circuits

Synthesizing Composite Hierarchical Structure from Symbolic Music Corpora

Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization

Optimal Transport for Probabilistic Circuits

OM4OV: Leveraging Ontology Matching for Ontology Versioning

Behaviour Planning: A Toolkit for Diverse Planning

Spatial Context-based Self-Supervised Learning for Handwritten Text Recognition

"Generate" the Future of Work through AI: Empirical Evidence from Online Labor Markets

Dense SAE Latents Are Features, Not Bugs

Sekai: A Video Dataset towards World Exploration

Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers

AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability

Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction

GFLC: Graph-based Fairness-aware Label Correction for Fair Classification

The Compositional Architecture of Regret in Large Language Models

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

GRAM: A Generative Foundation Reward Model for Reward Generalization

Created by

Haebom

Authors

Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Qiaozhi He, Murun Yang, Bei Li, Tong Xiao, Chunliang Zhang, Tongran Liu, Jingbo Zhu

Overview

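This paper proposes a generative reward model, in place of the conventional discriminative reward model, for aligning large language models (LLMs). Whereas conventional reward models rely solely on human preference data, this paper trains a generative reward model by combining unsupervised and supervised learning. The model is first pre-trained with large-scale unsupervised learning and then fine-tuned with supervised learning; the paper shows that, with label smoothing, this training optimizes a regularized pairwise ranking loss. This offers a new perspective that links generative and discriminative models under the same training objective. The resulting foundation reward model can be applied to a wide range of tasks with little or no additional fine-tuning, and experiments show substantial improvements over existing models on several tasks, including response ranking, reinforcement learning from human feedback, and task adaptation through fine-tuning.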
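To make the idea of a label-smoothed pairwise ranking loss concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: it assumes a Bradley-Terry style preference probability computed from two scalar scores, and the names `smoothed_pairwise_loss`, `score_chosen`, `score_rejected`, and `epsilon` are hypothetical. The exact loss used for GRAM may differ.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # log(sigmoid(x)) = min(x, 0) - log(1 + exp(-|x|))
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def smoothed_pairwise_loss(score_chosen: float,
                           score_rejected: float,
                           epsilon: float = 0.1) -> float:
    """Pairwise ranking loss with label smoothing (illustrative assumption).

    The preferred response gets target probability (1 - epsilon) and the
    other response gets epsilon. With epsilon = 0 this reduces to the
    standard pairwise ranking loss -log(sigmoid(margin)).
    """
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    margin = score_chosen - score_rejected
    return (-(1.0 - epsilon) * log_sigmoid(margin)
            - epsilon * log_sigmoid(-margin))


if __name__ == "__main__":
    # Compare the unsmoothed and smoothed loss at a few score margins.
    for margin in (0.0, 2.0, 10.0):
        plain = smoothed_pairwise_loss(margin, 0.0, epsilon=0.0)
        smooth = smoothed_pairwise_loss(margin, 0.0, epsilon=0.1)
        print(f"margin={margin:5.1f}  plain={plain:.4f}  smoothed={smooth:.4f}")
```

In this sketch the unsmoothed loss keeps decreasing as the margin grows, while the smoothed loss reaches its minimum at the finite margin log((1 - epsilon) / epsilon) and rises again beyond it. That penalty on overly confident margins is the sense in which the smoothing term acts as a regularizer here.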

Implications and Limitations

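• Implications:
  ◦ Proposes a new generative reward model for aligning large language models.
  ◦ Combines unsupervised and supervised learning to improve data efficiency.
  ◦ Uses label smoothing, which amounts to optimizing a regularized pairwise ranking loss.
  ◦ Offers a new perspective that unifies generative and discriminative models.
  ◦ Outperforms existing models across a variety of tasks.
  ◦ Provides a foundation reward model that requires little additional fine-tuning.

• Limitations:
  ◦ The summary finds no specific discussion in the paper of the proposed method's limitations; further analysis is needed.
  ◦ The possibility of overfitting to specific datasets or tasks should be examined.
  ◦ Further research is needed on the model's scalability and generalization performance.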
