Daily Arxiv

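This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, simply cite the source.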

Sekai: A Video Dataset towards World Exploration

Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction

One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution

Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework

Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis

WebXAII: an open-source web framework to study human-XAI interaction

Refining music sample identification with a self-supervised graph neural network

Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models

Essential-Web v1.0: 24T tokens of organized web data

Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models

LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration

Serving Large Language Models on Huawei CloudMatrix384

Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review

Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning

Two Heads Are Better than One: Simulating Large Transformers with Small Ones

BreastDCEDL: Curating a Comprehensive DCE-MRI Dataset and developing a Transformer Implementation for Breast Cancer Treatment Response Prediction

Semantic Preprocessing for LLM-based Malware Analysis

A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models

Human-like Forgetting Curves in Deep Neural Networks

Convergent Linear Representations of Emergent Misalignment

LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment

TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy

Autonomous Computer Vision Development with Agentic AI

The Memory Paradox: Why Our Brains Need Knowledge in an Age of AI

SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks

Using Language and Road Manuals to Inform Map Reconstruction for Autonomous Driving

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

PlantBert: An Open Source Language Model for Plant Science

Info-Coevolution: An Efficient Framework for Data Model Coevolution

SDE-SQL: Enhancing Text-to-SQL Generation in Large Language Models via Self-Driven Exploration with SQL Probes

SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider

GraphRAG-Bench: Challenging Domain-Specific Reasoning for Evaluating Graph Retrieval-Augmented Generation

Towards Efficient Few-shot Graph Neural Architecture Search via Partitioning Gradient Contribution

Optimizing Sensory Neurons: Nonlinear Attention Mechanisms for Accelerated Convergence in Permutation-Invariant Neural Networks for Reinforcement Learning

CryoCCD: Conditional Cycle-consistent Diffusion with Biophysical Modeling for Cryo-EM Synthesis

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement

Dynamic Risk Assessments for Offensive Cybersecurity Agents

SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation

Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets

Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression

Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks

Learning Dynamics in Continual Pre-Training for Large Language Models

Mask-PINNs: Regulating Feature Distributions in Physics-Informed Neural Networks

Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities

SPIN-ODE: Stiff Physics-Informed Neural ODE for Chemical Reaction Rate Estimation

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

DeepSelective: Interpretable Prognosis Prediction via Feature Selection and Compression in EHR Data

Boosting multi-demographic federated learning for chest radiograph analysis using general-purpose self-supervised representations

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization

TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models

Decentralized Collective World Model for Emergent Communication and Coordination

RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations

Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack

LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions

Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies

QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation

Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models

AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation

Eau De $Q$-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning

Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments

Selective Use of Yannakakis' Algorithm to Improve Query Performance: Machine Learning to the Rescue

FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response

Batayan: A Filipino NLP benchmark for evaluating Large Language Models

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization

ShapeLib: Designing a library of programmatic 3D shape abstractions with Large Language Models

Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning

Guaranteed prediction sets for functional surrogate models

FDLLM: A Dedicated Detector for Black-Box LLMs Fingerprinting

Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning

MonoSOWA: Scalable monocular 3D Object detector Without human Annotations

Representation Learning of Point Cloud Upsampling in Global and Local Inputs

Multi-Preference Optimization: Generalizing DPO via Set-Level Contrasts

Incivility and Rigidity: The Risks of Fine-Tuning LLMs for Political Argumentation

Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control

Learning Multi-Branch Cooperation for Enhanced Click-Through Rate Prediction at Taobao

On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse

Web Archives Metadata Generation with GPT-4o: Challenges and Insights

Cyclic Vision-Language Manipulator: Towards Reliable and Fine-Grained Image Interpretation for Automated Report Generation

A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning

Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system

Efficient Mixture-of-Expert for Video-based Driver State and Physiological Multi-task Estimation in Conditional Autonomous Driving

ALTA: Compiler-Based Analysis of Transformers

Learning to Route LLMs with Confidence Tokens

Core Knowledge Deficits in Multi-Modal Language Models

AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension

Can Large Language Models Replace Human Subjects? A Large-Scale Replication of Scenario-Based Experiments in Psychology and Management

LogProber: Disentangling confidence from contamination in LLM responses

V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach

PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval

DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data

Created by

Haebom

Authors

Yuhang Zhou, Jing Zhu, Shengyi Qian, Zhuokai Zhao, Xiyao Wang, Xiaoyu Liu, Ming Li, Paiheng Xu, Wei Ai, Furong Huang

Overview

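This paper studies the alignment of large language models (LLMs) through reinforcement learning from human feedback (RLHF). It proposes Domain-Informed Self-Consistency Policy Optimization (DISCO) to address a weakness of Group Relative Policy Optimization (GRPO). GRPO is simple and performs well, but it does not account for the imbalance and varied domain distributions found in real-world datasets. DISCO addresses this with two adjustments to the reward: domain-aware reward scaling and difficulty-aware reward scaling. Domain-aware scaling counteracts the bias toward frequently occurring domains. Difficulty-aware scaling uses self-consistency to identify uncertain prompts and prioritize them, which makes training more efficient. In experiments across several LLMs and imbalanced datasets, DISCO outperformed GRPO and achieved the best results on multi-domain alignment benchmarks.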
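The two reward adjustments described above can be illustrated with a small sketch. The summary does not give the paper's formulas, so everything here is an assumption made for illustration: the inverse-frequency form of `domain_weights`, the majority-agreement definition of `self_consistency`, the `2.0 - sc` difficulty weight, and all function names are hypothetical. Only the general idea (a GRPO-style group-relative advantage, rescaled by domain frequency and by prompt uncertainty) comes from the text.

```python
from collections import Counter
from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """GRPO-style advantage: standardize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All responses scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def domain_weights(domains):
    """Hypothetical inverse-frequency weights; rare domains get larger weights.

    Normalized so that the weights average to 1 over the dataset.
    """
    counts = Counter(domains)
    total = len(domains)
    n_domains = len(counts)
    return {d: total / (n_domains * c) for d, c in counts.items()}


def self_consistency(answers):
    """Fraction of sampled answers that agree with the majority answer."""
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


def scaled_advantages(rewards, answers, domain, weights):
    """Rescale group-relative advantages by domain and difficulty weights."""
    sc = self_consistency(answers)
    # Assumed form: weight lies in [1, 2) and grows as the samples disagree.
    difficulty_weight = 2.0 - sc
    scale = weights[domain] * difficulty_weight
    return [scale * a for a in group_relative_advantages(rewards)]


if __name__ == "__main__":
    # Toy dataset: three math prompts for every code prompt.
    weights = domain_weights(["math", "math", "math", "code"])
    # One prompt from the rare domain, four sampled responses.
    print(scaled_advantages([1, 0, 0, 1], ["a", "b", "a", "a"], "code", weights))
```

In this toy setup the prompt from the under-represented domain receives a larger update than the same prompt would in the frequent domain, and disagreement among the sampled answers increases the update further. The actual scaling functions used by DISCO may differ.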

Implications and Limitations

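• Implications:

◦ Presents a new method (DISCO) that effectively addresses LLM alignment on imbalanced datasets

◦ Shows that domain-aware and difficulty-aware reward scaling can make policy learning fairer and more effective

◦ Improves on GRPO by 5% on Qwen3 models and achieves the best results on multi-domain alignment benchmarks

◦ Clearly identifies the limitations of GRPO and suggests a direction for improvement

• Limitations:

◦ Further study is needed on the generality and scalability of the proposed method

◦ Additional experiments on a wider range of LLMs and datasets are needed

◦ Further validation is needed on whether gains on specific benchmarks carry over to real-world applications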
