Daily Arxiv

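This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright in each paper belongs to its authors and their institutions; when sharing, simply cite the source.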

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Accurate and scalable exchange-correlation with deep learning

AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation

Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models

Aligning Evaluation with Clinical Priorities: Calibration, Label Shift, and Error Costs

GRAM: A Generative Foundation Reward Model for Reward Generalization

VideoMAR: Autoregressive Video Generation with Continuous Tokens

FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation

Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models

No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!

Serving Large Language Models on Huawei CloudMatrix384

PLD: A Choice-Theoretic List-Wise Knowledge Distillation

TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy

Refactoring Codebases through Library Design

TransXSSM: A Hybrid Transformer State Space Model with Unified Rotary Position Embedding

Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation

Multi-Task Reward Learning from Human Ratings

Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Vision Transformers Don't Need Trained Registers

BIS Reasoning 1.0: The First Large-Scale Japanese Benchmark for Belief-Inconsistent Syllogistic Reasoning

LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles

CORA: Coalitional Rational Advantage Decomposition for Multi-Agent Policy Gradients

Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications

ChemHAS: Hierarchical Agent Stacking for Enhancing Chemistry Tools

ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models

Think Twice before Adaptation: Improving Adaptability of DeepFake Detection via Online Test-Time Adaptation

Efficient Long CoT Reasoning in Small Language Models

Imagine Beyond! Distributionally Robust Auto-Encoding for State Space Coverage in Online Reinforcement Learning

MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion

J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization

Fractured Chain-of-Thought Reasoning

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

UD-English-CHILDES: A Collected Resource of Gold and Silver Universal Dependencies Trees for Child Language Interactions

Position Paper: Rethinking Privacy in RL for Sequential Decision-making in the Age of LLMs

Influential Bandits: Pulling an Arm May Change the Environment

SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models

Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning

Exploring Personalized Federated Learning Architectures for Violence Detection in Surveillance Videos

A Bird Song Detector for improving bird identification through Deep Learning: a case study from Doñana

KANITE: Kolmogorov-Arnold Networks for ITE estimation

Beyond Propagation of Chaos: A Stochastic Algorithm for Mean Field Optimization

Resolving UnderEdit & OverEdit with Iterative & Neighbor-Assisted Model Editing

Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation

EgoBlind: Towards Egocentric Visual Assistance for the Blind

PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice

Machine Learners Should Acknowledge the Legal Implications of Large Language Models as Personal Data

Supporting the development of Machine Learning for fundamental science in a federated Cloud with the AI_INFN platform

CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks

Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions

Perspective Transition of Large Language Models for Solving Subjective Tasks

Can LLMs Ask Good Questions?

Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review

SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation

Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition

Multiclass Post-Earthquake Building Assessment Integrating High-Resolution Optical and SAR Satellite Imagery, Ground Motion, and Soil Data with Transformers

REVOLVE: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

FLARE: Towards Universal Dataset Purification against Backdoor Attacks

Heterogeneous Relationships of Subjects and Shapelets for Semi-supervised Multivariate Series Classification

Contrast Similarity-Aware Dual-Pathway Mamba for Multivariate Time Series Node Classification

Semantic-Geometric-Physical-Driven Robot Manipulation Skill Transfer via Skill Library and Tactile Representation

LLäMmlein: Transparent, Compact and Competitive German-Only Language Models from Scratch

Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

The Epochal Sawtooth Phenomenon: Unveiling Training Loss Oscillations in Adam and Other Optimizers

Pap2Pat: Benchmarking Outline-Guided Long-Text Patent Generation with Patent-Paper Pairs

Deep Graph Anomaly Detection: A Survey and New Perspectives

A Novel Perturb-ability Score to Mitigate Evasion Adversarial Attacks on Flow-Based ML-NIDS

Style-Preserving Lip Sync via Audio-Aware Style Reference

Advancing oncology with federated learning: transcending boundaries in breast, lung, and prostate cancer. A systematic review

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Informed Correctors for Discrete Diffusion Models

RadioRAG: Online Retrieval-augmented Generation for Radiology Question Answering

A Systematic Survey of Natural Language Processing for the Greek Language

Predicting the Understandability of Computational Notebooks through Code Metrics Analysis

An Effective Incorporating Heterogeneous Knowledge Curriculum Learning for Sequence Labeling

HiURE: Hierarchical Exemplar Contrastive Learning for Unsupervised Relation Extraction

The NordDRG AI Benchmark for Large Language Models

From Data-Driven to Purpose-Driven Artificial Intelligence: Systems Thinking for Data-Analytic Automation of Patient Care

Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

Entropy-based Exploration Conduction for Multi-step Reasoning

Solving Satisfiability Modulo Counting Exactly with Probabilistic Circuits

Synthesizing Composite Hierarchical Structure from Symbolic Music Corpora

Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization

Optimal Transport for Probabilistic Circuits

OM4OV: Leveraging Ontology Matching for Ontology Versioning

Behaviour Planning: A Toolkit for Diverse Planning

Spatial Context-based Self-Supervised Learning for Handwritten Text Recognition

"Generate" the Future of Work through AI: Empirical Evidence from Online Labor Markets

Dense SAE Latents Are Features, Not Bugs

Sekai: A Video Dataset towards World Exploration

Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers

AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning

Demystifying the Visual Quality Paradox in Multimodal Large Language Models

Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability

Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction

GFLC: Graph-based Fairness-aware Label Correction for Fair Classification

The Compositional Architecture of Regret in Large Language Models

LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning

VideoMAR: Autoregressive Video Generation with Continuous Tokens

Created by

Haebom

Authors

Hu Yu, Biao Gong, Hangjie Yuan, DanDan Zheng, Weilong Chai, Jingdong Chen, Kecheng Zheng, Feng Zhao

Overview

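This paper proposes VideoMAR, a concise and efficient decoder-only autoregressive image-to-video model that uses continuous tokens. VideoMAR combines temporal frame-by-frame generation with spatial masked generation to explore the potential of autoregressive models for video generation. It identifies temporal causality and spatial bidirectionality as the first principles of video autoregressive models and proposes a next-frame diffusion loss to integrate masked generation with video generation. To address the high cost and difficulty of long-sequence autoregressive modeling, it proposes temporal short-to-long curriculum learning and spatial progressive-resolution training, and it uses a progressive temperature strategy at inference to mitigate accumulated error. VideoMAR also carries several capabilities unique to language models over to video generation: using a temporal KV cache together with spatial parallel generation makes it inherently efficient, and 3D rotary embeddings give it spatial and temporal extrapolation. On the VBench-I2V benchmark, VideoMAR outperforms the previous state-of-the-art model (Cosmos I2V) while requiring far fewer parameters (9.3%), training data (0.5%), and GPU resources (0.2%).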
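The two principles named above, temporal causality and spatial bidirectionality, can be pictured as a frame-level block-causal attention mask. The following is a minimal sketch of that idea only, not the VideoMAR implementation: the function names `frame_block_causal_mask` and `render` are hypothetical, tokens are assumed to be laid out frame by frame, and the spatial masked-generation step that the paper combines with this is not modeled.

```python
def frame_block_causal_mask(num_frames, tokens_per_frame):
    """Build a boolean attention mask (hypothetical helper, for illustration).

    mask[q][k] is True when query token q may attend to key token k.
    A token sees every token in its own frame (spatially bidirectional)
    and every token in earlier frames (temporally causal), but nothing
    from later frames.
    """
    total = num_frames * tokens_per_frame
    mask = []
    for q in range(total):
        q_frame = q // tokens_per_frame
        # Allowed exactly when the key's frame is not later than the query's.
        row = [(k // tokens_per_frame) <= q_frame for k in range(total)]
        mask.append(row)
    return mask


def render(mask):
    """Render the mask as rows of 1 (may attend) and 0 (blocked)."""
    return "\n".join(
        "".join("1" if allowed else "0" for allowed in row) for row in mask
    )


if __name__ == "__main__":
    # Three frames of two tokens each: the mask is block lower-triangular.
    print(render(frame_block_causal_mask(num_frames=3, tokens_per_frame=2)))
```

Under this layout, tokens of earlier frames never depend on later frames, which is the property that lets keys and values from finished frames be cached and reused while tokens within the current frame are processed together.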

Implications and Limitations

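• Implications:
◦ Proposes VideoMAR, an efficient decoder-only autoregressive image-to-video model based on continuous tokens.
◦ Generates video with temporal causality and spatial bidirectionality taken into account.
◦ Trains efficiently through temporal short-to-long curriculum learning and spatial progressive-resolution training.
◦ Achieves better performance than the previous best model while consuming fewer resources.
◦ Applies the strengths of language models to video generation.

• Limitations:
◦ The paper does not state specific limitations; future work may find room for further improvement.
◦ Performance is not evaluated on benchmarks other than VBench-I2V.
◦ Further analysis is needed of generalization performance on specific video generation tasks.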
