Daily Arxiv

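This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.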

Yume: An Interactive World Generation Model

Created by

Haebom

Authors

Xiaofeng Mao, Shaoheng Lin, Zhen Li, Chuanhao Li, Wenshuo Peng, Tong He, Jiangmiao Pang, Mingmin Chi, Yu Qiao, Kaipeng Zhang

Overview

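Yume is a project that aims to use images, text, or video to generate an interactive, realistic, and dynamic world that users can explore and control with peripheral devices or neural signals. This report presents a preview version of Yume that generates a dynamic world from an input image and lets the user explore it through keyboard actions. To achieve high-quality, interactive video world generation, the authors introduce a framework with four main components: camera motion quantization, a video generation architecture, an advanced sampler, and model acceleration. Specifically, camera motion is quantized for stable training and user-friendly keyboard input; a Masked Video Diffusion Transformer (MVDT) with a memory module generates video of unbounded length autoregressively; a training-free Anti-Artifact Mechanism (AAM) and Time Travel Sampling based on Stochastic Differential Equations (TTS-SDE) improve visual quality and control precision; and adversarial distillation and caching mechanisms are optimized jointly to accelerate the model. Yume is trained on Sekai, a high-quality world exploration dataset, and achieves notable results across diverse scenes and applications. All data, the codebase, and model weights are available at https://github.com/stdstu12/YUME, and Yume is scheduled to be updated monthly.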
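To make the idea of camera motion quantization concrete, here is a minimal sketch of how a continuous per-step camera change could be mapped to a small set of keyboard-like labels. It is an illustration only and not the method used in Yume: the function name `quantize_camera_motion`, the thresholds `MOVE_EPS` and `TURN_EPS`, the axis conventions, and the label set are all assumptions introduced here.

```python
# Hypothetical thresholds, chosen for illustration; not taken from the Yume report.
MOVE_EPS = 0.05  # smallest translation (arbitrary units) that counts as movement
TURN_EPS = 1.0   # smallest rotation (degrees) that counts as a turn


def quantize_camera_motion(dx, dz, yaw_deg, pitch_deg):
    """Map a continuous camera delta to discrete (move, turn) labels.

    Assumed conventions: dx > 0 is rightward, dz > 0 is forward,
    yaw_deg > 0 turns right, pitch_deg > 0 tilts up.
    """
    # Translation: pick the dominant axis, or "none" if the motion is negligible.
    if max(abs(dx), abs(dz)) < MOVE_EPS:
        move = "none"
    elif abs(dz) >= abs(dx):
        move = "W" if dz > 0 else "S"
    else:
        move = "D" if dx > 0 else "A"

    # Rotation: same dominant-axis rule applied to yaw and pitch.
    if max(abs(yaw_deg), abs(pitch_deg)) < TURN_EPS:
        turn = "none"
    elif abs(yaw_deg) >= abs(pitch_deg):
        turn = "right" if yaw_deg > 0 else "left"
    else:
        turn = "up" if pitch_deg > 0 else "down"

    return move, turn
```

Under these assumptions, a mostly forward step such as `quantize_camera_motion(0.0, 0.3, 0.0, 0.0)` maps to the `W` key with no turn. Collapsing continuous trajectories into a few labels like this is one plausible way to get both a compact conditioning signal for training and a direct correspondence with key presses at inference time.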

Implications and Limitations

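• Implications:
  ◦ Presents a technique for generating interactive, realistic virtual worlds from images, text, and video.
  ◦ Enables intuitive world exploration through keyboard input.
  ◦ Achieves high-quality video generation and precise control through new techniques such as MVDT, AAM, and TTS-SDE.
  ◦ Applies efficient optimization techniques for model acceleration.
  ◦ Is released as open source, contributing to further research and development.

• Limitations:
  ◦ The current version relies on keyboard input only; control through other peripheral devices or neural signals is not yet implemented.
  ◦ As a preview version, it needs further development before the full feature set is in place.
  ◦ The Sekai dataset is not described in detail.
  ◦ Performance degradation or stability problems that may arise during long sessions have not been validated.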
