Daily Arxiv
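This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.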
Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving
Mechanistic Interpretability as Statistical Estimation: A Variance Analysis of EAP-IG
Neural Diffusion Processes for Physically Interpretable Survival Prediction
Tenyidie Syllabification corpus creation and deep learning applications
On Predictability of Reinforcement Learning Dynamics for Large Language Models
EMR-AGENT: Automating Cohort and Feature Extraction from EMR Databases
MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
Normal-Abnormal Guided Generalist Anomaly Detection
Does Bigger Mean Better? Comparitive Analysis of CNNs and Biomedical Vision Language Modles in Medical Diagnosis
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
The AI Productivity Index (APEX)
Discontinuous Epitope Fragments as Sufficient Target Templates for Efficient Binder Design
Uncertainty-Aware Generative Oversampling Using an Entropy-Guided Conditional Variational Autoencoder
GeoSQL-Eval: First Evaluation of LLMs on PostGIS-Based NL2GeoSQL Queries
Segmentor-Guided Counterfactual Fine-Tuning for Locally Coherent and Targeted Image Synthesis
Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
Euclid's Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks
The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact
IndexNet: Timestamp and Variable-Aware Modeling for Time Series Forecasting
An effective control of large systems of active particles: An application to evacuation problem
Discovering Software Parallelization Points Using Deep Neural Networks
SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
Machines are more productive than humans until they aren't, and vice versa
Landcover classification and change detection using remote sensing and machine learning: a case study of Western Fiji
Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models
MOSAIC: A Multilingual, Taxonomy-Agnostic, and Computationally Efficient Approach for Radiological Report Classification
Forecasting the Ionosphere from Sparse GNSS Data with Temporal-Fusion Transformers
Towards Methane Detection Onboard Satellites
Tackling Federated Unlearning as a Parameter Estimation Problem
Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability
Legal Knowledge Graph Foundations, Part I: URI-Addressable Abstract Works (LRMoo F1 to schema.org)
An Architecture for Spatial Networking
AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
VITA: Vision-to-Action Flow Matching Policy
Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection
Model Parallelism With Subnetwork Data Parallelism
A Novel Approach for Estimating Largest Lyapunov Exponents in One-Dimensional Chaotic Time Series Using Machine Learning
PlaceFM: A Training-free Geospatial Foundation Model of Places using Large-Scale Point of Interest Data
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
MS-DFTVNet:A Long-Term Time Series Prediction Method Based on Multi-Scale Deformable Convolution
Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
WWAggr: A Window Wasserstein-based Aggregation for Ensemble Change Point Detection
Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering
Localized Forest Fire Risk Prediction: A Department-Aware Approach for Operational Decision Support
CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning
Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation
Differential Information Distribution: A Bayesian Perspective on Direct Preference Optimization
Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking
Enhanced DACER Algorithm with High Diffusion Efficiency
What happens when generative AI models train recursively on each others' outputs?
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection
Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap
Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation
Time-o1: Time-Series Forecasting Needs Transformed Label Alignment
MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation
ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data
AI-Powered Inverse Design of Ku-Band SIW Resonant Structures by Iterative Residual Correction Network
Feature Representation Transferring to Lightweight Models via Perception Coherence
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
FalconWing: An Ultra-Light Indoor Fixed-Wing UAV Platform for Vision-Based Autonomy
WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms
Towards Effective E-Participation of Citizens in the European Union: The Development of AskThePublic
Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier
Audio-Enhanced Vision-Language Modeling with Latent Space Broadening for High Quality Data Expansion
Knowledge-guided machine learning for county-level corn yield prediction under drought
Gaussian DP for Reporting Differential Privacy Guarantees in Machine Learning
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4
What are You Looking at? Modality Contribution in Multimodal Medical Deep Learning
Interpretable Text Embeddings and Text Similarity Explanation: A Survey
CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
Forget Forgetting: Continual Learning in a World of Abundant Memory
Out-of-Distribution Detection using Synthetic Data Generation
Handling Heterophily in Recommender Systems with Wavelet Hypergraph Diffusion
Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review
Diffusion Adversarial Post-Training for One-Step Video Generation
Unraveling Indirect In-Context Learning Using Influence Functions
Synergizing LLMs and Knowledge Graphs: A Novel Approach to Software Repository-Related Question Answering
VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention
Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
Reasoning over User Preferences: Knowledge Graph-Augmented LLMs for Explainable Conversational Recommendations
Reliable Decision Making via Calibration Oriented Retrieval Augmented Generation
Faster LLM Inference using DBMS-Inspired Preemption and Cache Replacement Policies
There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models
QSpec: Speculative Decoding with Complementary Quantization Schemes
Superficial Safety Alignment Hypothesis
AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs
R2 v2: The Pareto-compliant R2 Indicator for Better Benchmarking in Bi-objective Optimization
Hierarchical place recognition with omnidirectional images and curriculum learning-based loss functions
Neural Network Parameter-optimization of Gaussian pmDAGs
Semantic Bridges Between First Order c-Representations and Cost-Based Semantics: An Initial Perspective
Rethinking Reward Models for Multi-Domain Test-Time Scaling
Communication-Efficient and Accurate Approach for Aggregation in Federated Low-Rank Adaptation
VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention
Created by
Haebom
Authors
Mingzhe Zheng, Yongqi Xu, Haojian Huang, Xuran Ma, Yexin Liu, Wenjie Shu, Yatian Pang, Feilong Tang, Qifeng Chen, Harry Yang, Ser-Nam Lim
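Overview
VGoT is a framework that automatically synthesizes multi-shot videos from a single sentence. It aims to overcome a limitation of existing video generation models, which are strong only at producing short single clips because of fragmented visual dynamics and disconnected storylines. To address storytelling, visual consistency, and transition artifacts, VGoT uses dynamic storyline modeling, identity-aware cross-shot propagation, and an adjacent latent transition mechanism, and it outperforms strong baselines without any training.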
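Implications and Limitations
• Implications:
  ◦ Automatic multi-shot video generation from a single sentence.
  ◦ Structured storytelling through dynamic storyline modeling.
  ◦ Character consistency maintained through identity-aware cross-shot propagation.
  ◦ Smooth visual flow through the adjacent latent transition mechanism.
  ◦ Improved performance over strong baselines (20.4% in face consistency, 17.4% in style consistency).
  ◦ Reduced manual adjustment requirements (10x fewer).
• Limitations:
  ◦ The paper does not state any specific limitations.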