Daily Arxiv
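This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright in each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.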
Photonic Fabric Platform for AI Accelerators
Achieving Robust Channel Estimation Neural Networks by Designed Training Data
Can Mental Imagery Improve the Thinking Capabilities of AI Systems?
Characterizing State Space Model (SSM) and SSM-Transformer Hybrid Language Model Performance with Long Context Length
PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling
A Lightweight and Robust Framework for Real-Time Colorectal Polyp Detection Using LOF-Based Preprocessing and YOLO-v11n
HMID-Net: An Exploration of Masked Image Modeling and Knowledge Distillation in Hyperbolic Space
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning
Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model
VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis
Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration
Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning
Learning Software Bug Reports: A Systematic Literature Review
Rethinking Data Protection in the (Generative) Artificial Intelligence Era
Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving
"Before, I Asked My Mom, Now I Ask ChatGPT": Visual Privacy Management with Generative AI for Blind and Low-Vision People
QLPro: Automated Code Vulnerability Discovery via LLM and Static Code Analysis Integration
FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization
Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models
Bridging the Digital Divide: Small Language Models as a Pathway for Physics and Photonics Education in Underdeveloped Regions
DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs
Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing Multi-Turn Planning and Tool Adaptation
Specification and Evaluation of Multi-Agent LLM Systems - Prototype and Cybersecurity Applications
PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation
Draft-based Approximate Inference for LLMs
Label-semantics Aware Generative Approach for Domain-Agnostic Multilabel Classification
SemiOccam: A Robust Semi-Supervised Image Recognition Network Using Sparse Labels
Adversarial bandit optimization for approximately linear functions
Know Or Not: a library for evaluating out-of-knowledge base robustness
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
CoordField: Coordination Field for Agentic UAV Task Allocation In Low-altitude Urban Scenarios
Return Capping: Sample-Efficient CVaR Policy Gradient Optimisation
AnyTSR: Any-Scale Thermal Super-Resolution for UAV
Enhanced Pruning Strategy for Multi-Component Neural Architectures Using Component-Aware Graph Analysis
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
Measuring Leakage in Concept-Based Methods: An Information Theoretic Approach
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
The Dual-Route Model of Induction
Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models
SWI: Speaking with Intent in Large Language Models
A Study of LLMs' Preferences for Libraries and Programming Languages
TruthLens: Explainable DeepFake Detection for Face Manipulated and Fully Synthetic Data
Sampling Decisions
Federated Continual Instruction Tuning
Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
BriLLM: Brain-inspired Large Language Model
Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability
Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning
OMNISEC: LLM-Driven Provenance-based Intrusion Detection via Retrieval-Augmented Behavior Prompting
Too Much to Trust? Measuring the Security and Cognitive Impacts of Explainability in AI-Driven SOCs
Attend or Perish: Benchmarking Attention in Algorithmic Reasoning
Can Optical Denoising Clean Sonar Images? A Benchmark and Fusion Approach
Brain Foundation Models: A Survey on Advancements in Neural Signal Processing and Brain Discovery
Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents
Detecting Benchmark Contamination Through Watermarking
MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation
Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models
Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans
MKE-Coder: Multi-Axial Knowledge with Evidence Verification in ICD Coding for Chinese EMRs
An Overall Real-Time Mechanism for Classification and Quality Evaluation of Rice
Layerwise Recall and the Geometry of Interwoven Knowledge in LLMs
Learning in Strategic Queuing Systems with Small Buffers
BARNN: A Bayesian Autoregressive and Recurrent Neural Network
HEPPO-GAE: Hardware-Efficient Proximal Policy Optimization with Generalized Advantage Estimation
CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection
A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options
A Survey on Large Language Model-Based Social Agents in Game-Theoretic Scenarios
PEMF-VTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm
Understanding the Design Decisions of Retrieval-Augmented Generation Systems
DOGR: Towards Versatile Visual Document Grounding and Referring
Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking
DualSwinUnet++: An Enhanced Swin-Unet Architecture With Dual Decoders For PTMC Segmentation
PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding
AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
Continual Learning with Neuromorphic Computing: Foundations, Methods, and Emerging Applications
FlexiTex: Enhancing Texture Generation via Visual Guidance
ASMA: An Adaptive Safety Margin Algorithm for Vision-Language Drone Navigation via Scene-Aware Control Barrier Functions
The unknotting number, hard unknot diagrams, and reinforcement learning
Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation
Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language
CVPT: Cross Visual Prompt Tuning
Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models
Stimulating Imagination: Towards General-purpose "Something Something Placement"
Why Does New Knowledge Create Messy Ripple Effects in LLMs?
A Mathematical Framework and a Suite of Learning Techniques for Neural-Symbolic Systems
How to Leverage Predictive Uncertainty Estimates for Reducing Catastrophic Forgetting in Online Continual Learning
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles
Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
Oversmoothing Alleviation in Graph Neural Networks: A Survey and Unified View
OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics
Benchmarking Mobile Device Control Agents across Diverse Configurations
DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability
Created by
Haebom
Authors
Xirui Hu, Jiahao Wang, Hao Chen, Weizhan Zhang, Benqi Wang, Yikun Li, Haishun Nan
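Summary
This paper addresses personalized human image generation: producing images that depict a specific identity taken from a reference image. The authors note that existing methods achieve high-fidelity identity preservation but are limited to single-ID scenarios and lack facial editing capability. They introduce DynamicID, a tuning-free framework that supports both single-ID and multi-ID personalization with high fidelity and flexible facial editability. The core innovations are: Semantic-Activated Attention (SAA), which minimizes disruption to the base model when injecting ID features and achieves multi-ID personalization without multi-ID samples during training; the Identity-Motion Reconfigurator (IMR), which separates and recombines facial motion and ID features to support flexible facial editing; and a task-decoupled training paradigm that reduces data dependency, together with the VariFace-10k dataset, whose subjects are represented by varied face images. Experiments show that DynamicID outperforms state-of-the-art methods in identity fidelity, facial editability, and multi-ID personalization.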
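Takeaways, Limitations
• Takeaways:
◦ Presents DynamicID, a new framework that generates high-quality single-ID and multi-ID personalized images without tuning.
◦ Achieves strong facial editability and identity preservation through Semantic-Activated Attention (SAA) and the Identity-Motion Reconfigurator (IMR).
◦ Reduces data dependency and improves performance by using a task-decoupled training paradigm and the VariFace-10k dataset.
◦ Addresses two limitations of existing methods: restriction to a single ID and weak facial editability.
• Limitations:
◦ The size and diversity of the VariFace-10k dataset need further review.
◦ The model may be biased with respect to particular races or genders.
◦ The paper does not discuss ethical considerations or the potential for misuse in real-world applications.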