Daily Arxiv
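This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.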
Cut2Next: Generating Next Shot via In-Context Tuning
DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval
Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
Chimera: Harnessing Multi-Agent LLMs for Automatic Insider Threat Simulation
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree
AMFT: Aligning LLM Reasoners by Meta-Learning the Optimal Imitation-Exploration Balance
LSDT: LLM-Augmented Semantic Digital Twins for Adaptive Knowledge-Intensive Infrastructure Planning
Do Biased Models Have Biased Thoughts?
Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Record
LLM Unlearning Without an Expert Curated Dataset
Multi-Faceted Large Embedding Tables for Pinterest Ads Ranking
Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms
Situated Epistemic Infrastructures: A Diagnostic Framework for Post-Coherence Knowledge
RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory
Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference
GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy
A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models
Explaining Time Series Classifiers with PHAR: Rule Extraction and Fusion from Post-hoc Attributions
Role-Aware Language Models for Secure and Contextualized Access Control in Organizations
DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System
Post-Completion Learning for Language Models
Alternates, Assemble! Selecting Optimal Alternates for Citizens' Assemblies
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?
RAGtifier: Evaluating RAG Generation Approaches of State-of-the-Art RAG Systems for the SIGIR LiveRAG Competition
Unsupervised Document and Template Clustering using Multimodal Embeddings
Saturation Self-Organizing Map
CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics
To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey
Mjölnir: A Deep Learning Parametrization Framework for Global Lightning Flash Density
Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence
Democracy of AI Numerical Weather Models: An Example of Global Forecasting with FourCastNetv2 Made by a University Research Lab Using GPU
Retrieval-Augmented Generation with Conflicting Evidence
SPIE: Semantic and Structural Post-Training of Image Editing Diffusion Models with AI Feedback
Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students
ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning
ChatBench: From Static Benchmarks to Human-AI Evaluation
Adaptive Computation Pruning for the Forgetting Transformer
AI-induced sexual harassment: Investigating Contextual Characteristics and User Reactions of Sexual Harassment by a Companion Chatbot
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
Opioid Named Entity Recognition (ONER-2025) from Reddit
OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions
TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
Flexible Prefrontal Control over Hippocampal Episodic Memory for Goal-Directed Generalization
EvoP: Robust LLM Inference via Evolutionary Pruning
Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions
Zero-shot Emotion Annotation in Facial Images Using Large Multimodal Models: Benchmarking and Prospects for Multi-Class, Multi-Frame Approaches
PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGAN
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
FBFL: A Field-Based Coordination Approach for Data Heterogeneity in Federated Learning
Decoding-based Regression
AdEval: Alignment-based Dynamic Evaluation to Mitigate Data Contamination in Large Language Models
Chemist-aligned retrosynthesis by ensembling diverse inductive bias models
Adaptive Informed Deep Neural Networks for Power Flow Analysis
A Risk Taxonomy and Reflection Tool for Large Language Model Adoption in Public Health
Learning Marmoset Vocal Patterns with a Masked Autoencoder for Robust Call Segmentation, Classification, and Caller Identification
Dynamic Spectrum Access for Ambient Backscatter Communication-assisted D2D Systems with Quantum Reinforcement Learning
Zero-Shot Generalization of Vision-Based RL Without Data Augmentation
Hypergraph-based Motion Generation with Multi-modal Interaction Relational Reasoning
3DFacePolicy: Audio-Driven 3D Facial Animation Based on Action Control
Return Prediction for Mean-Variance Portfolio Selection: How Decision-Focused Learning Shapes Forecasting Models
OE3DIS: Open-Ended 3D Point Cloud Instance Segmentation
VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion
AIOS: LLM Agent Operating System
Keep Your Friends Close: Leveraging Affinity Groups to Accelerate AI Inference Workflows
From Lab to Field: Real-World Evaluation of an AI-Driven Smart Video Solution to Enhance Community Safety
BELLA: Black box model Explanations by Local Linear Approximations
Artificial Intelligence Software Structured to Simulate Human Working Memory, Mental Imagery, and Mental Continuity
Fitting Description Logic Ontologies to ABox and Query Examples
Interpreting Fedspeak with Confidence: A LLM-Based Uncertainty-Aware Framework Guided by Monetary Policy Transmission Paths
Designing a Feedback-Driven Decision Support System for Dynamic Student Intervention
Large Language Models Do Not Simulate Human Psychology
IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model
InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Trainable Dynamic Mask Sparse Attention
Edge-Based Multimodal Sensor Data Fusion with Vision Language Models (VLMs) for Real-time Autonomous Vehicle Accident Avoidance
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
Probabilistic Active Goal Recognition
When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning
Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics
UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
System 2 Reasoning for Human-AI Alignment: Generality and Adaptivity via ARC-AGI
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
Towards Universal Neural Inference
SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system
Dynamic Uncertainty-aware Multimodal Fusion for Outdoor Health Monitoring
Can We Trust AI to Govern AI? Benchmarking LLM Performance on Privacy and AI Governance Exams
Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding
E3-Rewrite: Learning to Rewrite SQL for Executability, Equivalence, and Efficiency
When Deepfakes Look Real: Detecting AI-Generated Faces with Unlabeled Data due to Annotation Challenges
Attacks and Defenses Against LLM Fingerprinting
LyS at SemEval 2025 Task 8: Zero-Shot Code Generation for Tabular QA
Retrospective Sparse Attention for Efficient Long-Context Generation
Rational Inverse Reasoning
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
Created by
Haebom
Authors
Wen Wang, Bozhen Fang, Chenchen Jing, Yongliang Shen, Yangyi Shen, Qiuyu Wang, Hao Ouyang, Hao Chen, Chunhua Shen
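Abstract
Diffusion large language models (dLLMs) generate text through iterative denoising, but current decoding strategies discard the rich intermediate predictions in favor of the final output. This work identifies temporal oscillation, a phenomenon in which the correct answer appears at an intermediate step and is then overwritten in later denoising steps. To address it, the authors present two complementary methods that exploit temporal consistency. The first, Temporal Self-Consistency Voting, is a training-free test-time decoding strategy that aggregates predictions across denoising steps and selects the most consistent output. The second, Temporal Consistency Reinforcement, is a post-training method that uses Temporal Semantic Entropy (TSE), a measure of semantic stability across intermediate predictions, as a reward signal to encourage stable generation. Experiments on multiple benchmarks demonstrate the effectiveness of the proposed methods. Using the negative TSE reward alone yields a notable average improvement of 24.7% on the Countdown dataset over an existing dLLM. Combined with an accuracy reward, the method achieves absolute gains of 2.0% on GSM8K, 4.3% on MATH500, 6.6% on SVAMP, and 25.3% on Countdown. The work highlights the untapped potential of temporal dynamics in dLLMs and provides two simple, effective tools for exploiting it.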
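The voting idea described in the abstract can be illustrated with a minimal sketch. It assumes the answer has already been parsed from the intermediate prediction at each denoising step; the function name `temporal_vote`, the optional per-step `weights`, and the tie-breaking rule are hypothetical choices for illustration, not the authors' implementation, and the summary does not say how the paper weights individual steps.

```python
from collections import Counter
from typing import Optional, Sequence


def temporal_vote(
    step_answers: Sequence[Optional[str]],
    weights: Optional[Sequence[float]] = None,
) -> Optional[str]:
    """Return the answer with the largest total weight across denoising steps.

    step_answers[t] is the answer parsed from the intermediate prediction at
    step t, or None if nothing could be parsed at that step.
    """
    if weights is None:
        # Assumption: every step counts equally when no weights are given.
        weights = [1.0] * len(step_answers)
    if len(weights) != len(step_answers):
        raise ValueError("weights and step_answers must have the same length")

    totals: Counter = Counter()
    for answer, weight in zip(step_answers, weights):
        if answer is not None:
            totals[answer] += weight
    if not totals:
        return None
    # Ties go to the answer that appeared earliest (dict insertion order).
    return max(totals, key=totals.get)


# Temporal oscillation: the answer "15" appears mid-trajectory but the
# final step reverts to "12". Voting recovers the more consistent answer.
trajectory = ["12", "15", "15", "15", "12"]
final_step_answer = trajectory[-1]
voted_answer = temporal_vote(trajectory)
```

Here `final_step_answer` is what ordinary decoding would return, while `voted_answer` reflects the whole trajectory. Passing larger weights for later steps is one way to favor more fully denoised predictions, but that is a design choice of this sketch.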
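Takeaways, Limitations
• Takeaways:
◦ Identifies the temporal oscillation phenomenon that occurs during the intermediate generation process of dLLMs and presents two effective methods to address it (Temporal Self-Consistency Voting and Temporal Consistency Reinforcement).
◦ Demonstrates experimentally that exploiting temporal consistency can substantially improve dLLM performance, with gains on the GSM8K, MATH500, SVAMP, and Countdown datasets.
◦ Offers a new understanding of the temporal dynamics of dLLMs and of ways to exploit them, providing useful guidance for future dLLM research and development.
• Limitations:
◦ The effectiveness of the proposed methods may be limited to specific datasets and models; further experiments on a wider range of datasets and models are needed.
◦ The definition and computation of Temporal Semantic Entropy (TSE) are not explained in detail; further analysis of TSE's generalizability and limits is needed.
◦ The computational complexity of Temporal Self-Consistency Voting and Temporal Consistency Reinforcement is not analyzed; efficiency in practical use needs further consideration.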
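Because this summary does not give the formula for Temporal Semantic Entropy, the following is only an illustrative sketch of an entropy-style stability score. It assumes that intermediate answers are grouped by exact string match (a real system would more plausibly group semantically equivalent answers) and computes the Shannon entropy of the resulting distribution. The function names and the `tse_reward` helper are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def temporal_semantic_entropy(step_answers: Sequence[str]) -> float:
    """Shannon entropy (in nats) of the answers seen across denoising steps.

    Assumption: two answers belong to the same semantic cluster only if
    their strings are identical.
    """
    if not step_answers:
        raise ValueError("need at least one intermediate answer")
    counts = Counter(step_answers)
    total = len(step_answers)
    # p * log(1/p) summed over clusters, with p = count / total.
    return sum((c / total) * math.log(total / c) for c in counts.values())


def tse_reward(step_answers: Sequence[str]) -> float:
    """Negative entropy: stable trajectories receive a higher reward."""
    return -temporal_semantic_entropy(step_answers)


stable = temporal_semantic_entropy(["7", "7", "7", "7"])
oscillating = temporal_semantic_entropy(["7", "9", "7", "9"])
```

A trajectory that never changes its answer scores zero entropy, while one that alternates between two answers scores `math.log(2)`, so using the negative value as a reward favors stable generation.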