Daily Arxiv

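This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please cite the source when sharing.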

Cut2Next: Generating Next Shot via In-Context Tuning

DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval

Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation

Chimera: Harnessing Multi-Agent LLMs for Automatic Insider Threat Simulation

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree

AMFT: Aligning LLM Reasoners by Meta-Learning the Optimal Imitation-Exploration Balance

LSDT: LLM-Augmented Semantic Digital Twins for Adaptive Knowledge-Intensive Infrastructure Planning

Do Biased Models Have Biased Thoughts?

Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Record

LLM Unlearning Without an Expert Curated Dataset

Multi-Faceted Large Embedding Tables for Pinterest Ads Ranking

Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms

Situated Epistemic Infrastructures: A Diagnostic Framework for Post-Coherence Knowledge

RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory

Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference

GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy

A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models

Explaining Time Series Classifiers with PHAR: Rule Extraction and Fusion from Post-hoc Attributions

Role-Aware Language Models for Secure and Contextualized Access Control in Organizations

DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System

Post-Completion Learning for Language Models

Alternates, Assemble! Selecting Optimal Alternates for Citizens' Assemblies

Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?

RAGtifier: Evaluating RAG Generation Approaches of State-of-the-Art RAG Systems for the SIGIR LiveRAG Competition

Unsupervised Document and Template Clustering using Multimodal Embeddings

Saturation Self-Organizing Map

CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics

To Judge or not to Judge: Using LLM Judgements for Advertiser Keyphrase Relevance at eBay

Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey

Mjölnir: A Deep Learning Parametrization Framework for Global Lightning Flash Density

Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence

Democracy of AI Numerical Weather Models: An Example of Global Forecasting with FourCastNetv2 Made by a University Research Lab Using GPU

Retrieval-Augmented Generation with Conflicting Evidence

SPIE: Semantic and Structural Post-Training of Image Editing Diffusion Models with AI feedback

Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students

ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning

ChatBench: From Static Benchmarks to Human-AI Evaluation

Adaptive Computation Pruning for the Forgetting Transformer

AI-induced sexual harassment: Investigating Contextual Characteristics and User Reactions of Sexual Harassment by a Companion Chatbot

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Opioid Named Entity Recognition (ONER-2025) from Reddit

OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation

Flexible Prefrontal Control over Hippocampal Episodic Memory for Goal-Directed Generalization

EvoP: Robust LLM Inference via Evolutionary Pruning

Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions

Zero-shot Emotion Annotation in Facial Images Using Large Multimodal Models: Benchmarking and Prospects for Multi-Class, Multi-Frame Approaches

PAR-AdvGAN: Improving Adversarial Attack Capability with Progressive Auto-Regression AdvGAN

Forget the Data and Fine-Tuning! Just Fold the Network to Compress

FBFL: A Field-Based Coordination Approach for Data Heterogeneity in Federated Learning

Decoding-based Regression

AdEval: Alignment-based Dynamic Evaluation to Mitigate Data Contamination in Large Language Models

Chemist-aligned retrosynthesis by ensembling diverse inductive bias models

Adaptive Informed Deep Neural Networks for Power Flow Analysis

A Risk Taxonomy and Reflection Tool for Large Language Model Adoption in Public Health

Learning Marmoset Vocal Patterns with a Masked Autoencoder for Robust Call Segmentation, Classification, and Caller Identification

Dynamic Spectrum Access for Ambient Backscatter Communication-assisted D2D Systems with Quantum Reinforcement Learning

Zero-Shot Generalization of Vision-Based RL Without Data Augmentation

Hypergraph-based Motion Generation with Multi-modal Interaction Relational Reasoning

3DFacePolicy: Audio-Driven 3D Facial Animation Based on Action Control

Return Prediction for Mean-Variance Portfolio Selection: How Decision-Focused Learning Shapes Forecasting Models

OE3DIS: Open-Ended 3D Point Cloud Instance Segmentation

VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge

DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion

MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention

Multidimensional Adaptive Coefficient for Inference Trajectory Optimization in Flow and Diffusion

AIOS: LLM Agent Operating System

Keep Your Friends Close: Leveraging Affinity Groups to Accelerate AI Inference Workflows

From Lab to Field: Real-World Evaluation of an AI-Driven Smart Video Solution to Enhance Community Safety

BELLA: Black box model Explanations by Local Linear Approximations

Artificial Intelligence Software Structured to Simulate Human Working Memory, Mental Imagery, and Mental Continuity

Fitting Description Logic Ontologies to ABox and Query Examples

Interpreting Fedspeak with Confidence: A LLM-Based Uncertainty-Aware Framework Guided by Monetary Policy Transmission Paths

Designing a Feedback-Driven Decision Support System for Dynamic Student Intervention

Large Language Models Do Not Simulate Human Psychology

IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model

InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Trainable Dynamic Mask Sparse Attention

Edge-Based Multimodal Sensor Data Fusion with Vision Language Models (VLMs) for Real-time Autonomous Vehicle Accident Avoidance

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Probabilistic Active Goal Recognition

When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning

Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics

UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI

System 2 Reasoning for Human-AI Alignment: Generality and Adaptivity via ARC-AGI

Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models

Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer

Towards Universal Neural Inference

SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system

Dynamic Uncertainty-aware Multimodal Fusion for Outdoor Health Monitoring

Can We Trust AI to Govern AI? Benchmarking LLM Performance on Privacy and AI Governance Exams

Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding

E3-Rewrite: Learning to Rewrite SQL for Executability, Equivalence, and Efficiency

When Deepfakes Look Real: Detecting AI-Generated Faces with Unlabeled Data due to Annotation Challenges

Attacks and Defenses Against LLM Fingerprinting

LyS at SemEval 2025 Task 8: Zero-Shot Code Generation for Tabular QA

Retrospective Sparse Attention for Efficient Long-Context Generation

Rational Inverse Reasoning

Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models

Created by

Haebom

Authors

Wen Wang, Bozhen Fang, Chenchen Jing, Yongliang Shen, Yangyi Shen, Qiuyu Wang, Hao Ouyang, Hao Chen, Chunhua Shen

Overview

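Diffusion large language models (dLLMs) generate text through iterative denoising, but current decoding strategies keep only the final output and discard the rich intermediate predictions. This work identifies temporal oscillation, a phenomenon in which the correct answer appears partway through the process and is then overwritten in later denoising steps. To address it, the authors present two complementary methods that exploit temporal consistency. The first, Temporal Self-Consistency Voting, is a training-free, test-time decoding strategy that aggregates predictions across denoising steps and selects the most consistent output. The second, Temporal Consistency Reinforcement, is a post-training method that uses Temporal Semantic Entropy (TSE), a measure of semantic stability across intermediate predictions, as a reward signal to encourage stable generation. Experiments on multiple benchmarks show that the proposed methods are effective. Using the negative TSE reward alone yields an average improvement of 24.7% on the Countdown dataset over an existing dLLM. Combined with an accuracy reward, the method achieves absolute gains of 2.0% on GSM8K, 4.3% on MATH500, 6.6% on SVAMP, and 25.3% on Countdown. The work highlights the untapped potential of temporal dynamics in dLLMs and offers two simple, effective tools for exploiting it.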
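The voting idea can be illustrated with a minimal sketch. It assumes that an answer string has already been extracted from the intermediate prediction at each denoising step; the function name, the `weight_fn` hook, and the tie-breaking rule are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import defaultdict


def temporal_self_consistency_vote(step_answers, weight_fn=None):
    """Return the answer with the largest total weight across denoising steps.

    step_answers: answers decoded at each denoising step, ordered from the
        first step to the last (assumed to be extracted beforehand).
    weight_fn: hypothetical hook mapping (step_index, num_steps) to a vote
        weight; every step counts equally when it is omitted.
    """
    if not step_answers:
        raise ValueError("need at least one intermediate answer")
    if weight_fn is None:
        weight_fn = lambda step, total: 1.0

    num_steps = len(step_answers)
    totals = defaultdict(float)
    for step, answer in enumerate(step_answers):
        totals[answer] += weight_fn(step, num_steps)

    # Assumption: ties go to the answer that was seen at the latest step.
    last_seen = {answer: step for step, answer in enumerate(step_answers)}
    return max(totals, key=lambda a: (totals[a], last_seen[a]))


# The final step drifts to "17", but "18" dominated the trajectory.
trajectory = ["12", "18", "18", "18", "17"]
print(temporal_self_consistency_vote(trajectory))
```

Taking only the last element of `trajectory` would return "17"; voting over all steps recovers the answer that was stable for most of the denoising process. Passing a `weight_fn` such as `lambda step, total: (step + 1) / total` would favor later steps, which is one plausible weighting among many.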

Takeaways, Limitations

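• Takeaways:

◦ Identifies the temporal oscillation that occurs during the intermediate generation process of dLLMs and presents two methods to address it: Temporal Self-Consistency Voting and Temporal Consistency Reinforcement.

◦ Demonstrates experimentally that exploiting temporal consistency improves dLLM performance, with gains on the GSM8K, MATH500, SVAMP, and Countdown datasets.

◦ Offers a new understanding of the temporal dynamics of dLLMs and a way to use them, providing useful insights for future dLLM research and development.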

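• Limitations:

◦ The effectiveness of the proposed methods may be limited to specific datasets and models; additional experiments across more datasets and models are needed.

◦ The definition and computation of Temporal Semantic Entropy (TSE) are not described in detail; further analysis of how well TSE generalizes, and where it falls short, is needed.

◦ The computational cost of Temporal Self-Consistency Voting and Temporal Consistency Reinforcement is not analyzed; efficiency in practical use needs further consideration.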
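Since this summary does not spell out how TSE is computed, the following is only a rough sketch of one way an entropy over intermediate answers could be measured. It assumes that answers are grouped into semantic clusters by a simple text normalization (a stand-in for a real semantic-equivalence check) and that the score is the Shannon entropy of the cluster frequencies; the function name and the `normalize` hook are hypothetical.

```python
import math
from collections import Counter


def temporal_semantic_entropy(step_answers, normalize=None):
    """Shannon entropy (in nats) of answer clusters across denoising steps.

    step_answers: answers decoded at each denoising step.
    normalize: hypothetical hook mapping an answer to a cluster key; the
        default lowercases and collapses whitespace, which is only a crude
        proxy for semantic equivalence.
    """
    if not step_answers:
        raise ValueError("need at least one intermediate answer")
    if normalize is None:
        normalize = lambda text: " ".join(text.lower().split())

    counts = Counter(normalize(answer) for answer in step_answers)
    total = len(step_answers)
    # Sum of p * log(1 / p) over clusters; zero when every step agrees.
    return sum((c / total) * math.log(total / c) for c in counts.values())


stable = temporal_semantic_entropy(["42", " 42", "42", "42"])
oscillating = temporal_semantic_entropy(["42", "41", "42", "40"])
print(stable, oscillating)
```

Under these assumptions a trajectory whose answer never changes scores zero and an oscillating one scores higher, so the negated value could serve as a reward that favors stable generation, in the spirit of the negative TSE reward described in the overview.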
