This paper proposes FastCache, a framework for reducing the computational cost of Diffusion Transformers (DiTs). FastCache accelerates inference with a dual strategy that exploits redundancy in the model's internal representations. First, a spatially aware token selection mechanism adaptively filters out redundant tokens based on hidden-state saliency. Second, a transformer-level cache reuses latent activations across timesteps when the change between steps is statistically insignificant, replacing the full computation with a learnable linear approximation that preserves generation fidelity. A theoretical analysis shows that FastCache keeps the approximation error bounded under its hypothesis-testing-based decision rule. Experiments on multiple DiT variants show substantial reductions in latency and memory usage while achieving the best generation quality among competing caching methods, as measured by FID and t-FID. The FastCache code is available on GitHub (https://github.com/NoakLiu/FastCache-xDiT).
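To make the caching step concrete, here is a minimal PyTorch sketch of a cross-timestep activation cache guarded by a significance check. It is an illustration under assumptions, not the authors' implementation: the `CachedBlock` wrapper, the relative-change statistic, and the fixed `threshold` constant are hypothetical stand-ins for the paper's hypothesis-testing decision rule and learnable linear approximation.

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Wraps a transformer block with a cross-timestep activation cache.

    When the hidden state has changed only insignificantly since the
    previous diffusion step, the full block is skipped and the cached
    output is updated with a cheap learnable linear approximation.
    """

    def __init__(self, block: nn.Module, dim: int, threshold: float = 0.05):
        super().__init__()
        self.block = block
        # Learnable affine map used instead of the full block on cache hits.
        self.approx = nn.Linear(dim, dim)
        # Decision cutoff; the paper derives it from a hypothesis test,
        # a fixed relative-change threshold stands in here.
        self.threshold = threshold
        self.prev_input = None
        self.prev_output = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.prev_input is not None and self.prev_input.shape == x.shape:
            # Test statistic: relative change of the hidden state across steps.
            num = (x - self.prev_input).pow(2).mean()
            den = self.prev_input.pow(2).mean().clamp_min(1e-8)
            if (num / den) < self.threshold:
                # Change deemed insignificant: reuse the cached output,
                # refined by the learnable linear approximation.
                out = self.approx(self.prev_output)
                self.prev_input, self.prev_output = x.detach(), out.detach()
                return out
        out = self.block(x)  # cache miss: run the full block
        self.prev_input, self.prev_output = x.detach(), out.detach()
        return out
```

In this sketch each DiT block would be wrapped once, e.g. `CachedBlock(block, dim=hidden_dim)`, with the cached tensors reset between generations.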
Takeaways and Limitations
• Takeaways:
◦ FastCache is a novel caching and compression framework that effectively reduces the computational cost of DiTs.
◦ Efficiency is improved through a dual strategy of spatially aware token selection and transformer-level caching (see the sketch after this list).
◦ Generation quality is maintained through a learnable linear approximation.
◦ Superior performance over other caching methods is demonstrated on FID and t-FID metrics.
◦ The public GitHub code supports reproducibility and extension.
• Limitations:
◦ The effectiveness of the proposed method may depend on the specific DiT variant and dataset.
◦ The performance of the hypothesis-testing-based decision rule depends on the validity of its statistical assumptions.
◦ Further experiments with more diverse DiT variants and larger datasets are needed.
◦ Hyperparameter optimization for FastCache may require further research.
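As referenced in the takeaways above, the spatially aware token selection can be sketched as follows. This is a hedged illustration: scoring tokens by hidden-state norm and keeping a fixed `keep_ratio` are assumptions standing in for the paper's adaptive, importance-based filtering.

```python
import torch

def select_salient_tokens(hidden: torch.Tensor, keep_ratio: float = 0.5):
    """Keep only the most salient tokens of a (batch, tokens, dim) tensor.

    Tokens are scored by hidden-state norm (a stand-in for the paper's
    importance measure); the rest are treated as redundant and skipped.
    Returns the kept tokens and the indices needed to scatter them back.
    """
    scores = hidden.norm(dim=-1)                        # (batch, tokens)
    k = max(1, int(hidden.size(1) * keep_ratio))
    idx = scores.topk(k, dim=1).indices                 # k most salient tokens
    idx = idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
    return hidden.gather(1, idx), idx

# Usage: run the expensive block only on the kept tokens, then scatter
# the results back; redundant tokens simply keep their previous values.
x = torch.randn(2, 16, 64)
kept, idx = select_salient_tokens(x, keep_ratio=0.25)
processed = kept * 2.0          # placeholder for a transformer block
out = x.scatter(1, idx, processed)
```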