Daily Arxiv

This page collects artificial intelligence papers published around the world.
The summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, simply cite the source.

Attention as an Adaptive Filter

Created by
  • Haebom

Author

Peter Racioppo

Outline

We introduce Adaptive Filter Attention (AFA), a novel attention mechanism that integrates a learnable dynamics model directly into the computation of attention weights. Instead of comparing queries and keys directly, we model the input sequence as discrete observations of a linear stochastic differential equation (SDE). We assume a continuous-time linear time-invariant system whose state matrix and noise covariance are simultaneously diagonalizable, which lets us propagate uncertainty through the dynamics from keys to queries efficiently using the closed-form solution of the differential Lyapunov equation. Attention then emerges naturally as the maximum-likelihood solution for filtering the trajectory of this linear SDE, with the attention weights corresponding to a robust, residual-based reweighting of the propagated query-key precisions. By further constraining the system dynamics and noise, we obtain a simplified variant with the same computational and memory complexity as standard attention. In the limit of zero decay and zero process noise, and under a small-angle approximation, we recover a complex-valued generalization of ordinary dot-product attention with rotary positional encodings.
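As a rough illustration of the uncertainty propagation described above, the Python sketch below handles only the scalar case, where the differential Lyapunov equation dP/dt = 2aP + σ² has a simple closed-form solution. The function names (`propagated_variance`, `filter_attention`), the measurement-variance parameter `meas_var`, and the treatment of the propagated key estimates as independent are assumptions made for illustration, not details taken from the paper. The robust residual-based reweighting is left out, so the weights here are plain normalized precisions.

```python
import math


def propagated_variance(p0, a, sigma2, dt):
    """Closed-form solution of the scalar differential Lyapunov equation
    dP/dt = 2*a*P + sigma2 with P(0) = p0, evaluated at elapsed time dt."""
    if abs(a) < 1e-12:
        # Limit a -> 0: a pure random walk, variance grows linearly in time.
        return p0 + sigma2 * dt
    growth = math.exp(2.0 * a * dt)
    return growth * p0 + sigma2 * (growth - 1.0) / (2.0 * a)


def filter_attention(query_time, key_times, key_values, a, sigma2, meas_var):
    """Precision-weighted estimate of the state at query_time.

    Each key observation is pushed forward through the dynamics
    dx = a*x dt + noise, and its variance is pushed forward with the
    closed-form Lyapunov solution. Keys in the future are ignored (causal).
    Assumption: propagated estimates are treated as independent.
    """
    predictions, precisions = [], []
    for t, z in zip(key_times, key_values):
        dt = query_time - t
        if dt < 0:
            continue  # causal mask: skip keys later than the query
        predictions.append(math.exp(a * dt) * z)
        variance = propagated_variance(meas_var, a, sigma2, dt)
        precisions.append(1.0 / variance)
    total = sum(precisions)
    weights = [p / total for p in precisions]
    estimate = sum(w * x for w, x in zip(weights, predictions))
    return estimate, weights


if __name__ == "__main__":
    estimate, weights = filter_attention(
        query_time=2.0,
        key_times=[0.0, 1.0, 2.0],
        key_values=[1.0, 0.8, 0.5],
        a=-0.5,
        sigma2=1.0,
        meas_var=0.1,
    )
    print(estimate, weights)
```

With these example settings the propagated variance grows with the time gap, so older keys receive smaller weights than recent ones. This mirrors, in a very reduced form, how a dynamics model can shape attention weights without a direct query-key comparison.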

Takeaways, Limitations

AFA offers a novel approach to attention by incorporating a learnable dynamics model into the computation of attention weights.
Modeling the sequence as a linear SDE allows uncertainty to be propagated efficiently between keys and queries.
A simplified AFA variant has the same computational and memory complexity as standard attention.
In a limiting case, AFA recovers a complex-valued generalization of dot-product attention with rotary positional encodings.
The paper lacks specific experimental results and performance comparisons.
Analysis of the impact of constraints and assumptions of linear SDE models on performance is needed.
Research is needed on optimal hyperparameter settings for AFA.
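The rotary-encoding takeaway above can be made concrete with a small Python sketch of complex-valued dot-product attention. It uses the standard rotary construction, in which each complex component is multiplied by a position-dependent phase, and is not the paper's derivation. The helper names (`rotate`, `score`, `attention_weights`) and the frequency list `freqs` are hypothetical and chosen for illustration only.

```python
import cmath
import math


def rotate(vec, position, freqs):
    """Multiply each complex component by exp(i * freq * position)."""
    return [v * cmath.exp(1j * w * position) for v, w in zip(vec, freqs)]


def score(q, k, q_pos, k_pos, freqs):
    """Real part of the Hermitian inner product of the rotated query and key.

    Algebraically this equals Re(sum_d q_d * conj(k_d) * exp(i*w_d*(q_pos - k_pos))),
    so it depends on the positions only through their difference.
    """
    q_rot = rotate(q, q_pos, freqs)
    k_rot = rotate(k, k_pos, freqs)
    return sum(a * b.conjugate() for a, b in zip(q_rot, k_rot)).real


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, q_pos, keys, freqs):
    """Causal softmax weights of one query over the keys at positions 0..q_pos."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [
        scale * score(q, k, q_pos, pos, freqs)
        for pos, k in enumerate(keys[: q_pos + 1])
    ]
    return softmax(scores)


if __name__ == "__main__":
    freqs = [1.0, 0.1]
    q = [1 + 2j, 0.5 - 1j]
    keys = [[0.3 + 0.1j, -1 + 0.4j], [0.2 - 0.5j, 0.7 + 0.7j], [1 + 0j, 0 - 1j]]
    print(attention_weights(q, 2, keys, freqs))
```

Because the phases of the query and key cancel except for their difference, shifting both positions by the same amount leaves the score unchanged. Setting every frequency to zero reduces the score to an ordinary (real part of a) dot product.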