Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.

Attention as an Adaptive Filter

Created by
  • Haebom

Author

Peter Racioppo

Outline

Adaptive Filter Attention (AFA) is a novel attention mechanism that directly integrates a learnable dynamics model into the calculation of attention weights. Instead of comparing queries and keys directly, it models the input sequence as discrete observations of a linear stochastic differential equation (SDE). By imposing a linear dynamics model whose state matrix and noise covariances are simultaneously diagonalizable, it efficiently propagates pairwise uncertainties using a closed-form solution of the differential Lyapunov equation. Attention then emerges naturally as the maximum-likelihood solution under this linear SDE, with the attention weights corresponding to a robust, residual-based reweighting of the propagated pairwise precisions. Imposing additional constraints on the eigenvalues of the state matrix yields a simplified variant with the same computational and memory complexity as standard attention. In the limit of vanishing dynamics and process noise, and with a small-angle approximation, AFA recovers ordinary dot-product attention.
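To make the recipe concrete, below is a minimal NumPy sketch under strong simplifying assumptions: real, negative, diagonal eigenvalues; time gaps treated symmetrically; scalar per-pair precision; and a Student-t style robust weight. The function name afa_attention and all parameter names are hypothetical, and this illustrates the general shape of the computation (propagate, accumulate uncertainty in closed form, reweight by residuals) rather than the paper's exact formulation.

```python
import numpy as np

def afa_attention(x, times, lam, q_var, r_var, nu=4.0):
    """Sketch of adaptive-filter-style attention for one head, no projections.

    x     : (T, d) noisy observations of the latent state
    times : (T,)   observation timestamps
    lam   : (d,)   real, negative eigenvalues of the diagonal state matrix
    q_var : (d,)   process-noise variances (diagonal Q)
    r_var : (d,)   measurement-noise variances (diagonal R)
    nu    : Student-t robustness parameter
    """
    T, d = x.shape
    out = np.zeros_like(x)
    for t in range(T):
        dt = np.abs(times[t] - times)[:, None]          # (T, 1) time gaps
        # State transition e^{A*dt} for diagonal A (applied symmetrically
        # to past and future observations to keep the sketch simple).
        phi = np.exp(lam[None, :] * dt)                 # (T, d)
        x_prop = phi * x                                # each token propagated to time t
        # Closed-form differential Lyapunov solution for diagonal dynamics:
        # process noise accumulated over the gap dt.
        acc = q_var[None, :] * np.expm1(2.0 * lam[None, :] * dt) / (2.0 * lam[None, :])
        # Variance of the residual x[t] - phi * x[s]: propagated measurement
        # noise + accumulated process noise + measurement noise at time t.
        var = phi**2 * r_var[None, :] + acc + r_var[None, :]
        prec = 1.0 / var                                # pairwise precisions
        # Mahalanobis-style residual between the query token and each propagated token.
        resid2 = np.sum(prec * (x[t] - x_prop) ** 2, axis=1)
        # Robust residual-based reweighting of the propagated precisions.
        w = prec.mean(axis=1) * (nu + d) / (nu + resid2)
        w /= w.sum()
        out[t] = w @ x_prop                             # precision-weighted estimate
    return out
```

A call such as afa_attention(x, np.arange(T, dtype=float), lam, q_var, r_var) with lam = -np.abs(...) keeps the dynamics stable; tokens close in time (and consistent with the propagated dynamics) receive high precision and hence high weight, while outliers are down-weighted by the robust factor.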

Takeaways, Limitations

Proposes a new attention mechanism, Adaptive Filter Attention (AFA).
Integrates a learnable dynamics model into the attention weight computation.
Models the input sequence as discrete observations of a linear SDE.
A simplified variant exists with the same computational and memory complexity as standard attention.
Recovers standard dot-product attention in the limit of vanishing dynamics and process noise (see the sketch below).
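As a sanity check on that last point, the snippet below (again a hypothetical simplification: Gaussian rather than robust weights, unit-norm vectors standing in for the small-angle regime, and no dynamics or process noise, so the residual variance sigma2 is constant) shows that residual-based weights then coincide exactly with softmax dot-product attention.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 6, 4
# Unit-norm "keys" stand in for propagated states once dynamics vanish.
k = rng.normal(size=(T, d))
k /= np.linalg.norm(k, axis=1, keepdims=True)
q = k[0]       # compare one query against all keys
sigma2 = 1.0   # constant residual variance in the vanishing-noise limit

# Gaussian residual weights: exp(-||q - k||^2 / (2 sigma^2)), normalized.
resid = np.exp(-np.sum((q - k) ** 2, axis=1) / (2 * sigma2))
resid /= resid.sum()

# Softmax of inner products with matching temperature 1 / sigma^2.
dots = np.exp(k @ q / sigma2)
dots /= dots.sum()

# For unit vectors, ||q - k||^2 = 2 - 2 q.k, so the two weightings agree.
assert np.allclose(resid, dots)
```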