Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please credit the source when sharing.

Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling

Created by
  • Haebom

Author

Mónika Farsang, Ramin Hasani, Daniela Rus, Radu Grosu

Outline

LrcSSM is a nonlinear recurrent model that processes long sequences as fast as conventional linear state-space layers. By forcing the state-transition matrix to be diagonal and learned at every step, the entire sequence can be solved in parallel with a single prefix scan, giving $\mathcal{O}(TD)$ time and memory and only $\mathcal{O}(\log T)$ sequential depth for input length $T$ and state dimension $D$. Unlike other input-varying systems such as Liquid-S4 and Mamba, it also provides a formal gradient-stability guarantee. With forward and backward passes costing $\Theta(TDL)$ FLOPs for network depth $L$, its low sequential depth and parameter count of $\Theta(DL)$ place it in the compute-optimal scaling-law regime recently observed for Mamba ($\beta \approx 0.42$), so it outperforms the quadratic-attention Transformer at equal compute while avoiding the memory overhead of FFT-based long convolutions. On a suite of long-range forecasting tasks, LrcSSM outperforms LRU, S5, and Mamba.
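The key observation is that a diagonal, input-varying linear recurrence composes associatively, so all $T$ steps can be combined with a parallel prefix scan instead of a sequential loop. Below is a minimal JAX sketch of that idea, assuming a recurrence of the form $x_t = a_t \odot x_{t-1} + b_t$; the function name `diag_scan` and the toy shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the parallel prefix-scan idea
# behind diagonal, input-varying recurrences:
#     x_t = a_t * x_{t-1} + b_t   (elementwise, diagonal transition a_t)
# Composing two affine maps stays affine, so the combine below is associative
# and jax.lax.associative_scan can evaluate all T steps in O(log T) depth.
import jax

def diag_scan(a, b):
    """a, b: (T, D) per-step diagonal transitions and inputs.
    Returns all states x_1..x_T of x_t = a_t * x_{t-1} + b_t with x_0 = 0."""
    def combine(left, right):
        a_l, b_l = left
        a_r, b_r = right
        # (x -> a_r*x + b_r) after (x -> a_l*x + b_l) = x -> (a_r*a_l)*x + (a_r*b_l + b_r)
        return a_l * a_r, a_r * b_l + b_r

    _, x = jax.lax.associative_scan(combine, (a, b))
    return x

# Toy usage: sequence length T = 8, state dimension D = 4.
a = jax.random.uniform(jax.random.PRNGKey(0), (8, 4), minval=0.9, maxval=1.0)
b = jax.random.normal(jax.random.PRNGKey(1), (8, 4))
print(diag_scan(a, b).shape)  # (8, 4)
```

Running the scan over a $(T, D)$ batch performs $\mathcal{O}(TD)$ work overall, and the associative scan arranges the combines in a tree of depth $\mathcal{O}(\log T)$, which is where the complexity figures quoted above come from.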

Takeaways, Limitations

Takeaways:
Presents a nonlinear recurrent model that processes long sequences at the speed of linear state-space layers.
Achieves $\mathcal{O}(TD)$ time and memory complexity and $\mathcal{O}(\log T)$ sequential depth.
Provides a formal gradient-stability guarantee.
Follows the compute-optimal scaling-law regime and outperforms the quadratic-attention Transformer at equal compute.
Outperforms LRU, S5, and Mamba on long-range forecasting tasks.
Limitations:
The limitations are not explicitly stated in the paper's summary. Additional experiments and analyses may be needed for a more comprehensive evaluation.