Attention Sinks Induce Gradient Sinks: Massive Activations as Gradient Regulators in Transformers

Created by
  • Haebom