Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
The summaries are generated using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Octic Vision Transformers: Quicker Visions Through Equivariance

Created by
  • Haebom

Author

David Nordström, Johan Edstedt, Fredrik Kahl, Georg Bökman

Outline

This paper asks why state-of-the-art Vision Transformers (ViTs) are not designed to exploit natural geometric symmetries such as 90-degree rotations and reflections, and argues that the lack of efficient implementations is the cause. To address this, the authors propose Octic Vision Transformers (octic ViTs), whose layers are equivariant to the octic group, the symmetry group of the square (90-degree rotations and reflections). Octic linear layers reduce FLOPs by 5.33x and memory by up to 8x compared to conventional linear layers. The paper studies two families of ViTs built from octic blocks and trains them on ImageNet-1K with both supervised (DeiT-III) and self-supervised (DINOv2) methods, demonstrating significant efficiency gains while maintaining baseline accuracy.
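The efficiency gain comes from constraining linear layers to commute with the group action, which shrinks the number of free parameters. The sketch below is a toy illustration of this idea, not the paper's implementation (which works in an equivariant basis and achieves the 5.33x/8x figures above): it builds the eight permutation matrices of the octic group acting on a flattened 4×4 image and projects a dense weight matrix onto the equivariant subspace by group averaging. All names (`perm_matrix`, `W_eq`, `dim_eq`) are hypothetical.

```python
import numpy as np

# Toy illustration of octic-group equivariant linear layers
# (hypothetical sketch; not the paper's implementation).

n = 4                                   # image side length
idx = np.arange(n * n).reshape(n, n)    # flat pixel indices

def perm_matrix(transformed):
    """Permutation matrix P with (P @ x)[transformed[r, c]] = x[r*n + c]."""
    P = np.zeros((n * n, n * n))
    P[transformed.ravel(), idx.ravel()] = 1.0
    return P

# The eight group elements: four 90-degree rotations, with/without a flip.
group = [perm_matrix(np.rot90(f(idx), k))
         for k in range(4)
         for f in (lambda a: a, np.fliplr)]

rng = np.random.default_rng(0)
W = rng.standard_normal((n * n, n * n))

# Group averaging (the Reynolds operator) projects W onto the subspace of
# linear maps that commute with every group action.
W_eq = sum(P.T @ W @ P for P in group) / len(group)

for P in group:                         # equivariance: W_eq P = P W_eq
    assert np.allclose(W_eq @ P, P @ W_eq)

# Free-parameter count of the equivariant subspace (trace of the projector):
# 36 here, versus 256 for an unconstrained dense layer on the same input.
dim_eq = sum(np.trace(P) ** 2 for P in group) / len(group)
print(int(dim_eq), n * n * n * n)       # 36 256
```

In this toy setting the constraint cuts the parameter count roughly 7x (36 vs 256); the paper's stated 5.33x FLOP and up-to-8x memory reductions come from its own efficient implementation, where equivariant weight matrices become block-diagonal in a suitable basis.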

Takeaways, Limitations

Takeaways:
A new architecture (octic ViTs) that improves the efficiency of existing ViTs.
An efficient implementation that exploits geometric symmetry without increasing computational cost.
Baseline accuracy maintained alongside efficiency gains on ImageNet-1K.
Limitations:
The generalizability of the architecture and its performance on other datasets remain to be verified.
The summary lacks specific implementation and training details.
Its potential across a wider range of applications remains to be explored.