Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention

Created by
  • Haebom

Authors

Huangliang Dai, Shixun Wu, Jiajun Huang, Zizhe Jian, Yue Zhu, Haiyang Hu, Zizhong Chen

Outline

This paper proposes an efficient fault-tolerance technique against soft errors that occur during Transformer inference. Existing fault-tolerance frameworks that protect individual computational units suffer from significant computational and memory overhead and limited scalability. The paper addresses these issues by treating the computations within the attention module as a single kernel and implementing end-to-end fault tolerance: it provides comprehensive error protection for the nonlinear operations and designs a strided algorithm-based fault tolerance (ABFT) scheme for the linear modules that avoids inter-thread communication. Experimental results show a speedup of up to 7.56x over existing methods, with an average fault-tolerance overhead of 13.9%.
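
For the linear modules, the underlying idea is the classic ABFT checksum check for matrix multiplication: augment the operands with checksum rows/columns so that a silently corrupted output element can be detected by comparing the independently computed checksums against sums taken over the result. The following is a minimal NumPy sketch of that general principle, not the paper's strided GPU kernel; the function names and tolerance are illustrative assumptions.

import numpy as np

def abft_matmul(A, B):
    """Compute C = A @ B together with independently derived ABFT checksums.

    The checksum row is (column sums of A) @ B and the checksum column is
    A @ (row sums of B); both come from an augmented product, so they can
    later be compared against sums taken directly over C.
    """
    A_aug = np.vstack([A, A.sum(axis=0, keepdims=True)])   # add checksum row
    B_aug = np.hstack([B, B.sum(axis=1, keepdims=True)])   # add checksum column
    C_aug = A_aug @ B_aug
    C = C_aug[:-1, :-1]
    checksum_row = C_aug[-1, :-1]   # should equal C.sum(axis=0)
    checksum_col = C_aug[:-1, -1]   # should equal C.sum(axis=1)
    return C, checksum_row, checksum_col

def abft_verify(C, checksum_row, checksum_col, tol=1e-6):
    """Flag a soft error if the stored checksums disagree with C's sums."""
    return (np.allclose(checksum_row, C.sum(axis=0), atol=tol)
            and np.allclose(checksum_col, C.sum(axis=1), atol=tol))

# Usage: a clean product verifies; a single corrupted element is detected.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 8)), rng.standard_normal((8, 5))
C, cr, cc = abft_matmul(A, B)
print(abft_verify(C, cr, cc))       # True
C[2, 3] += 1.0                      # simulate a silent bit flip in one output element
print(abft_verify(C, cr, cc))       # False: error detected

Per the summary above, the paper's contribution is a strided variant of this check that fits inside the fused attention kernel and avoids inter-thread communication; the sketch only illustrates the detection principle.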

Takeaways, Limitations

Takeaways:
Presents an efficient way to handle soft errors that occur during Transformer inference.
Offers an end-to-end fault-tolerance technique with significantly better speed and efficiency than existing methods.
Protects linear modules efficiently with a strided algorithm-based fault tolerance (ABFT) scheme that avoids inter-thread communication.
Limitations:
The effectiveness of the proposed method may be limited to specific hardware environments or to Transformer models of particular sizes.
Experiments covering a broader range of soft-error types may be lacking.
Further research is needed on applicability to other model types and inference settings.