This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site is operated on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
This paper proposes an efficient fault-tolerance technique for soft errors that occur during Transformer inference. Existing frameworks that protect individual computational units incur significant computational and memory overhead and scale poorly. This paper addresses these issues by treating the computations of the attention module as a single kernel and implementing fault tolerance end to end: it provides comprehensive error protection for the nonlinear operations and designs a strided algorithm-based fault tolerance (ABFT) scheme for the linear modules that avoids inter-thread communication. Experiments show a speedup of up to 7.56x over existing methods, with an average fault-tolerance overhead of 13.9%.
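The summary does not spell out the paper's strided ABFT kernel, but the general ABFT principle it builds on is standard: append a checksum row to one GEMM operand so that every output column carries its own checksum and can be verified independently. Below is a minimal NumPy sketch of that column-checksum idea, with the error injection, the number of simulated "threads", the strided column assignment, and all function names being illustrative assumptions rather than the paper's CUDA implementation.

```python
import numpy as np

def encode_with_checksum(A: np.ndarray) -> np.ndarray:
    """Append a checksum row (column sums of A) to A.

    For C = A @ B, the extra row of (A_enc @ B) then holds the
    column sums of C, so each output column can be checked on its own.
    """
    return np.vstack([A, A.sum(axis=0, keepdims=True)])

def verify_columns(C_enc: np.ndarray, cols: np.ndarray, tol: float = 1e-6) -> np.ndarray:
    """Check a subset of output columns against their checksums.

    C_enc is the (m+1) x n product of the encoded A and B; its last row
    is the checksum row. Returns the indices in `cols` whose data rows
    do not sum to the checksum, i.e. columns hit by a (simulated) error.
    """
    data, checksum = C_enc[:-1, cols], C_enc[-1, cols]
    bad = np.abs(data.sum(axis=0) - checksum) > tol
    return cols[bad]

# Illustrative sizes and "thread" count (assumptions, not from the paper).
m, k, n, num_threads = 64, 32, 48, 4
rng = np.random.default_rng(0)
A, B = rng.standard_normal((m, k)), rng.standard_normal((k, n))

C_enc = encode_with_checksum(A) @ B          # fault-free encoded product
C_enc[10, 17] += 5.0                         # inject a soft error into one element

# Each simulated thread owns a strided slice of columns and verifies it
# locally -- no reduction or communication across threads is needed.
for t in range(num_threads):
    my_cols = np.arange(t, n, num_threads)
    corrupted = verify_columns(C_enc, my_cols)
    if corrupted.size:
        print(f"thread {t}: checksum mismatch in columns {corrupted.tolist()}")
```

Because each column is verified against its own checksum, a thread that owns a strided subset of columns can check them without exchanging data with other threads, which is the property a strided ABFT scheme exploits. A fuller ABFT implementation would typically also encode B with a checksum column so errors can be located as well as detected; the sketch keeps only the row checksum for brevity.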
Takeaways and Limitations
• Takeaways:
◦ Presents an efficient solution to soft errors that occur during Transformer inference.
◦ Introduces an end-to-end fault-tolerance technique that is significantly faster and more efficient than existing methods.
◦ Proposes efficient error protection for linear modules through strided algorithm-based fault tolerance (ABFT).
• Limitations:
◦ The effectiveness of the proposed method may be limited to specific hardware environments or Transformer models of particular sizes.
◦ Comprehensive experiments covering diverse types of soft errors may be lacking.
◦ Further research is needed on applicability to other model types and inference settings.