This paper addresses the heavy reliance on CUDA-based GPU computation driven by the rapidly growing parameter counts and computational demands of deep learning models. The dominance of the CUDA ecosystem makes it necessary to support CUDA-based software on other hardware platforms, yet translating CUDA code to those platforms is difficult, and existing approaches are limited and costly to develop. To address this, the paper proposes a framework that leverages AI compilers and automatic optimization techniques to generate pairs of high-performance CUDA code and corresponding platform code. A graph-based data augmentation method is added, and the HPCTransEval benchmark is introduced to evaluate CUDA translation performance. Experiments on CUDA-to-CPU translation as a case study demonstrate speedups of CPU operators and highlight the potential of LLMs to address the compatibility issues of the CUDA ecosystem. The source code is publicly available.
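For illustration only (this example is not taken from the paper), the sketch below shows the kind of CUDA operator / CPU operator pair a CUDA-to-CPU translation framework is expected to produce: the grid/block index arithmetic of the kernel collapses into a plain loop on the CPU side, with OpenMP recovering the parallelism. The names `vector_add` and `vector_add_cpu` and the use of OpenMP are assumptions made for this sketch.

```cuda
// Hypothetical CUDA operator and a corresponding CPU translation,
// illustrating what a CUDA-to-CPU code pair looks like (not from the paper).
#include <cstdio>
#include <vector>

// Source side: element-wise vector addition as a CUDA kernel.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Target side: the equivalent CPU operator. The thread-index computation
// becomes a loop index; OpenMP parallelizes the loop across CPU cores.
void vector_add_cpu(const float* a, const float* b, float* c, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
    vector_add_cpu(a.data(), b.data(), c.data(), n);
    std::printf("c[0] = %f\n", c[0]);  // expect 3.0
    return 0;
}
```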
Takeaways, Limitations
• Takeaways:
◦ A framework for efficiently translating CUDA code to other platforms using LLMs.
◦ Evaluation of LLM performance, with improvements demonstrated through graph-based data augmentation and the HPCTransEval benchmark.
◦ Demonstration of the potential to resolve compatibility issues in the CUDA ecosystem, with CPU operators sped up by 43.8% on average.
◦ Reproducibility and extensibility of the research ensured through the open-source release.
• Limitations:
◦ The case study focuses on CUDA-to-CPU translation; translation performance for other platforms requires further research.
◦ LLM performance may still be suboptimal for high-performance code.
◦ The versatility and comprehensiveness of the HPCTransEval benchmark need further validation.
◦ The scalability and generalizability of LLM-based approaches require further research.