Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

P4OMP: Retrieval-Augmented Prompting for OpenMP Parallelism in Serial Code

Created by
  • Haebom

Authors

Wali Mohammad Abdullah, Azmain Kabir

Outline

P4OMP is a retrieval-augmented framework that uses large language models (LLMs) to transform serial C/C++ code into OpenMP-annotated parallel code. The authors describe it as the first system to apply retrieval-based prompting for OpenMP pragma correctness without model fine-tuning or compiler instrumentation. It uses retrieval-augmented generation (RAG) over structured instructional knowledge drawn from OpenMP tutorials to make prompt-based code generation more reliable. Grounding generation in the retrieved context improves syntactic correctness over baseline prompting with GPT-3.5-Turbo. P4OMP is evaluated against that baseline (GPT-3.5-Turbo without retrieval) on a comprehensive benchmark of 108 real-world C++ programs drawn from Stack Overflow and the PolyBench and NAS benchmark suites. P4OMP achieves 100% compilation success on all parallelizable cases, while the baseline fails to compile in 20 of the 108 cases. Six cases that depend on non-random-access iterators or thread-unsafe constructs are excluded because of fundamental OpenMP limitations. A detailed analysis shows how P4OMP consistently avoids the scoping errors, syntax misuse, and invalid directive combinations that commonly affect baseline-generated code. The authors also report strong runtime scaling on seven compute-intensive benchmarks on an HPC cluster. P4OMP provides a robust, modular pipeline that significantly improves the reliability and applicability of LLM-generated OpenMP code.
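To make the serial-to-OpenMP transformation concrete, here is a minimal sketch of the kind of annotation involved. It is an illustration only, not output from P4OMP: the function names are hypothetical, and the loop was chosen because an accumulation is a typical place where a missing clause causes a data-sharing mistake.

```cpp
#include <cstddef>
#include <vector>

// Serial version: sums the squares of all elements.
double sum_squares_serial(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i] * v[i];
    }
    return total;
}

// OpenMP-annotated version (hypothetical example, not taken from the paper).
// The reduction clause gives each thread a private copy of `total` and
// combines the copies at the end. Without it, all threads would update the
// shared variable at once, which is a data race.
double sum_squares_parallel(const std::vector<double>& v) {
    double total = 0.0;
    // A signed loop variable is accepted by every OpenMP version.
    const long long n = static_cast<long long>(v.size());
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i) {
        const double x = v[static_cast<std::size_t>(i)];
        total += x * x;
    }
    return total;
}
```

With GCC or Clang the pragma takes effect when the file is compiled with `-fopenmp`. Without that flag the pragma is ignored and the function simply runs serially, so both versions return the same result either way.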

Takeaways, Limitations

Takeaways:
Presents a novel approach to automating OpenMP parallelization of C/C++ code with an LLM.
Improves the correctness of OpenMP pragmas and the compilation success rate through retrieval-based prompting.
Improves syntactic correctness over baseline GPT-3.5-Turbo prompting: 100% compilation success on parallelizable cases, versus 20 baseline compile failures out of 108.
Demonstrates strong runtime scaling on seven compute-intensive benchmarks on an HPC cluster.
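The idea of retrieval-augmented prompting can be pictured with a toy sketch. Everything here is an assumption made for illustration: this summary does not describe P4OMP's retriever, knowledge-base format, or prompt template, so the `Snippet` type, the `retrieve` and `build_prompt` functions, and the plain substring matching are all hypothetical stand-ins. The sketch only shows the general shape: look up guidance relevant to the input code, then place it in the prompt ahead of the code.

```cpp
#include <string>
#include <vector>

// Hypothetical knowledge-base entry: a trigger keyword plus a short,
// tutorial-style note. Not the paper's actual data format.
struct Snippet {
    std::string keyword;   // text to look for in the serial code
    std::string guidance;  // note to place in the prompt when it matches
};

// Returns the guidance of every snippet whose keyword occurs in the code.
// Substring matching is a deliberately simple stand-in for real retrieval.
std::vector<std::string> retrieve(const std::string& code,
                                  const std::vector<Snippet>& kb) {
    std::vector<std::string> hits;
    for (const Snippet& s : kb) {
        if (code.find(s.keyword) != std::string::npos) {
            hits.push_back(s.guidance);
        }
    }
    return hits;
}

// Assembles a prompt: retrieved notes first, then the task and the code.
std::string build_prompt(const std::string& code,
                         const std::vector<Snippet>& kb) {
    std::string prompt = "Reference notes:\n";
    for (const std::string& g : retrieve(code, kb)) {
        prompt += "- " + g + "\n";
    }
    prompt += "\nAdd OpenMP pragmas to this code:\n" + code + "\n";
    return prompt;
}
```

For example, a knowledge base containing the entry `{"+=", "Use a reduction clause for accumulations."}` would attach that note to any input loop containing `+=`, so the model sees the relevant rule next to the code it is asked to annotate.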
Limitations:
Not applicable to code that relies on non-random-access iterators or thread-unsafe constructs.
Some code cannot be parallelized because of fundamental OpenMP limitations; six such cases were excluded from the evaluation.
The benchmark set used in the evaluation may be limited in scope.
Performance and correctness still need to be verified on more diverse and complex C/C++ code.
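The iterator limitation can be illustrated with a short sketch. OpenMP's `parallel for` requires a loop in canonical form, whose loop variable must be an integer, a pointer, or a random-access iterator. A `std::list` offers only bidirectional iterators, so a loop over it cannot carry the pragma directly. The function below shows one common manual workaround, stated here as an assumption rather than anything the paper proposes: copy element pointers into a `std::vector` first, then parallelize over the vector. The name `scale_list` is hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Multiplies every element of a std::list by `factor`.
void scale_list(std::list<double>& items, double factor) {
    // Step 1 (serial): gather pointers into a random-access container,
    // because the list itself cannot be indexed.
    std::vector<double*> ptrs;
    ptrs.reserve(items.size());
    for (double& x : items) {
        ptrs.push_back(&x);
    }

    // Step 2 (parallel): an integer-indexed loop is in canonical form.
    // Each iteration touches a distinct element, so no clause is needed.
    const long long n = static_cast<long long>(ptrs.size());
    #pragma omp parallel for
    for (long long i = 0; i < n; ++i) {
        *ptrs[static_cast<std::size_t>(i)] *= factor;
    }
}
```

The extra serial pass costs time and memory, so this only pays off when the per-element work is much heavier than a single multiplication. It also does nothing for the second limitation: code that calls thread-unsafe functions inside the loop remains unsafe to parallelize this way.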