Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset

Created by
  • Haebom

Authors

Abdul Basit, Nouhaila Innan, Muhammad Haider Asif, Minghao Shao, Muhammad Kashif, Alberto Marchisio, Muhammad Shafique

Outline

This paper introduces PennyLang, a high-quality dataset dedicated to PennyLane, to address the shortage of high-quality datasets that limits the use of large language models (LLMs) in quantum software development. PennyLang consists of 3,347 PennyLane quantum code samples with contextual descriptions, collected from textbooks, official documentation, and open-source repositories. The paper makes three contributions: the creation and release of PennyLang, an automated framework for constructing quantum code datasets, and a baseline evaluation of multiple open-source models within a Retrieval-Augmented Generation (RAG) pipeline. Experimental results show that combining RAG with PennyLang significantly improves the performance of the Qwen 7B and LLaMa 4 models. In contrast to previous research, which focused on Qiskit, this work provides LLM-based tools and reproducible methods for PennyLane, contributing to the advancement of AI-assisted quantum development.
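The RAG idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the two-entry `SAMPLES` corpus, the bag-of-words cosine retriever, and the helper names `retrieve` and `build_prompt` are all assumptions made for illustration, and a real system would likely use learned embeddings and pass the resulting prompt to an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical miniature corpus of (description, PennyLane code) pairs.
# The real PennyLang dataset has 3,347 samples; these two are invented here.
SAMPLES = [
    {
        "description": "Prepare a Bell state and measure the expectation value of PauliZ",
        "code": (
            "import pennylane as qml\n"
            "dev = qml.device('default.qubit', wires=2)\n"
            "@qml.qnode(dev)\n"
            "def bell():\n"
            "    qml.Hadamard(wires=0)\n"
            "    qml.CNOT(wires=[0, 1])\n"
            "    return qml.expval(qml.PauliZ(0))\n"
        ),
    },
    {
        "description": "Apply a parameterized RX rotation on a single qubit",
        "code": (
            "import pennylane as qml\n"
            "dev = qml.device('default.qubit', wires=1)\n"
            "@qml.qnode(dev)\n"
            "def rotate(theta):\n"
            "    qml.RX(theta, wires=0)\n"
            "    return qml.expval(qml.PauliZ(0))\n"
        ),
    },
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, samples, k=1):
    """Return the k samples whose descriptions are most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        samples,
        key=lambda s: cosine(query_vec, Counter(tokenize(s["description"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, samples, k=1):
    """Prepend the retrieved examples to the task, forming the LLM prompt."""
    context = "\n\n".join(
        f"# {s['description']}\n{s['code']}" for s in retrieve(query, samples, k)
    )
    return f"Reference PennyLane examples:\n{context}\n\nTask: {query}\n"
```

Calling `build_prompt("Create a Bell state", SAMPLES)` would place the Bell-state sample ahead of the task text, so the model sees a relevant PennyLane example before generating code.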

Takeaways, Limitations

Takeaways:
PennyLang, a high-quality dataset for quantum programming, can accelerate LLM-based quantum software development.
The automated quantum code dataset construction framework systematizes and streamlines the dataset-building process.
The experiments show that the RAG pipeline can significantly improve the quantum code generation performance of LLMs.
Providing LLM-based tools for PennyLane opens up new possibilities for AI-assisted quantum development.
Limitations:
The PennyLang dataset is specific to PennyLane and may not be directly applicable to other quantum programming frameworks.
The current evaluation is limited to a few specific open-source models; evaluation across a wider range of models is needed.
Further research is needed on the generality and scalability of the automated dataset construction framework.
Since the performance of the RAG pipeline is highly dependent on the quality of the dataset, quality control of the dataset is important.
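As a rough illustration of the quality-control point, the sketch below filters candidate code samples before they enter a retrieval corpus. It is an assumption about what such a check might look like, not the paper's framework: the function names `uses_pennylane` and `filter_samples` are hypothetical, and the checks (the sample parses as Python, imports `pennylane`, and is not a structural duplicate) are only a starting point.

```python
import ast
import hashlib


def uses_pennylane(tree):
    """Return True if the parsed module imports the pennylane package."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "pennylane" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] == "pennylane":
                return True
    return False


def filter_samples(codes):
    """Keep samples that parse, import pennylane, and are not duplicates.

    Duplicates are detected on the AST dump, so samples that differ only
    in whitespace or comments count as the same sample.
    """
    seen = set()
    kept = []
    for code in codes:
        try:
            tree = ast.parse(code)
        except SyntaxError:
            continue  # drop samples that are not valid Python
        if not uses_pennylane(tree):
            continue  # drop samples unrelated to PennyLane
        digest = hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # drop structural duplicates
        seen.add(digest)
        kept.append(code)
    return kept
```

A stricter filter could also execute each sample in a sandbox to confirm that it runs, at the cost of a heavier construction pipeline.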