Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Quantum Verifiable Rewards for Post-Training Qiskit Code Assistant

Created by
  • Haebom

Author

Nicolas Dupuis, Adarsh Tiwari, Youssef Mroueh, David Kremer, Ismael Faro, Juan Cruz-Benito

Outline

This paper explores post-training techniques for large language models (LLMs) that assist with quantum circuit design, simulation, and execution using Qiskit. The authors present quantum verification as an effective way to ensure both the quality of generated quantum code and its executability on quantum hardware. They build a synthetic data pipeline that produces paired quantum problems and unit tests, construct preference data for Direct Preference Optimization (DPO), and train models with Group Relative Policy Optimization (GRPO) using quantum-verifiable rewards obtained from quantum hardware. The best-performing model, which combines DPO and GRPO, outperforms the strongest open-source baseline on the Qiskit-HumanEval-hard benchmark.
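The core idea of a verifiable reward is simple: execute the model's generated solution against its paired unit test and score it by whether the test passes. Below is a minimal, hypothetical sketch of that loop; the function name `verifiable_reward` and the use of plain Python in place of actual Qiskit code are illustrative assumptions, not the paper's implementation (which verifies on quantum hardware).

```python
# Hypothetical sketch: score a model-generated solution by executing it
# against its paired unit test, yielding a binary verifiable reward.
# Plain Python stands in for Qiskit code to keep the example self-contained.

def verifiable_reward(solution_code: str, unit_test_code: str) -> float:
    """Return 1.0 if the generated code passes its unit test, else 0.0."""
    namespace: dict = {}
    try:
        exec(solution_code, namespace)   # define the candidate function
        exec(unit_test_code, namespace)  # run the paired assertions
        return 1.0
    except Exception:                    # any failure (syntax, runtime, assert)
        return 0.0

# Example problem/unit-test pair from a synthetic pipeline:
solution = "def add(a, b):\n    return a + b"
test = "assert add(2, 3) == 5"
print(verifiable_reward(solution, test))  # → 1.0
print(verifiable_reward("def add(a, b):\n    return a - b", test))  # → 0.0
```

In a DPO setting, such pass/fail signals can also rank pairs of candidate completions into "chosen" and "rejected" preference data; in GRPO, the same scalar serves directly as the group-relative reward.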

Takeaways, Limitations

Takeaways:
Demonstrates the potential of LLM-based support for quantum programming.
Improves code quality and ensures executability on hardware through quantum verification.
Shows that combining DPO and GRPO yields further performance gains.
Achieves strong results on the Qiskit-HumanEval-hard benchmark.
Limitations:
Data generation relies on a synthetic pipeline, which may limit problem diversity.
Real quantum hardware imposes cost and accessibility constraints.
The approach depends on a single quantum programming framework (Qiskit).
Generalization beyond the Qiskit-HumanEval-hard benchmark has not been validated.