This paper explores post-training techniques for large language models (LLMs) to aid in quantum circuit design, simulation, and execution using Qiskit. We present quantum verification as an effective method to ensure the quality of quantum code and its executability on quantum hardware. We develop a synthetic data pipeline that generates quantum problem-unit test pairs, produces preference data for Direct Preference Optimization (DPO), and trains models with Group Relative Policy Optimization (GRPO), leveraging quantum-verifiable rewards provided by quantum hardware. The best-performing model, which combines DPO and GRPO, outperforms the strongest open-source baseline model on the Qiskit-HumanEval-hard benchmark.