To address the lack of high-quality training data in the cybersecurity domain, we present a comprehensive dataset covering the key training stages: pre-training, instruction fine-tuning, and reasoning distillation. Extensive analyses on public cybersecurity benchmarks demonstrate the dataset's effectiveness: continual pre-training with the dataset yields a 15.9% improvement in aggregate score, and reasoning distillation leads to a 15.8% gain in security certification (CISSP). To encourage further research, we release the entire dataset and the trained cybersecurity LLM under the Open Data Commons Attribution (ODC-BY) and MIT licenses.