This paper proposes Merge-of-Thought Distillation (MoT), a framework for efficient reasoning distillation of long chain-of-thought (CoT) models that leverages multiple teacher models. Unlike conventional distillation methods that rely on a single teacher, MoT integrates the reasoning capabilities of several teachers to train one student model. Because the best teacher varies across student models and datasets, we propose a lightweight framework that alternates between fine-tuning the student on each teacher's guidance and merging the resulting student variants in weight space. Applying MoT to a Qwen3-14B student with only about 200 high-quality CoT samples, we show that on a competition mathematics benchmark it outperforms strong models such as DEEPSEEK-R1, QWEN3-30B-A3B, QWEN3-32B, and OPENAI-O1. MoT also outperforms single-teacher distillation and naive multi-teacher integration, mitigates overfitting, and is robust to distribution shift and to peer-level teachers. In addition, MoT reduces catastrophic forgetting, improves general reasoning beyond mathematics, and even fosters better teachers. These results indicate that MoT is a simple and scalable method for efficiently distilling long CoT capabilities from diverse teachers into smaller student models.
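The alternation described above, with one fine-tuning branch per teacher followed by a weight-space merge, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the merge rule, so uniform parameter averaging is assumed here, and `mot_distill`, `average_weights`, and `toy_fine_tune` are hypothetical names. The toy "model" is a single scalar weight, standing in for a full set of student parameters.

```python
from typing import Any, Callable, Dict, List

Weights = Dict[str, float]


def average_weights(variants: List[Weights]) -> Weights:
    """Merge student variants by uniform parameter averaging (an assumed merge rule)."""
    if not variants:
        raise ValueError("need at least one variant to merge")
    keys = variants[0].keys()
    return {k: sum(v[k] for v in variants) / len(variants) for k in keys}


def mot_distill(
    student: Weights,
    teacher_datasets: List[Any],
    fine_tune: Callable[[Weights, Any], Weights],
    rounds: int,
) -> Weights:
    """Alternate per-teacher fine-tuning branches with weight-space merging."""
    merged = dict(student)
    for _ in range(rounds):
        # Branch: one student variant per teacher, each starting from the current merged weights.
        variants = [fine_tune(dict(merged), data) for data in teacher_datasets]
        # Merge: collapse the variants back into a single student.
        merged = average_weights(variants)
    return merged


def toy_fine_tune(weights: Weights, target: float, lr: float = 0.5) -> Weights:
    """Hypothetical stand-in for fine-tuning: move the parameter part of the way to the teacher's target."""
    return {"w": weights["w"] + lr * (target - weights["w"])}


if __name__ == "__main__":
    # Two toy "teachers" pulling the student toward 1.0 and 3.0 respectively.
    result = mot_distill({"w": 0.0}, [1.0, 3.0], toy_fine_tune, rounds=3)
    print(result)
```

In this toy setting each round moves the merged weight closer to the midpoint of the two teachers' targets, which illustrates how merging can reconcile branches that pull in different directions. In a real setting, `fine_tune` would be a supervised fine-tuning run on one teacher's CoT data, and the weights would be full model parameter tensors.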