This paper presents the first comprehensive study comparing the open-source reasoning model DeepSeek-R1 with OpenAI's GPT-4o and GPT-4o-mini. We evaluated the full 671B-parameter model and its distilled smaller variants using only a small number of training runs, and found that DeepSeek-R1 achieved an F1 score of 91.39% on five-class emotion classification and an accuracy of 99.31% on binary emotion classification. This represents an eight-fold improvement over GPT-4o and demonstrates high efficiency despite the limited training. Furthermore, we analyzed how the distillation effect varies with model architecture, showing that the 32B Qwen2.5-based model outperformed the 70B Llama-based model by 6.69 percentage points. DeepSeek-R1 improves explainability by transparently exposing its step-by-step reasoning process, but suffers from reduced throughput, which we discuss under Limitations.
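
The metrics reported above can be reproduced from model predictions in a few lines; the sketch below illustrates the computation, assuming scikit-learn and a macro-averaged F1 for the multi-class task. The label arrays are placeholder values for illustration only, not the paper's data.

```python
# Minimal sketch of the reported evaluation metrics (assumed setup, not the
# authors' code): macro-F1 for five-class emotion labels, accuracy for binary.
from sklearn.metrics import f1_score, accuracy_score

# Hypothetical five-class emotion labels (e.g., 0=joy, 1=sadness, 2=anger, ...).
y_true_5cls = [0, 1, 2, 3, 4, 1, 0]
y_pred_5cls = [0, 1, 2, 3, 3, 1, 0]
macro_f1 = f1_score(y_true_5cls, y_pred_5cls, average="macro")

# Hypothetical binary emotion labels (e.g., 0=negative, 1=positive).
y_true_bin = [0, 1, 1, 0, 1]
y_pred_bin = [0, 1, 1, 0, 1]
acc = accuracy_score(y_true_bin, y_pred_bin)

print(f"macro-F1 (5-class): {macro_f1:.4f}, accuracy (binary): {acc:.4f}")
```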