This paper presents a comparative evaluation of the Mixture-of-Experts (MoE) GPT-OSS-20B model (20.9 billion total parameters, approximately 3.6 billion active per token) against the dense models Qwen3-32B and Yi-34B on a single GPU (H100, bf16). The evaluation covers Time to First Token (TTFT), per-token decoding latency (Time Per Output Token, TPOT) and decoding throughput, end-to-end latency percentiles, peak VRAM usage (including the past key-value cache, PKV), and energy consumption, with power measured by a consistent nvidia-smi-based sampler. Under a 2048-token context with 64 decoded tokens, GPT-OSS-20B delivers higher decoding throughput and per-token energy efficiency than Qwen3-32B and Yi-34B, while substantially reducing peak VRAM usage and energy per 1,000 generated tokens; its TTFT is higher due to MoE routing overhead. Activating only 17.3% of its parameters (3.6 billion of 20.9 billion), GPT-OSS-20B achieves approximately 31.8% higher decoding throughput, 25.8% lower energy consumption per 1,000 generated tokens, and 31.7% lower peak VRAM usage than Qwen3-32B under the same 2048/64 conditions. When normalized to active parameters, GPT-OSS-20B shows markedly higher Active Parameter Efficiency (APE), highlighting the deployment benefits of MoE. This study focuses on deployment efficiency and does not evaluate accuracy. We release the code and aggregated results publicly for reproducibility and extension.
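To make the measurement and normalization concrete, the sketch below shows one plausible way to implement an nvidia-smi-based power sampler and the two normalizations reported above (energy per 1,000 generated tokens and APE). It is a minimal illustration under stated assumptions: the class and function names (`NvidiaSmiPowerSampler`, `energy_per_1k_tokens`, `active_param_efficiency`) are hypothetical, and the APE definition used here (decoding throughput per billion active parameters) is an assumption, not necessarily the paper's exact formula.

```python
import subprocess
import threading
import time


class NvidiaSmiPowerSampler:
    """Illustrative sampler: polls `nvidia-smi --query-gpu=power.draw` in a
    background thread and integrates instantaneous power (W) into energy (J)."""

    def __init__(self, gpu_index: int = 0, interval_s: float = 0.1):
        self.gpu_index = gpu_index
        self.interval_s = interval_s
        self.energy_j = 0.0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _read_power_w(self) -> float:
        # Query the instantaneous board power draw of one GPU, in watts.
        out = subprocess.run(
            ["nvidia-smi", "-i", str(self.gpu_index),
             "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        return float(out.stdout.strip())

    def _run(self):
        last = time.monotonic()
        while not self._stop.is_set():
            power_w = self._read_power_w()
            now = time.monotonic()
            self.energy_j += power_w * (now - last)  # rectangle-rule integration
            last = now
            time.sleep(self.interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()


def energy_per_1k_tokens(energy_j: float, generated_tokens: int) -> float:
    """Energy in joules normalized per 1,000 generated tokens."""
    return energy_j * 1000.0 / generated_tokens


def active_param_efficiency(decode_tok_per_s: float, active_params_b: float) -> float:
    """Assumed APE definition: decoding throughput per billion *active* parameters."""
    return decode_tok_per_s / active_params_b
```

For example, wrapping a 2048-token-prefill / 64-token-decode run in `with NvidiaSmiPowerSampler() as sampler:` and passing `sampler.energy_j` together with the number of generated tokens to `energy_per_1k_tokens` yields an energy-per-1,000-tokens figure of the kind compared in the results.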