
Second Place is Desperate: A Look at China's AI Crisis in Beijing
- Haebom


| Item | Llama 4 Scout (109B total / 17B active) | Gemma 3 27B QAT |
|---|---|---|
| Multimodal | Text, image, audio, and video | Text and image (no audio) |
| Context length | 10M tokens (Maverick: 1M) | 128K tokens |
| Inference speed | ~120 tok/s (FP8, single H100) | 20-25 tok/s (INT4, RTX A5000) |
| Memory | 17B active → 24-32 GB at FP8 | 12-24 GB at INT4 |
| MMLU-Pro | 69.6% (Scout), 79.4% (Maverick) | 66-67% (27B IT) |
| License | Meta Community (paid for companies with 700M+ MAU) | Apache-2.0 (no obligation to disclose derivatives) |
| GPU scale | Single H100 (Scout) / 8+ H100s (Maverick) | Runs on a single RTX 3090 |
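The memory rows above follow from simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter, before adding KV cache and activation overhead. A minimal sketch (the function name is my own, and real deployments need extra headroom beyond the weights-only figure):

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Rough weights-only VRAM estimate in decimal GB.

    Excludes KV cache, activations, and framework overhead, which is why
    the table's ranges (24-32 GB, 12-24 GB) sit above these floor values.
    """
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


# Llama 4 Scout's 17B active parameters at FP8 (8 bits each):
print(weight_memory_gb(17, 8))   # 17.0 GB floor, hence the 24-32 GB range
# Gemma 3 27B at INT4 (4 bits each):
print(weight_memory_gb(27, 4))   # 13.5 GB, within the 12-24 GB range
```

Note that as a mixture-of-experts model, Scout still has to hold all 109B parameters somewhere (about 109 GB at FP8), which is why single-GPU figures hinge on offloading or on counting only the active experts.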