Jet-Nemotron is a hybrid-architecture language model family that matches or exceeds the accuracy of leading full-attention models while delivering significantly higher generation throughput. It was developed with PostNAS (Post Neural Architecture Search), a neural architecture search pipeline that, unlike existing approaches, starts from a pre-trained full-attention model and freezes its MLP weights, which makes exploring attention block designs far cheaper. The pipeline has four stages: learning the optimal placement (and removal) of full-attention layers, selecting a linear attention block, designing a new attention block, and running a hardware-aware hyperparameter search. A minimal sketch of the core idea follows below.

Across multiple benchmarks, the Jet-Nemotron-2B model achieves comparable or superior accuracy to Qwen3, Qwen2.5, Gemma3, and Llama3.2, while delivering up to a 53.6x generation throughput speedup and a 6.1x prefilling speedup. It also scores higher on MMLU and MMLU-Pro than recent advanced MoE full-attention models such as DeepSeek-V3-Small and Moonlight, even though those models are larger, with 15B total parameters and 2.2B activated parameters.
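The sketch below illustrates the core PostNAS idea under stated assumptions; it is not the authors' implementation. It shows a toy model whose MLP weights are frozen while selected layers keep full (softmax) attention and the rest are swapped for a candidate linear-attention block. All names here (`TinyBlock`, `LinearAttention`, `apply_postnas_choice`, the choice of layers 2 and 5) are hypothetical, and the linear attention is a simple non-causal kernelized variant used only for illustration.

```python
# Minimal sketch of the PostNAS search step: freeze MLPs from a "pre-trained"
# full-attention model, keep full attention only in a few chosen layers, and
# replace the rest with a candidate linear-attention block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FullAttention(nn.Module):
    """Standard softmax attention (stand-in for the pre-trained blocks)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out


class LinearAttention(nn.Module):
    """Toy O(n) kernelized attention with an ELU+1 feature map (non-causal)."""
    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, x):
        q = F.elu(self.q(x)) + 1.0
        k = F.elu(self.k(x)) + 1.0
        v = self.v(x)
        kv = torch.einsum("bnd,bne->bde", k, v)                  # aggregate keys/values over sequence
        z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
        return torch.einsum("bnd,bde,bn->bne", q, kv, z)


class TinyBlock(nn.Module):
    """One transformer block: attention + MLP with residual connections."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = FullAttention(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        x = x + self.attn(x)
        return x + self.mlp(x)


def apply_postnas_choice(model: nn.Module, keep_full_attention: set) -> nn.Module:
    """Freeze all MLP weights, then swap attention in every layer not kept as full attention."""
    for i, block in enumerate(model.blocks):
        for p in block.mlp.parameters():
            p.requires_grad = False                              # MLPs stay frozen during the search
        if i not in keep_full_attention:
            dim = block.mlp[0].in_features
            block.attn = LinearAttention(dim)                    # candidate linear-attention block
    return model


if __name__ == "__main__":
    dim, layers = 64, 6
    model = nn.Module()
    model.blocks = nn.ModuleList(TinyBlock(dim) for _ in range(layers))
    # Hypothetical search decision: keep full attention only in layers 2 and 5.
    apply_postnas_choice(model, keep_full_attention={2, 5})
    x = torch.randn(1, 16, dim)
    for block in model.blocks:
        x = block(x)
    print(x.shape)  # torch.Size([1, 16, 64])
```

In an actual search, many such placements and attention-block candidates would be trained briefly (with the MLPs frozen) and compared, which is what makes exploring the design space tractable compared with training each candidate from scratch.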