Jet-Nemotron is a hybrid-architecture language model that significantly improves generation throughput while matching or exceeding the accuracy of leading full-attention models. It was developed with PostNAS (Post Neural Architecture Search), an architecture exploration pipeline that, unlike existing approaches, starts from a pre-trained full-attention model and freezes its MLP weights, which makes exploring attention block designs far cheaper. The pipeline consists of four key components: placement and removal of full-attention layers, selection of a linear attention block, design of a new attention block, and hardware-aware hyperparameter search; a minimal sketch of the resulting hybrid layout is shown below.

The Jet-Nemotron-2B model achieves similar or superior accuracy to Qwen3, Qwen2.5, Gemma3, and Llama3.2 across a broad set of benchmarks, while delivering up to 53.6x faster generation throughput and 6.1x faster prefilling. It also achieves higher accuracy on MMLU and MMLU-Pro than recent advanced MoE full-attention models such as DeepSeek-V3-Small and Moonlight.
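
The PyTorch sketch below illustrates the hybrid idea under stated assumptions: most layers use a linear-attention block, full softmax attention is kept only at a few searched positions, and the MLP weights are frozen as in PostNAS. The class names (`HybridBlock`, `CausalLinearAttention`), the layer placement, and the generic kernelized linear attention are illustrative placeholders, not the actual Jet-Nemotron/JetBlock design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalLinearAttention(nn.Module):
    """Placeholder linear-attention block (kernelized, O(n) in sequence length).

    Jet-Nemotron uses its own attention block (JetBlock); this generic
    feature-map attention only stands in for it in this sketch.
    """

    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        q, k = F.elu(q) + 1, F.elu(k) + 1                          # positive feature map
        kv = torch.einsum("bhnd,bhne->bhnde", k, v).cumsum(dim=2)  # running sum of k v^T
        z = k.cumsum(dim=2)                                        # running normalizer
        num = torch.einsum("bhnd,bhnde->bhne", q, kv)
        den = torch.einsum("bhnd,bhnd->bhn", q, z).clamp(min=1e-6).unsqueeze(-1)
        y = (num / den).transpose(1, 2).reshape(b, n, d)
        return self.out(y)


class FullAttention(nn.Module):
    """Standard causal softmax attention, retained only at searched positions."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        n = x.size(1)
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device), 1)
        y, _ = self.attn(x, x, x, attn_mask=mask, need_weights=False)
        return y


class HybridBlock(nn.Module):
    def __init__(self, dim, use_full_attention):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = FullAttention(dim) if use_full_attention else CausalLinearAttention(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        # PostNAS-style freeze: MLP weights come from the pre-trained full-attention
        # model and stay fixed; only the attention blocks are trained/searched.
        for p in self.mlp.parameters():
            p.requires_grad = False

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        return x + self.mlp(self.norm2(x))


class HybridModel(nn.Module):
    # full_attention_layers is a hypothetical placement; PostNAS learns it per model.
    def __init__(self, dim=64, depth=8, full_attention_layers=(2, 5)):
        super().__init__()
        self.blocks = nn.ModuleList(
            HybridBlock(dim, use_full_attention=(i in full_attention_layers))
            for i in range(depth)
        )

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x


if __name__ == "__main__":
    model = HybridModel()
    out = model(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])
```

Because only a handful of layers keep the quadratic softmax attention, the KV cache and per-token decode cost stay small, which is where the reported generation-throughput gains come from.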