In this paper, we propose BLADE, a data-free joint training framework that combines Adaptive Block-Sparse Attention (ASA) with sparsity-aware step distillation to address the inference bottleneck of Diffusion Transformers for high-quality video generation. BLADE features an ASA mechanism that dynamically generates content-aware sparsity masks, together with a sparsity-aware step distillation scheme, built on Trajectory Distribution Matching (TDM), that integrates sparsity directly into the distillation process. In experiments on text-to-video models such as CogVideoX-5B and Wan2.1-1.3B, BLADE delivers substantial efficiency gains, achieving end-to-end inference acceleration of 14.10x on Wan2.1-1.3B and 8.89x on CogVideoX-5B. This acceleration is accompanied by quality improvements on the VBench-2.0 benchmark (CogVideoX-5B from 0.534 to 0.569, Wan2.1-1.3B from 0.563 to 0.570) and is corroborated by human evaluation.
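
The abstract does not specify how ASA computes its masks, but the general idea of a content-aware block-sparse attention mask can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not BLADE's actual algorithm: the function name `block_sparse_attention`, the mean-pooled block scoring, and the `keep_ratio` parameter are hypothetical stand-ins for the paper's learned mask generation.

```python
import numpy as np

def block_sparse_attention(q, k, v, block=4, keep_ratio=0.5):
    """Toy content-aware block-sparse attention (illustrative sketch only).

    A coarse score for each (query-block, key-block) pair is computed from
    mean-pooled block representations; each query block then attends only to
    its top-scoring fraction of key blocks, so the sparsity pattern depends
    on the content rather than being fixed in advance.
    """
    n, d = q.shape
    nb = n // block
    # Mean-pool each block into a single coarse representation.
    qb = q.reshape(nb, block, d).mean(axis=1)   # (nb, d)
    kb = k.reshape(nb, block, d).mean(axis=1)   # (nb, d)
    coarse = qb @ kb.T / np.sqrt(d)             # (nb, nb) block-level scores
    keep = max(1, int(np.ceil(keep_ratio * nb)))
    out = np.zeros_like(q)
    for i in range(nb):
        # Content-aware mask: keep only the highest-scoring key blocks.
        top = np.argsort(coarse[i])[-keep:]
        idx = np.concatenate(
            [np.arange(j * block, (j + 1) * block) for j in top]
        )
        # Dense attention restricted to the selected key/value blocks.
        scores = q[i * block:(i + 1) * block] @ k[idx].T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        out[i * block:(i + 1) * block] = w @ v[idx]
    return out
```

With `keep_ratio=1.0` this reduces to dense attention; shrinking `keep_ratio` trades accuracy for compute, which is the lever a framework like BLADE would co-train with step distillation rather than apply post hoc.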