Experimental results on image, protein sequence, and molecule generation tasks demonstrate strong performance and substantially accelerated training. In particular, on the class-conditional ImageNet $256\times 256$ benchmark, the proposed approach trains 23.3x faster than the existing SiT-XL baseline and 4x faster than the state-of-the-art REPA method.