This paper explores the speedup potential of pre-training Transformers in FP4 precision and proposes a novel training method, TetraJet, to address the resulting accuracy degradation. We identify weight oscillation as the main cause of accuracy degradation when training with the conventional MXFP4 data format, and introduce two techniques to suppress it: the EMA Quantizer (Q-EMA) and the Adaptive Ramping Optimizer (Q-Ramping). Through extensive experiments on Vision Transformers, we demonstrate that TetraJet outperforms conventional 4-bit training methods, reduces accuracy degradation by more than 50% compared to the baseline, and is even competitive with full-precision training.
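As a rough illustration of the oscillation problem and the EMA idea suggested by the name Q-EMA, consider a weight that hovers near a quantization boundary: rounding the raw weight flips the quantized value step after step, while rounding a smoothed history of the weight settles down. The sketch below is a minimal toy, not the paper's implementation; the uniform grid, the decay factor `DECAY`, and the function names are illustrative assumptions.

```python
# Minimal sketch (assumed names; NOT the paper's Q-EMA implementation) of why
# rounding an exponential moving average of a weight can suppress the
# oscillation that round-to-nearest causes near a quantization boundary.
import numpy as np

GRID_STEP = 0.25  # spacing of a toy uniform grid, standing in for MXFP4 levels
DECAY = 0.95      # EMA decay factor; a hypothetical choice

def nearest_quantize(w):
    """Round-to-nearest onto the toy grid."""
    return np.round(w / GRID_STEP) * GRID_STEP

def ema_quantize(w, ema):
    """Update the EMA of the weight, then round the EMA instead of the raw value."""
    ema = DECAY * ema + (1.0 - DECAY) * w
    return nearest_quantize(ema), ema

rng = np.random.default_rng(0)
# A weight sitting just above the grid boundary at 0.125, jittered by updates.
weights = 0.13 + 0.01 * rng.standard_normal(200)

ema = weights[0]
prev_n = nearest_quantize(weights[0])
prev_e, ema = ema_quantize(weights[0], ema)
flips_nearest = flips_ema = 0
for w in weights[1:]:
    q_n = nearest_quantize(w)
    q_e, ema = ema_quantize(w, ema)
    flips_nearest += int(q_n != prev_n)  # did the quantized value change?
    flips_ema += int(q_e != prev_e)
    prev_n, prev_e = q_n, q_e

print("flips with round-to-nearest:", flips_nearest)  # many: weight oscillates
print("flips with EMA-based rounding:", flips_ema)    # few: the EMA settles
```

The actual Q-EMA presumably operates on the full MXFP4 format with its block-wise scaling; this sketch only isolates the boundary-flipping effect that the abstract identifies as the main source of accuracy degradation.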