This paper proposes TIDE (Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers), a novel framework that improves the interpretability of the Diffusion Transformer (DiT), which has received far less interpretability study than U-Net-based diffusion models. TIDE extracts sparse, interpretable activation features from DiT across diffusion timesteps, demonstrating that DiT naturally learns hierarchical semantics (e.g., 3D structure, object classes, and fine-grained concepts) during large-scale pretraining. Experimental results show that TIDE enhances interpretability and controllability while maintaining generation quality, making it suitable for applications such as secure image editing and style transfer.
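To make the core mechanism concrete, below is a minimal sketch of a sparse autoencoder applied to a single DiT activation vector at one diffusion timestep. All names, dimensions, and the tied-weight ReLU design are illustrative assumptions, not the paper's exact architecture; the training loop is omitted and only the forward pass and loss terms are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: DiT activation dimension d, overcomplete dictionary size m >> d.
d, m = 64, 256

# Tied-weight sparse autoencoder: encode with ReLU to get a nonnegative
# sparse code, decode with the transposed dictionary.
W = rng.normal(scale=0.1, size=(m, d))  # dictionary (decoder rows = features)
b = np.zeros(m)                          # encoder bias

def encode(x):
    # ReLU keeps only a few strongly activated features -> sparse code
    return np.maximum(W @ x + b, 0.0)

def decode(z):
    return W.T @ z

# A stand-in for one DiT activation vector at a given timestep.
x = rng.normal(size=d)
z = encode(x)
x_hat = decode(z)

# The usual SAE objective (not optimized here): reconstruction + L1 sparsity.
recon_loss = float(np.mean((x - x_hat) ** 2))
l1_penalty = float(np.sum(np.abs(z)))
```

In a temporal-aware setup, one such encoder would be fit (or conditioned) per timestep range, so that the learned dictionary can track how semantics evolve over the denoising trajectory.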