This paper proposes HiTeC, a novel framework for self-supervised learning on text-attributed hypergraphs (TAHGs). We highlight the limitations of existing contrastive learning-based methods, which fail to effectively exploit the textual information in TAHGs, suffer from noise introduced by random data augmentation, and struggle to capture long-range dependencies. HiTeC adopts a two-stage design: the first stage pretrains the text encoder with a structure-aware contrastive objective, and the second stage performs hypergraph contrastive learning with semantic-aware augmentation strategies, including prompt-enhanced text augmentation and semantic-aware hyperedge deletion. Furthermore, we propose a multi-scale contrastive loss that better captures long-range dependencies through s-walk-based subgraph-level contrast. By decoupling text encoder pretraining from hypergraph contrastive learning, this two-stage design improves scalability while maintaining representation quality. We demonstrate the effectiveness of HiTeC through extensive experiments.
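To make the multi-scale contrastive objective concrete, the following is a minimal illustrative sketch of combining node-level and subgraph-level InfoNCE terms in PyTorch. The temperature tau, the mixing weight alpha, the mean-pooling of s-walk-induced subgraph embeddings, and all tensor names are assumptions for illustration, not HiTeC's exact formulation.

```python
# Illustrative sketch only: a two-scale (node-level + subgraph-level) InfoNCE
# contrastive loss. The pooling scheme, temperature, and mixing weight are
# assumed for illustration and are not taken from the paper.
import torch
import torch.nn.functional as F


def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Symmetric InfoNCE between two aligned sets of embeddings of shape [N, d]."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                       # [N, N] cosine-similarity logits
    targets = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def multi_scale_loss(node_v1, node_v2, sub_v1, sub_v2, alpha: float = 0.5):
    """Weighted sum of node-level and subgraph-level contrastive terms.

    node_v1 / node_v2: node embeddings from the two augmented views, [N, d].
    sub_v1  / sub_v2 : embeddings of s-walk-induced subgraphs, [M, d]
                       (assumed here to be mean-pooled over member nodes).
    """
    return alpha * info_nce(node_v1, node_v2) + (1 - alpha) * info_nce(sub_v1, sub_v2)


if __name__ == "__main__":
    n, m, d = 8, 4, 16
    loss = multi_scale_loss(torch.randn(n, d), torch.randn(n, d),
                            torch.randn(m, d), torch.randn(m, d))
    print(loss.item())
```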