This paper addresses continuous sign language segmentation, a critical task for sign language translation and data annotation. We propose a Transformer-based architecture that models the temporal dynamics of signing, and formulate frame-level segmentation as a sequence labeling problem using the Begin-In-Out (BIO) tagging scheme. We leverage HaMeR hand features and 3D angles, and show that our approach achieves state-of-the-art results on the DGS Corpus and outperforms previous methods on BSLCorpus.
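To illustrate the BIO formulation, here is a minimal sketch, not the authors' implementation. It assumes each video frame receives exactly one tag: `B` for the first frame of a sign, `I` for a later frame of the same sign, and `O` for a frame outside any sign. The helper names `segments_to_bio` and `bio_to_segments` are hypothetical. Segments are `(start, end)` frame indices with `end` exclusive.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start frame, end frame), end exclusive


def segments_to_bio(segments: List[Segment], num_frames: int) -> List[str]:
    """Encode sign segments as per-frame BIO tags (hypothetical helper)."""
    tags = ["O"] * num_frames
    for start, end in segments:
        tags[start] = "B"                          # first frame of the sign
        tags[start + 1:end] = ["I"] * (end - start - 1)  # remaining frames
    return tags


def bio_to_segments(tags: List[str]) -> List[Segment]:
    """Decode per-frame BIO tags back into segments (hypothetical helper)."""
    segments: List[Segment] = []
    start: Optional[int] = None  # first frame of the currently open segment
    for i, tag in enumerate(tags):
        if tag == "B":
            # A "B" closes any open segment and opens a new one at frame i,
            # which lets adjacent signs be separated without an "O" gap.
            if start is not None:
                segments.append((start, i))
            start = i
        elif tag == "I":
            # Lenient decoding (an assumption): a stray "I" opens a segment.
            if start is None:
                start = i
        elif tag == "O":
            if start is not None:
                segments.append((start, i))
                start = None
        else:
            raise ValueError(f"unknown tag {tag!r} at frame {i}")
    if start is not None:
        segments.append((start, len(tags)))
    return segments


if __name__ == "__main__":
    tags = ["O", "B", "I", "I", "B", "I", "O", "O", "B"]
    print(bio_to_segments(tags))  # [(1, 4), (4, 6), (8, 9)]
```

The `B` tag is what distinguishes this scheme from a plain in/out labeling: two consecutive signs with no pause between them remain separate segments, which a binary per-frame label could not express.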