We present HateClipSeg, a large-scale multimodal dataset of over 11,714 video segments. Each segment is labeled as normal or across five offensive categories: hateful, insulting, sexual, violent, and self-harm. The dataset is annotated at both the video and segment levels and includes information on the targeted victims. We define three tasks on it: (1) classifying edited hate videos, (2) temporally localizing hate videos, and (3) classifying online hate videos. Benchmarking existing models on these tasks demonstrates their limitations. The dataset is publicly available.