To address the challenges of machine translation (MT) quality assessment for low-resource African languages, this study introduces SSA-MTE, a large-scale human-annotated MT evaluation dataset covering 14 African language pairs. SSA-MTE comprises over 73,000 sentence-level annotations in the news domain. Building on this dataset, we develop improved reference-based and reference-free evaluation metrics, SSA-COMET and SSA-COMET-QE, and we benchmark prompt-based approaches using state-of-the-art LLMs such as GPT-4o, Claude-3.7, and Gemini 2.5 Pro. Experimental results show that SSA-COMET significantly outperforms AfriCOMET and is competitive with Gemini 2.5 Pro, particularly on low-resource languages such as Twi, Luo, and Yoruba. All resources used in this study are released under an open license.
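To make the reference-based vs. reference-free distinction concrete, here is a minimal sketch of how COMET-style metrics like these are typically invoked, assuming SSA-COMET and SSA-COMET-QE are released as checkpoints compatible with Unbabel's `comet` library (as AfriCOMET is); the model identifiers, example sentences, and repository names below are placeholders, not the paper's actual release paths.

```python
# Minimal sketch: scoring MT output with COMET-style metrics via
# Unbabel's `comet` library (pip install unbabel-comet).
# Model IDs below are placeholders, assuming SSA-COMET and
# SSA-COMET-QE ship as COMET-compatible checkpoints.
from comet import download_model, load_from_checkpoint

# Reference-based metric: needs source, hypothesis, and reference.
ref_model = load_from_checkpoint(download_model("ORG/ssa-comet"))  # placeholder ID
data = [{
    "src": "Ẹ káàárọ̀",      # Yoruba source (illustrative)
    "mt":  "Good morning",   # system translation
    "ref": "Good morning",   # human reference
}]
print(ref_model.predict(data, batch_size=8, gpus=0).system_score)

# Reference-free (QE) metric: same call, but no "ref" field required,
# so it can score translations when no human reference exists.
qe_model = load_from_checkpoint(download_model("ORG/ssa-comet-qe"))  # placeholder ID
qe_data = [{"src": "Ẹ káàárọ̀", "mt": "Good morning"}]
print(qe_model.predict(qe_data, batch_size=8, gpus=0).system_score)
```

The practical difference is visible in the input dictionaries: the QE variant drops the `ref` field, which is what makes reference-free evaluation usable for low-resource pairs where human references are scarce.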