This paper presents the first comprehensive study of multi-head latent attention (MLA) for small language models, revealing a clear trade-off between efficiency and quality. We train 30-million-parameter GPT models on a dataset of 100,000 synthetic stories and benchmark three architectural variants: standard multi-head attention (MHA), MLA, and MLA with rotary position embeddings (MLA+RoPE). Our main result is that MLA+RoPE with a half-rank latent dimension (r = d/2) reduces KV-cache memory by 45% while increasing validation loss by only 0.3%, effectively matching MHA quality, which constitutes a Pareto improvement for memory-constrained deployment. We further find that RoPE is essential for MLA at this scale: without it, MLA underperforms standard attention by 3-5%, whereas with it, MLA outperforms standard attention by 2%. Inference benchmarks on NVIDIA A100 GPUs show that MLA with r = d/2 achieves a 1.4x speedup over full-rank MLA while preserving the memory savings. GPT-4 evaluation of generated stories shows that the MLA+RoPE variant achieves the highest quality score (7.4/10) on grammar, creativity, and consistency metrics. Code and models will be released upon acceptance.
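
The following is a minimal PyTorch sketch of the latent-attention idea summarized above, not the authors' implementation: keys and values are each down-projected to an r-dimensional latent, and only those latents would be cached at inference. All module and parameter names (LatentAttention, kv_down, k_up, v_up, apply_rope) are illustrative assumptions. At r = d/2 the per-token cache holds 2r = d floats instead of the 2d floats of standard MHA, i.e. roughly half, on the order of the reported 45% saving.

```python
# Hypothetical sketch of an MLA+RoPE attention block; names and details are
# illustrative assumptions, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to a (B, H, T, Dh) tensor."""
    B, H, T, Dh = x.shape
    half = Dh // 2
    freqs = base ** (-torch.arange(half, device=x.device).float() / half)
    angles = torch.arange(T, device=x.device).float()[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()            # (T, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class LatentAttention(nn.Module):
    """Multi-head latent attention: K/V are compressed to rank-r latents."""

    def __init__(self, d_model: int, n_heads: int, r: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # Down-project hidden states to 2*r latent features; at inference only
        # these latents need caching (2*r floats/token vs 2*d_model for MHA).
        self.kv_down = nn.Linear(d_model, 2 * r, bias=False)
        self.k_up = nn.Linear(r, d_model, bias=False)
        self.v_up = nn.Linear(r, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, d = x.shape
        k_lat, v_lat = self.kv_down(x).chunk(2, dim=-1)   # the cacheable latents
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_up(k_lat).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(v_lat).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        # RoPE applied after up-projection for simplicity; a decoupled RoPE key
        # (as in DeepSeek-V2) would avoid recomputing k during decoding.
        q, k = apply_rope(q), apply_rope(k)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(y.transpose(1, 2).reshape(B, T, d))


if __name__ == "__main__":
    d_model, n_heads, r = 384, 6, 192                     # r = d/2 (half rank)
    attn = LatentAttention(d_model, n_heads, r)
    out = attn(torch.randn(2, 64, d_model))
    print(out.shape)                                      # torch.Size([2, 64, 384])
    print(f"KV-cache per token: {2 * r} vs {2 * d_model} floats for MHA")
```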