Sparse autoencoders (SAEs) are assumed to decompose model activations into interpretable linear directions, but this holds only when the activations themselves consist of sparse linear combinations of underlying features. We find that when an SAE is narrower than the number of underlying "true features" on which it is trained and those features are correlated, the SAE merges components of correlated features into its latents, destroying their monosemanticity. We call this phenomenon "feature hedging," and both conditions are almost certainly present in SAEs trained on LLMs. Feature hedging is caused by the SAE reconstruction loss and becomes more severe the narrower the SAE is. In this work, we introduce the problem of feature hedging, study it theoretically in toy models, and demonstrate it empirically in SAEs trained on LLMs. We hypothesize that feature hedging may be a key reason why SAEs consistently underperform supervised baselines. Finally, based on our understanding of feature hedging, we propose an improved variant of the matryoshka SAE. Our results show that SAE width is not a neutral hyperparameter: narrow SAEs are more affected by feature hedging than wide SAEs.
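To make the setting concrete, the following is a minimal illustrative sketch (not the exact experimental setup from this paper) of the toy-model scenario described above: activations are built as sparse linear combinations of three ground-truth features, two of which co-fire, and a two-latent ReLU SAE is trained with a standard reconstruction-plus-L1 objective. The feature probabilities, correlation strength, and sparsity coefficient are arbitrary choices for illustration; inspecting the decoder directions afterwards shows whether a latent mixes components of the correlated features.

```python
# Hypothetical toy-model sketch of feature hedging; hyperparameters are illustrative only.
import torch

torch.manual_seed(0)
d_model, n_true, n_latents = 3, 3, 2
true_feats = torch.eye(n_true, d_model)  # orthogonal ground-truth feature directions

def sample_activations(batch=4096, p=0.3, corr=0.9):
    """Activations are sparse linear combinations of true features.
    Feature 1 co-fires with feature 0 with probability `corr`, inducing correlation."""
    f0 = (torch.rand(batch) < p).float()
    f1 = torch.where(f0.bool(), (torch.rand(batch) < corr).float(),
                     (torch.rand(batch) < 0.05).float())
    f2 = (torch.rand(batch) < p).float()
    coeffs = torch.stack([f0, f1, f2], dim=1)
    return coeffs @ true_feats

# A minimal ReLU SAE, narrower (2 latents) than the number of true features (3),
# trained with reconstruction loss plus an L1 sparsity penalty.
W_enc = torch.nn.Parameter(torch.randn(d_model, n_latents) * 0.1)
b_enc = torch.nn.Parameter(torch.zeros(n_latents))
W_dec = torch.nn.Parameter(torch.randn(n_latents, d_model) * 0.1)
b_dec = torch.nn.Parameter(torch.zeros(d_model))
opt = torch.optim.Adam([W_enc, b_enc, W_dec, b_dec], lr=1e-2)

for step in range(2000):
    x = sample_activations()
    acts = torch.relu((x - b_dec) @ W_enc + b_enc)
    recon = acts @ W_dec + b_dec
    loss = ((recon - x) ** 2).mean() + 3e-3 * acts.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Hedging check: project each learned decoder direction onto the true features.
# A monosemantic latent aligns with a single true feature; hedging shows up as a
# latent whose decoder row mixes components of the correlated features 0 and 1.
with torch.no_grad():
    proj = torch.nn.functional.normalize(W_dec, dim=1) @ true_feats.T
    print(proj)
```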