In reinforcement learning, specifying a reward function that captures the intended behavior can be very difficult. Reward learning attempts to address this problem by learning a reward function from data. However, a learned reward model can have low error on the data distribution and yet induce policies with large regret. We say that such a reward model exhibits error-regret inconsistency. The main source of error-regret inconsistency is the distribution shift that commonly occurs during policy optimization. In this paper, we mathematically show that, while a sufficiently low expected test error of the reward model guarantees low worst-case regret, for any fixed expected test error there exist realistic data distributions in which error-regret inconsistency can occur. We then show that similar problems persist even when using policy regularization techniques commonly employed in methods such as RLHF. We hope our results stimulate theoretical and empirical research on improved methods for learning reward models and on better ways to reliably measure their quality.
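To make the central terms concrete, the following is a minimal sketch of the two quantities involved; the notation ($R$, $\hat{R}$, $D$, $J_R$) is ours for illustration and is not fixed by the abstract. With true reward $R$, learned reward model $\hat{R}$, data distribution $D$ over state-action pairs, and policy return $J_R(\pi)$ under $R$, one common way to write the expected test error of the reward model and the regret of a policy $\hat{\pi}$ optimized against $\hat{R}$ is
\[
  \mathrm{err}_D(\hat{R}) \;=\; \mathbb{E}_{(s,a)\sim D}\bigl[\lvert \hat{R}(s,a) - R(s,a)\rvert\bigr],
  \qquad
  \mathrm{Reg}(\hat{\pi}) \;=\; \max_{\pi} J_R(\pi) \;-\; J_R(\hat{\pi}),
  \quad \hat{\pi} \in \arg\max_{\pi} J_{\hat{R}}(\pi).
\]
Under this notation, error-regret inconsistency is the situation where $\mathrm{err}_D(\hat{R})$ is small while $\mathrm{Reg}(\hat{\pi})$ is large, which can arise because optimizing against $\hat{R}$ shifts the policy's visitation distribution away from $D$.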