This paper identifies a key limitation of reward models (RMs), which are essential for optimizing large language models (LLMs), and presents a novel approach to overcome it. Existing RMs are trained on fixed preference datasets and therefore cannot adapt to diverse real-world needs. We propose a generalizable RM that dynamically understands and follows reward principles expressed in natural language. To this end, we develop RABench, a new benchmark for evaluating how well RMs generalize across diverse principles, and introduce RewardAnything, an RM designed and trained to explicitly follow natural-language principles. RewardAnything achieves state-of-the-art performance on existing RM benchmarks and also adapts well to novel principles on RABench. Furthermore, RewardAnything integrates seamlessly with existing RLHF methods, and we demonstrate through a case study how LLMs can be automatically and efficiently aligned using only natural-language principles.
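To make the idea of principle-conditioned reward scoring concrete, the sketch below shows one way a natural-language principle could be turned into a scalar reward by prompting an LLM judge. This is a minimal illustration only, not the RewardAnything implementation or API: the judge model, prompt template, and function names are placeholder assumptions.

# Hypothetical sketch: scoring responses under a natural-language principle
# by prompting an LLM judge. Not the RewardAnything method; model name,
# endpoint, and prompt format are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any OpenAI-compatible endpoint works

JUDGE_TEMPLATE = """You are a reward model that scores a response strictly
according to the following principle.

Principle: {principle}

Prompt: {prompt}

Response: {response}

Rate how well the response satisfies the principle on a scale of 1-10.
Reply with only the number."""

def principle_reward(principle: str, prompt: str, response: str) -> float:
    """Return a scalar reward for one response under a natural-language principle."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": JUDGE_TEMPLATE.format(
            principle=principle, prompt=prompt, response=response)}],
        temperature=0.0,
    )
    return float(completion.choices[0].message.content.strip())

# Example: rank two candidate responses under a custom principle.
principle = "Prefer concise answers that cite at least one source."
prompt = "What causes tides?"
candidates = [
    "Tides are caused mainly by the Moon's gravitational pull (NOAA).",
    "Tides happen because of many complicated factors in the ocean...",
]
scores = [principle_reward(principle, prompt, c) for c in candidates]
best = candidates[scores.index(max(scores))]

Because the principle is just a string argument, such rewards could in principle be plugged into a standard RLHF loop without retraining the judge for each new preference specification, which is the kind of flexibility the paper targets.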