In this paper, we present Reinforcement Learning from Internal Feedback (RLIF), a reinforcement learning method that relies solely on model-internal signals and requires no external rewards. We attempt to improve the reasoning performance of baseline LLMs on mathematical reasoning benchmarks by leveraging unsupervised reward surrogates such as token-level entropy, path-level entropy, and self-confidence. In the early stages of training, RLIF matches or exceeds the performance of RLVR techniques, but we find that performance degrades as training progresses, particularly for models that have already been instruction-tuned. We explain this training behavior of RLIF through an analysis of model weight mixtures, and provide practical guidelines for incorporating internal feedback signals into LLM training.
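To make the internal signals concrete, the sketch below shows one plausible way to derive the three reward surrogates from a policy model's own token distribution. This is a minimal illustration, not the paper's implementation: the function name, the exact definitions (mean next-token entropy, negative sequence log-likelihood, and mean probability assigned to the sampled tokens), and the sign conventions are assumptions introduced here for clarity.

```python
import torch
import torch.nn.functional as F


def internal_feedback_rewards(logits: torch.Tensor, token_ids: torch.Tensor) -> dict:
    """Compute intrinsic reward surrogates from the model's own outputs.

    logits:    [T, V] next-token logits for one generated response.
    token_ids: [T]    the token ids actually sampled for that response.
    All definitions below are illustrative stand-ins for the surrogates
    named in the text (token-level entropy, path-level entropy,
    self-confidence); they are not the paper's exact formulas.
    """
    log_probs = F.log_softmax(logits, dim=-1)                     # [T, V]
    probs = log_probs.exp()

    # Token-level entropy: average entropy of the next-token distribution.
    token_entropy = -(probs * log_probs).sum(dim=-1).mean()

    # Path-level entropy: negative mean log-probability of the sampled
    # trajectory, a Monte Carlo proxy for sequence-level uncertainty.
    chosen_logp = log_probs.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1)  # [T]
    path_entropy = -chosen_logp.mean()

    # Self-confidence: mean probability the model assigns to its own tokens.
    self_confidence = chosen_logp.exp().mean()

    # Lower uncertainty is treated as higher reward for the entropy signals.
    return {
        "neg_token_entropy": -token_entropy,
        "neg_path_entropy": -path_entropy,
        "self_confidence": self_confidence,
    }
```

Any of these scalars could, under this assumed setup, replace the verifiable reward in a standard policy-gradient objective (e.g. GRPO or PPO), which is what makes the method reward-free with respect to external supervision.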