This paper analyzes the differences between the traditional view of reinforcement learning (RL) and continual reinforcement learning (CRL), and proposes a new formalism suited to CRL. Whereas traditional RL treats learning as finished once an optimal policy is found, CRL aims for never-ending learning and adaptation. We argue that four pillars of traditional RL, namely Markov Decision Processes (MDPs), a focus on time-independent artifacts (such as fixed policies and value functions), evaluation by expected sums of rewards, and episode-based benchmark environments that embody the first three pillars, are in conflict with the goals of CRL. We propose a new formalism that replaces the first and third pillars: MDPs give way to history processes, and the expected-reward-sum metric gives way to a new deviation-regret evaluation metric suited to continual learning. We also discuss possible ways to improve the remaining two pillars.
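
To make the contrast concrete, the following is a minimal sketch of the third pillar, the standard expected-reward-sum objective, alongside a generic regret-style quantity of the kind a continual evaluation metric builds on. The notation ($\pi$ for a policy, $R_t$ for reward, $\gamma$ for a discount factor, $\Pi$ for a comparison class, $T$ for an evaluation horizon) is standard RL usage introduced here for illustration; it is not the paper's exact definition of deviation regret.
\[
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R_{t}\right],
\qquad
\mathrm{Regret}_{T}(\pi) \;=\; \max_{\pi' \in \Pi} \mathbb{E}_{\pi'}\!\left[\sum_{t=0}^{T} R_{t}\right] \;-\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} R_{t}\right].
\]
Maximizing $J(\pi)$ is achieved by a single fixed policy, which reflects the "stop once optimal" view; a regret-style quantity instead scores the agent's behavior over time against alternatives, which is the general direction the proposed deviation-regret metric takes for continual learning.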