This paper highlights the need for reinforcement learning (RL) agents that learn from alternative supervision signals, such as non-rewarded interactions and unlabeled or incomplete demonstrations, rather than relying solely on explicit reward maximization. To develop general agents that adapt efficiently to real-world environments, it is necessary to exploit these non-rewarded signals to guide learning and action selection. We propose a framework that transforms the similarity between the agent’s state and expert data into a well-formed intrinsic reward through a mapping function, enabling flexible and goal-oriented exploration of expert-like behaviors. We model the expert with a mixed autoencoder to capture diverse behaviors and to accommodate missing information in the demonstrations. Experimental results show that the proposed method enables robust exploration and performance in both sparse- and dense-reward environments, even when demonstrations are scarce or incomplete. The result is a practical framework for RL in realistic settings where optimal data is unavailable yet precise control over the reward signal is still required.
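To make the core idea concrete, the sketch below illustrates one plausible instantiation of the similarity-to-reward mapping; it is not the paper's exact formulation. It assumes an autoencoder already trained on expert states, uses reconstruction error as the (dis)similarity measure, and applies an exponential mapping to obtain a bounded intrinsic reward. All names, the choice of similarity measure, and the hyperparameters `beta` and `scale` are illustrative assumptions.

```python
# Minimal sketch (illustrative; not the paper's exact method).
# Assumes the expert is modeled by an autoencoder trained on expert states;
# reconstruction error acts as a dissimilarity measure, and an exponential
# mapping turns it into a bounded intrinsic reward in (0, 1].
import numpy as np


class ExpertAutoencoder:
    """Hypothetical wrapper around a trained autoencoder over expert states."""

    def __init__(self, encode, decode):
        self.encode = encode  # state -> latent code
        self.decode = decode  # latent code -> reconstructed state

    def reconstruction_error(self, state: np.ndarray) -> float:
        recon = self.decode(self.encode(state))
        return float(np.mean((state - recon) ** 2))


def intrinsic_reward(state: np.ndarray, expert: ExpertAutoencoder, beta: float = 1.0) -> float:
    """Map state-expert similarity to a well-formed (bounded, non-negative) reward.

    beta (assumed hyperparameter) controls how sharply the reward decays as the
    state moves away from expert-like regions of the state space.
    """
    error = expert.reconstruction_error(state)
    return float(np.exp(-beta * error))  # ~1.0 near expert data, -> 0 far from it


def shaped_reward(env_reward: float, state: np.ndarray, expert: ExpertAutoencoder,
                  scale: float = 0.1) -> float:
    """Combine the environment reward (possibly sparse) with the intrinsic bonus."""
    return env_reward + scale * intrinsic_reward(state, expert)
```

For a mixed autoencoder expert covering several behavior modes, one natural extension is to compute the reconstruction error of each component and take the minimum (or a soft minimum) before applying the mapping, so that matching any single expert mode yields a high intrinsic reward; that detail is omitted from the sketch for brevity.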