This paper presents a method that extends data-driven learning to history-dependent, multi-fidelity data, quantifies epistemic uncertainty, and separates it from data noise (aleatoric uncertainty). The method has a hierarchical structure and applies to a range of learning scenarios, from simple single-fidelity deterministic neural network learning to the proposed multi-fidelity, variance-estimating Bayesian recurrent neural network learning. The versatility and generality of the method are demonstrated by applying it to several data-driven constitutive modeling scenarios using data of varying fidelity, with and without noise. The method accurately predicts responses, quantifies model error, and captures the noise distribution where one is present. These capabilities open opportunities for practical applications across a wide range of scientific and engineering fields, including the most challenging cases of design and analysis under uncertainty.
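The separation of epistemic uncertainty from data noise described above can be made concrete with a minimal sketch. The following Python/PyTorch code is an illustrative assumption, not the paper's implementation: it pairs a recurrent network having a variance head (heteroscedastic noise, trained with the Gaussian negative log-likelihood) with Monte Carlo dropout as a common stand-in for a fully Bayesian treatment of model uncertainty. All names (e.g., `VarianceEstimatingRNN`, `mc_predict`) are hypothetical.

```python
# Illustrative sketch, assuming a variance-estimating recurrent network with
# MC dropout; not the authors' method or code.
import torch
import torch.nn as nn

class VarianceEstimatingRNN(nn.Module):
    def __init__(self, n_in, n_hidden=64, p_drop=0.1):
        super().__init__()
        self.rnn = nn.GRU(n_in, n_hidden, batch_first=True)
        self.drop = nn.Dropout(p_drop)              # kept active at test time for MC sampling
        self.mean_head = nn.Linear(n_hidden, 1)
        self.log_var_head = nn.Linear(n_hidden, 1)  # log-variance for numerical stability

    def forward(self, x):                           # x: (batch, time, n_in)
        h, _ = self.rnn(x)
        h = self.drop(h)
        return self.mean_head(h), self.log_var_head(h)

def gaussian_nll(mean, log_var, target):
    # Heteroscedastic loss: large predicted variance down-weights noisy points.
    return (0.5 * (log_var + (target - mean) ** 2 / log_var.exp())).mean()

@torch.no_grad()
def mc_predict(model, x, n_samples=50):
    # Epistemic uncertainty: spread of predicted means over stochastic passes.
    # Aleatoric uncertainty: average of the predicted noise variances.
    model.train()                                   # keep dropout active
    means, variances = zip(*[(m, lv.exp()) for m, lv in (model(x) for _ in range(n_samples))])
    means = torch.stack(means)
    return means.mean(dim=0), means.var(dim=0), torch.stack(variances).mean(dim=0)
```

In this sketch, the variance head captures the noise distribution in the data, while the dropout ensemble captures model (epistemic) error; the paper's hierarchical, multi-fidelity Bayesian formulation plays the role that MC dropout plays here.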