This paper presents a non-asymptotic convergence analysis of Q-learning and actor-critic algorithms for robust average-reward Markov decision processes (MDPs) under contamination, total-variation (TV) distance, and Wasserstein uncertainty sets. The key element of the analysis is showing that the optimal robust Q operator is a strict contraction with respect to a carefully designed semi-norm that vanishes on constant functions. This property enables a stochastic approximation update that learns the optimal robust Q-function using $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples. Furthermore, we provide an efficient routine for robust Q-function estimation, which in turn enables robust critic estimation. Building on these results, we present an actor-critic algorithm that learns an $\epsilon$-optimal robust policy within $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples. Numerical simulations are provided to evaluate the performance of our algorithms.
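As a hedged illustration of the type of contraction property referenced above, stated here for the standard span semi-norm from average-reward MDP theory (the semi-norm constructed in our analysis may differ; the symbols $\mathrm{sp}(\cdot)$, $\mathcal{T}$, and $\gamma$ are notation introduced only for this sketch), the statement takes the form
\[
\mathrm{sp}(Q) \;:=\; \max_{s,a} Q(s,a) \;-\; \min_{s,a} Q(s,a),
\qquad
\mathrm{sp}\!\left(\mathcal{T}Q_1 - \mathcal{T}Q_2\right) \;\le\; \gamma\, \mathrm{sp}(Q_1 - Q_2)
\quad \text{for some } \gamma < 1,
\]
where $\mathcal{T}$ denotes the optimal robust Bellman operator acting on Q-functions. The span semi-norm $\mathrm{sp}(\cdot)$ vanishes exactly on constant functions, which is the sense in which constant functions are excluded from the norm.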