This paper identifies limitations in how existing automated fact-checking (AFC) systems evaluate evidence and proposes a new evaluation framework, Ev²R. Existing approaches assess evidence either indirectly, via the accuracy of the resulting verdict, or by exact matching against a predefined knowledge base (e.g., Wikipedia); the former conflates evidence quality with verdict prediction, and the latter penalizes valid evidence that happens to fall outside the knowledge base. Ev²R addresses these shortcomings by combining a reference-based score with a proxy (verdict-based) score, jointly measuring how well the retrieved evidence covers the reference evidence and how reliably it supports the predicted verdict. Experiments show that Ev²R outperforms existing metrics in accuracy and robustness; in particular, it correlates more strongly with human judgment and is more resistant to adversarial evidence.
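To make the two-axis idea concrete, here is a minimal sketch of combining a reference-based score with a verdict-support score. All function names and the scoring rules are illustrative assumptions: the paper uses LLM-based evaluators for both axes, whereas this toy version substitutes token overlap and verdict agreement.

```python
# Hypothetical sketch of an Ev2R-style combined score.
# NOT the paper's implementation: both component scores are stand-ins
# for the LLM-based evaluators the paper actually proposes.

def reference_score(evidence, reference_evidence):
    """Toy proxy for 'how well does retrieved evidence cover the reference?'
    (naive token overlap; the paper judges coverage with an LLM)."""
    ev_tokens = set(" ".join(evidence).lower().split())
    ref_tokens = set(" ".join(reference_evidence).lower().split())
    return len(ev_tokens & ref_tokens) / max(len(ref_tokens), 1)

def support_score(predicted_verdict, gold_verdict):
    """Toy proxy for 'does the evidence reliably support the verdict?'
    (verdict agreement; the paper scores support directly, not via accuracy)."""
    return 1.0 if predicted_verdict == gold_verdict else 0.0

def ev2r_style_score(evidence, reference_evidence,
                     predicted_verdict, gold_verdict):
    r = reference_score(evidence, reference_evidence)
    s = support_score(predicted_verdict, gold_verdict)
    # Harmonic mean: a system must do well on BOTH axes to score high,
    # mirroring the motivation for combining the two evaluations.
    return 0.0 if r + s == 0 else 2 * r * s / (r + s)
```

The harmonic-mean combination is one plausible design choice: it ensures that strong knowledge-base overlap alone (without supporting the correct verdict), or a correct verdict reached from irrelevant evidence, cannot yield a high score.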