This paper examines bias in resume screening by generative AI. While AI-based resume screening systems are increasingly deployed on the assumption that they can replace biased human judgment, we question whether these systems are capable of substantive evaluation at all. In two experiments across eight major AI platforms, we find that some models exhibit complex, context-dependent racial and gender biases that disadvantage applicants based solely on demographic signals. We also find that other models which appear unbiased fail to perform substantive evaluation, relying instead on superficial keyword matching, a phenomenon we term the “Illusion of Neutrality.” We therefore recommend a dual-validation framework that audits AI recruiting tools for both demographic bias and substantive evaluation ability, to ensure they are both fair and effective.