New AI models generate step-by-step reasoning text before producing an answer. This text appears to reveal the model's computational process and is increasingly used for transparency and interpretability. However, it is unclear whether the way humans interpret this text matches the model's actual computation. This paper investigates a necessary condition for such a correspondence: the ability of humans to discern which steps in the reasoning text causally influence later steps. We assessed this ability by posing questions grounded in counterfactual measures of importance and found a significant gap. Participants' accuracy was only 29%, barely above chance (25%), and even when taking majority votes on questions with high inter-participant consensus, accuracy reached only 42%. These results reveal a fundamental mismatch between how humans interpret reasoning text and how models use it, calling into question its utility as a straightforward interpretability tool. We argue that reasoning text should not be taken at face value but treated as an artifact worthy of investigation in its own right, and that understanding the inhuman ways in which these models use language is a crucial research direction.