This paper investigates the potential inference capabilities of large language models (LLMs), specifically their ability to compose two facts in two-step question answering. Previous research has shown that LLMs struggle with two-step question answering without chain-of-thought (CoT) prompting. This study fine-tunes LLMs on synthetic facts, thereby assessing their pure inference capabilities free from memorization or inference shortcuts. Experiments with models such as Llama 3 8B and GPT-4o show that while these models fail to compose two synthetic facts, they succeed in composing one synthetic fact with one natural fact. This suggests that LLMs have potential two-step inference capabilities, although it remains unclear how this capability scales with model size. Finally, we emphasize that researchers studying LLM inference should avoid both false successes caused by memorization or inference shortcuts and false failures caused by artificial experimental setups when drawing conclusions about the potential inference capabilities of LLMs.
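
To make the experimental setup concrete, the following is a minimal, hypothetical sketch (not taken from the paper) of how a two-step evaluation item might be constructed: one invented (synthetic) fact is taught via fine-tuning, and the question requires composing it with a fact the model plausibly already knows. The entity name, relations, and phrasing below are all illustrative assumptions.

```python
# Hypothetical illustration of a two-step (two-hop) QA item built from
# one synthetic fact and one natural fact. Names and templates are invented.

synthetic_fact = {
    "subject": "Kevase Nutrin",            # invented entity, unknown to the model
    "relation": "was born in",
    "object": "Paris",
}

natural_fact = {
    "subject": "Paris",
    "relation": "is the capital of",
    "object": "France",
}

# A fine-tuning example teaches the synthetic fact in isolation.
finetune_example = (
    f"{synthetic_fact['subject']} {synthetic_fact['relation']} "
    f"{synthetic_fact['object']}."
)

# The two-step question requires composing both facts without a CoT,
# i.e. answering directly rather than reasoning step by step in text.
two_step_question = (
    f"{synthetic_fact['subject']} was born in the capital city of which country? "
    "Answer with the country name only."
)
expected_answer = natural_fact["object"]  # "France"

print(finetune_example)
print(two_step_question, "->", expected_answer)
```

Replacing the natural fact with a second synthetic fact (e.g., an invented city belonging to an invented country) would yield the purely synthetic condition in which, per the results summarized above, models tend to fail.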