This paper presents a novel method for improving the ability of large language models (LLMs) to reason about complex code. While LLMs perform well on routine coding tasks, they can fail on complex tasks that require non-trivial reasoning about program semantics. To address this, we explore a method for synthetically generating code-reasoning training data based on the Semantic Inequivalence Game (SInQ): a generator agent produces semantically distinct variants of programs drawn from a dataset of real-world programming tasks, and an evaluator agent identifies input examples on which the original programs and the generated variants behave differently. The two agents train each other semi-adversarially, and we show that, assuming unbounded computational resources, this self-play setup can in theory yield unlimited improvement. We validate the effectiveness of the proposed method through experiments on a variety of code generation and understanding benchmarks, including multilingual vulnerability detection and the Python builtin identifier swap benchmark. Although trained solely on Python code, our method improves vulnerability detection in C/C++ code and achieves substantial gains on the Python builtin identifier swap benchmark, on which existing LLMs struggle. We release the code needed to reproduce our experiments, along with the generated synthetic data, which other researchers can use to fine-tune LLMs.
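One round of the generator/evaluator game described above can be sketched as follows. This is a minimal illustrative assumption, not the paper's actual implementation: the agent policies are stubbed out with a fixed toy program pair, and the helper names (`run_program`, `find_distinguishing_input`) are hypothetical.

```python
def run_program(source: str, x: int):
    """Execute a program defining f(x) and return f's output on x."""
    env: dict = {}
    exec(source, env)
    return env["f"](x)

# Original program (in the paper, drawn from real-world programming tasks).
original = "def f(x):\n    return x * 2\n"

# A variant the generator agent might propose: intended to be semantically
# inequivalent, here by mishandling negative inputs (toy example).
variant = "def f(x):\n    return x * 2 if x >= 0 else x\n"

def find_distinguishing_input(original: str, variant: str, candidates):
    """Evaluator agent's task: find an input on which behaviors diverge."""
    for x in candidates:
        if run_program(original, x) != run_program(variant, x):
            return x  # witness of semantic inequivalence
    return None  # no witness found among candidates

witness = find_distinguishing_input(original, variant, range(-5, 6))
# witness == -5: original gives -10, variant gives -5
```

In the actual method both roles are played by LLMs and trained semi-adversarially, with the generator rewarded for variants whose distinguishing inputs are hard to find, and the evaluator rewarded for finding them.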