This paper proposes UTRL, a novel reinforcement learning framework that trains LLMs to generate high-quality unit tests based on programming guidelines. UTRL trains two LLMs, a unit test generator and a code generator, through iterative adversarial training. The unit test generator is trained to maximize a discrimination reward, which measures its ability to generate tests that expose defects in the code generator's solutions. The code generator is trained to maximize a code reward, which measures its ability to generate solutions that pass the tests produced by the unit test generator. Experimental results show that Qwen3-4B trained via UTRL generates higher-quality unit tests and outperforms state-of-the-art models such as GPT-4.1 in code evaluation.