Large language models (LLMs) have substantially improved their reasoning capabilities through extensive training on massive datasets, but relying solely on additional data is becoming impractical. This highlights the need for models that improve their reasoning autonomously, without external supervision. We propose Debate, Train, Evolve (DTE), a novel ground-truth-free training framework that uses multi-agent debate traces to evolve a single language model. We also introduce Reflect-Critique-Refine, a new prompting strategy that improves debate quality by explicitly instructing agents to critique and refine their reasoning. Extensive evaluations of six open-weight models on seven reasoning benchmarks show that the DTE framework achieves substantial improvements, with an average accuracy gain of 8.92% on the challenging GSM-PLUS dataset. Moreover, across all other benchmarks, DTE improves accuracy by an average of 5.8%, demonstrating strong cross-domain generalization.
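The abstract only names the debate-train-evolve cycle, so the following Python sketch illustrates one plausible shape of that loop. The `StubModel` interface, the `RCR_PROMPT` wording, and the majority-vote trace filter are assumptions introduced here for illustration; they are not the paper's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical prompt capturing the Reflect-Critique-Refine idea:
# each agent reflects on its answer, critiques peers, then refines.
RCR_PROMPT = (
    "Reflect on your previous answer, critique the other agents' "
    "reasoning, then produce a refined final answer."
)

@dataclass
class Trace:
    question: str
    answer: str

class StubModel:
    """Tiny stand-in so the sketch runs; a real setup would wrap an LLM."""
    def generate(self, prompt: str) -> str:
        return "42"
    def finetune(self, traces: list[Trace]) -> "StubModel":
        return self  # pretend the weights were updated

def run_debate(model, question: str, n_agents: int = 3, rounds: int = 2) -> list[Trace]:
    """Debate phase (assumed protocol): several agent copies answer,
    then exchange answers and refine for a fixed number of rounds."""
    answers = [model.generate(question) for _ in range(n_agents)]
    for _ in range(rounds):
        answers = [
            model.generate(f"{question}\nPeer answers: {answers}\n{RCR_PROMPT}")
            for _ in range(n_agents)
        ]
    return [Trace(question, a) for a in answers]

def select_consensus(traces: list[Trace]) -> list[Trace]:
    """Ground-truth-free filtering: keep traces whose final answer matches
    the majority vote (one plausible selection rule, assumed here)."""
    majority, _ = Counter(t.answer for t in traces).most_common(1)[0]
    return [t for t in traces if t.answer == majority]

def dte_loop(model, questions: list[str], generations: int = 3):
    """Debate, Train, Evolve: debate, collect consensus traces, fine-tune,
    then replace the agents with the evolved model and repeat."""
    for _ in range(generations):
        data = [t for q in questions
                  for t in select_consensus(run_debate(model, q))]
        model = model.finetune(data)
    return model

evolved = dte_loop(StubModel(), ["What is 6 * 7?"])
```

The key design point this sketch highlights is that no gold labels appear anywhere: the training signal comes entirely from agreement among the debating agents, which is what makes the framework ground-truth-free.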