In this paper, we propose "Knockout Assessment", a novel method that uses large language models (LLMs) as evaluators. Existing LLM-based evaluation methods rely on individual scoring or single-round pairwise comparisons and therefore lack a view of the overall ranking; Knockout Assessment addresses this by evaluating candidates in a tournament style through repeated pairwise comparisons. Experiments with three LLMs on two datasets show that Knockout Assessment improves evaluation accuracy and brings LLM judgments closer to human judgments, improving the average Pearson correlation with expert evaluations by 0.07 on university-level exam grading and machine translation evaluation.
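As a rough illustration of the tournament idea described above, the following is a minimal sketch of a knockout-style ranking loop. It assumes a hypothetical `judge` callable (e.g. backed by an LLM prompt) that picks the preferred candidate in a pair; it is not the paper's implementation, only a sketch of how repeated pairwise comparisons can yield an overall ranking.

```python
import random
from typing import Callable, List

# Hypothetical judge: given a task prompt and two candidate outputs,
# return 0 if the first is preferred, 1 otherwise (assumption: in
# practice this would wrap an LLM call).
Judge = Callable[[str, str, str], int]


def knockout_rank(prompt: str, candidates: List[str], judge: Judge,
                  seed: int = 0) -> List[str]:
    """Rank candidates via a knockout tournament of pairwise comparisons.

    Each round pairs up the remaining candidates, the judge picks a winner
    per pair, and losers are eliminated. Candidates eliminated in later
    rounds are ranked higher; the tournament winner is ranked first.
    """
    rng = random.Random(seed)
    remaining = list(candidates)
    eliminated: List[str] = []  # filled from worst to best

    while len(remaining) > 1:
        rng.shuffle(remaining)  # randomize the bracket each round
        winners: List[str] = []
        for i in range(0, len(remaining) - 1, 2):
            a, b = remaining[i], remaining[i + 1]
            if judge(prompt, a, b) == 0:
                winners.append(a)
                eliminated.append(b)
            else:
                winners.append(b)
                eliminated.append(a)
        if len(remaining) % 2 == 1:  # odd candidate gets a bye
            winners.append(remaining[-1])
        remaining = winners

    eliminated.extend(remaining)  # overall winner goes last
    return eliminated[::-1]       # best candidate first
```

In practice the tournament could be repeated with different brackets (or different judge orderings) and the results aggregated to smooth out individual comparison noise; the sketch above runs a single bracket for brevity.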