In this paper, we propose a reliable and scalable pipeline for detecting “LLM-dominant” content, i.e., web content generated automatically by large language models (LLMs). Existing LLM detectors perform well only on clean, prose-like text, and they are limited on web content because of its complex markup and diverse genres. Therefore, instead of classifying the text extracted from each page in isolation, we present a pipeline that classifies each site based on the outputs of an LLM text detector across multiple prose-like pages from that site. We train and evaluate the detector on two independent baseline datasets of 120 sites and achieve 100% accuracy. We then test the detector in a real-world setting on 10,000 sites drawn from search engine results and Common Crawl archives, and find that it detects a significant number of LLM-dominant sites. These sites rank highly in search results and are increasing in number, raising concerns about their impact on end users and the web ecosystem as a whole.
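
To illustrate the site-level idea, the following minimal sketch aggregates per-page detector scores into one site-level label. It is not the method used in this paper: the function name `classify_site`, the parameters `min_pages`, `page_threshold`, and `site_threshold`, and the fraction-of-flagged-pages rule are all assumptions chosen for illustration, and the per-page scores are assumed to come from some existing LLM text detector that returns a value in [0, 1].

```python
from typing import Optional, Sequence


def classify_site(
    page_scores: Sequence[float],
    min_pages: int = 3,           # hypothetical: minimum prose-like pages required
    page_threshold: float = 0.5,  # hypothetical: score at which a page counts as LLM-generated
    site_threshold: float = 0.5,  # hypothetical: fraction of flagged pages needed to label the site
) -> Optional[bool]:
    """Aggregate page-level detector scores into a site-level decision.

    Returns True if the site is labeled LLM-dominant, False if not,
    and None when there are too few prose-like pages to decide.
    """
    if len(page_scores) < min_pages:
        return None  # not enough evidence; abstain rather than guess

    # Count pages whose detector score meets the per-page threshold.
    flagged = sum(1 for score in page_scores if score >= page_threshold)

    # Label the site by the fraction of flagged pages.
    return flagged / len(page_scores) >= site_threshold
```

Deciding at the site level means that a single misclassified page, for example one whose extracted text is mostly markup residue, does not determine the label on its own. Abstaining on sites with too few prose-like pages is likewise an illustrative design choice.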