This paper analyzes, from both theoretical and empirical perspectives, the problem of identifying which large language model (LLM) produced a given output in adversarial settings (e.g., cyber-attacks, disinformation campaigns). Drawing on formal language theory (identification in the limit) and a data-driven analysis of the expanding LLM ecosystem, we model the set of possible LLM outputs as a formal language and ask whether a finite text sample can uniquely determine the generating model. We show that, under mild assumptions about model-to-model output overlap, certain classes of LLMs are fundamentally unidentifiable from their outputs alone. We characterize four theoretical identifiability regimes: (1) infinite classes of deterministic (discrete) LLM languages are non-identifiable, (2) infinite classes of probabilistic LLMs are likewise non-identifiable, (3) finite classes of deterministic LLMs are identifiable, and (4) even finite classes of probabilistic LLMs may be non-identifiable. We then quantify the rapid recent growth of the hypothesis space of candidate model origins for a given output. Even under the conservative assumption that each open-source model is fine-tuned on at most one new dataset, the number of distinct candidate models doubles roughly every 0.5 years; allowing fine-tuning on combinations of multiple datasets shortens the doubling time to about 0.28 years. This combinatorial growth, together with the prohibitive computational cost of brute-force likelihood-ratio comparisons over all candidate models and users, renders complete identification impractical.
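The combinatorial-growth and doubling-time claims can be made concrete with a small back-of-the-envelope calculation. The sketch below uses purely illustrative counts of base models and fine-tuning datasets (none of these numbers come from the paper's data): it contrasts the single-dataset assumption, which grows the hypothesis space linearly in the number of datasets, with the multi-dataset assumption, which grows it exponentially, and converts two candidate-model counts into an implied doubling time.

```python
import math

def candidates_single(base_models: int, datasets: int) -> int:
    """Candidate models if each base model is fine-tuned on at most one dataset:
    the base model itself plus one variant per dataset."""
    return base_models * (datasets + 1)

def candidates_multi(base_models: int, datasets: int) -> int:
    """Candidate models if any subset of datasets may be combined for fine-tuning:
    2**datasets variants per base model (the empty subset is the base model)."""
    return base_models * 2 ** datasets

def doubling_time(n_then: int, n_now: int, years_elapsed: float) -> float:
    """Implied doubling time in years, given two counts separated by years_elapsed."""
    return years_elapsed * math.log(2) / math.log(n_now / n_then)

if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the scaling behavior.
    print(candidates_single(base_models=50, datasets=200))  # 10,050 candidates
    print(candidates_multi(base_models=50, datasets=20))    # 52,428,800 candidates
    # A hypothesis space that grows 4x over two years doubles every 1.0 year.
    print(doubling_time(n_then=1_000, n_now=4_000, years_elapsed=2.0))
```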