Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

Can adversarial attacks by large language models be attributed?

Created by
  • Haebom

Authors

Manuel Cebrian, Andres Abeliuk, Jan Arne Telle

Outline

This paper analyzes, from both theoretical and empirical perspectives, the problem of attributing the output of large language models (LLMs) in adversarial settings (e.g., cyber-attacks, disinformation campaigns). Leveraging formal language theory (identification in the limit) and a data-driven analysis of the expanding LLM ecosystem, it models the set of possible LLM outputs as a formal language and asks whether a finite text sample can uniquely determine the generating model. The result: under mild assumptions about overlap between models, certain classes of LLMs are fundamentally non-identifiable from their outputs alone. The paper delineates four regimes of theoretical identifiability:

1. Infinite classes of deterministic (discrete) LLM languages are non-identifiable.
2. Infinite classes of probabilistic LLMs are likewise non-identifiable.
3. Finite classes of deterministic LLMs are identifiable.
4. Even finite classes of probabilistic LLMs can be non-identifiable.

It then quantifies the explosive recent growth of the hypothesis space of possible model origins for a given output. Even under a conservative assumption (each open-source model is fine-tuned on at most one new dataset), the number of distinct candidate models doubles roughly every 0.5 years, and allowing fine-tuning on combinations of multiple datasets shrinks the doubling time to about 0.28 years. This combinatorial growth, combined with the prohibitive computational cost of brute-force likelihood-ratio tests across all models and potential users, makes complete attribution impractical; illustrative sketches of both points follow.
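As a rough illustration of the doubling-time figures above, the sketch below fits an exponential growth curve to a series of candidate-model counts and recovers the doubling time as ln(2) divided by the fitted growth rate. The counts are made-up illustrative numbers, not the paper's data.

```python
import math

# Hypothetical counts of unique candidate models per year (illustrative
# numbers only): a pool that quadruples each year, i.e. doubles every
# ~0.5 years, matching the paper's conservative scenario.
years = [2020, 2021, 2022, 2023]
counts = [100, 400, 1600, 6400]

# Least-squares fit of log(count) = a + r * year gives the exponential
# growth rate r; the doubling time is then ln(2) / r.
n = len(years)
x_mean = sum(years) / n
y_mean = sum(math.log(c) for c in counts) / n
num = sum((x - x_mean) * (math.log(c) - y_mean) for x, c in zip(years, counts))
den = sum((x - x_mean) ** 2 for x in years)
r = num / den

print(f"growth rate: {r:.3f} per year")
print(f"doubling time: {math.log(2) / r:.2f} years")  # -> 0.50 here
```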
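To see why brute-force likelihood-ratio attribution becomes impractical as the candidate pool explodes, here is a minimal sketch using toy character-unigram models as stand-ins for real LLMs. The model names, training strings, and observed text are all hypothetical; the point is simply that attribution costs one likelihood evaluation per candidate, over a pool that the paper shows grows exponentially.

```python
import math
from collections import Counter

def train_unigram(corpus: str) -> dict:
    """Toy stand-in for an LLM: a character unigram model with add-one
    smoothing. Real attribution would need likelihoods from the actual models."""
    counts = Counter(corpus)
    total = sum(counts.values())
    vocab = set(corpus) | {"<unk>"}
    return {ch: (counts.get(ch, 0) + 1) / (total + len(vocab)) for ch in vocab}

def log_likelihood(model: dict, text: str) -> float:
    unk = model["<unk>"]
    return sum(math.log(model.get(ch, unk)) for ch in text)

# Hypothetical candidate pool; in practice this is the exploding hypothesis
# space of base models x fine-tuning datasets described above.
candidates = {
    "model_A": train_unigram("the quick brown fox jumps over the lazy dog"),
    "model_B": train_unigram("attack at dawn attack at dusk"),
}

observed = "attack at noon"

# Brute-force attribution: score the observed text under every candidate and
# pick the best. The loop is linear in the pool size, so an exponentially
# growing pool makes exhaustive comparison intractable.
scores = {name: log_likelihood(m, observed) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```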

Takeaways, Limitations

Takeaways:
It clearly lays out the theoretical and practical limits of attributing LLM outputs to their source models.
It demonstrates that the rapid expansion of the LLM ecosystem amplifies the difficulty of attribution.
It highlights the need to explore realistic approaches to LLM output attribution.
Limitations:
Results may vary depending on the types and characteristics of the models included in the analysis.
It may not fully capture the challenges of attributing LLM outputs in real-world adversarial environments.
Further research is needed to develop more sophisticated attribution techniques.