Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Attacks and Defenses Against LLM Fingerprinting

Created by
  • Haebom

Authors

Kevin Kurian, Ethan Holland, Sean Oesch

Outline

This paper addresses the privacy and security risks posed by fingerprinting attacks against large language models (LLMs), which are increasingly deployed in sensitive environments. The authors study LLM fingerprinting from both the attack and the defense perspective. The attack methodology uses reinforcement learning to automatically optimize query selection, achieving higher fingerprinting accuracy with only three queries than a random selection of three queries from the same pool. The defensive approach applies semantic-preserving output filtering through an auxiliary LLM, obfuscating model identity while maintaining semantic integrity; it reduces fingerprinting accuracy across the tested models while preserving output quality. Together, these contributions show how fingerprinting tools can be made more capable while also offering practical mitigation strategies against fingerprinting attacks.
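The summary only describes the attack at a high level, so the following is a minimal, hypothetical sketch of what reinforcement-learning-driven query selection could look like, assuming a simple epsilon-greedy bandit formulation rather than the paper's actual algorithm. The names `QUERY_POOL`, `send_query`, and `classify_model` are illustrative stand-ins, not the authors' interfaces.

```python
import random
from collections import defaultdict

# Hypothetical sketch: query selection modeled as an epsilon-greedy bandit
# over a fixed probe pool. The reward is 1 when the fingerprint classifier
# names the target model correctly, 0 otherwise.

QUERY_POOL = [f"probe_{i}" for i in range(50)]  # placeholder probe prompts
EPSILON = 0.1
K = 3  # queries per fingerprinting attempt, matching the paper's setting

value = defaultdict(float)  # running value estimate per probe
count = defaultdict(int)    # number of times each probe was used

def select_queries(k=K):
    """Epsilon-greedy: explore randomly, else take the k best probes."""
    if random.random() < EPSILON:
        return random.sample(QUERY_POOL, k)
    return sorted(QUERY_POOL, key=lambda q: value[q], reverse=True)[:k]

def update(queries, reward):
    """Incremental-mean update of each chosen probe's value estimate."""
    for q in queries:
        count[q] += 1
        value[q] += (reward - value[q]) / count[q]

def train_step(send_query, classify_model, target, true_label):
    """One episode: probe the target, classify it, reinforce the probes."""
    queries = select_queries()
    responses = [send_query(target, q) for q in queries]
    reward = 1.0 if classify_model(responses) == true_label else 0.0
    update(queries, reward)
    return reward
```

Over many episodes, probes whose responses reliably separate models accumulate higher value estimates, so the selected three-query set outperforms a random draw from the same pool.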
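The defense can be sketched in the same hedged spirit: every raw response is paraphrased by an auxiliary LLM before it reaches the user, normalizing stylistic tells while preserving meaning. Here `target_llm` and `auxiliary_llm` are assumed generic callables (prompt in, text out), not any specific API, and `PARAPHRASE_PROMPT` is an illustrative instruction rather than the paper's.

```python
# Hypothetical sketch of semantic-preserving output filtering: pipe each
# response through an auxiliary paraphrasing model before serving it.

PARAPHRASE_PROMPT = (
    "Rewrite the following text so that its meaning is unchanged but its "
    "wording, formatting, and stylistic quirks are normalized:\n\n{text}"
)

def filter_output(response: str, auxiliary_llm) -> str:
    """Return a semantics-preserving paraphrase that masks model identity."""
    return auxiliary_llm(PARAPHRASE_PROMPT.format(text=response))

def serve(user_prompt: str, target_llm, auxiliary_llm) -> str:
    """Answer a prompt with the target model, then obfuscate its style."""
    raw = target_llm(user_prompt)  # raw output carries identifiable quirks
    return filter_output(raw, auxiliary_llm)
```

The design trade-off is latency and cost (every response incurs a second model call) against reduced fingerprinting accuracy, which matches the paper's finding that output quality can be preserved while identity signals are suppressed.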

Takeaways, Limitations

Takeaways:
• An efficient reinforcement-learning-based fingerprinting attack methodology (high accuracy achieved with only three queries).
• An effective defense strategy based on semantic-preserving output filtering.
• Practical contributions to both fingerprinting attack and defense techniques.
Limitations:
• The effectiveness of the proposed defense strategy may be limited to the specific models and query pools tested.
• The evaluation may not comprehensively cover the range of possible attack and defense strategies.
• Further research is needed on performance and generalizability in real-world deployments.