Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions. When sharing, please cite the source.

SoK: Large Language Model Copyright Auditing via Fingerprinting

Created by
  • Haebom

Authors

Shuo Shao, Yiming Li, Yu He, Hongwei Yao, Wenyuan Yang, Dacheng Tao, Zhan Qin

Outline

This paper presents the first comprehensive study of LLM fingerprinting as a way to audit copyright infringement involving large language models (LLMs). LLM fingerprinting is a non-invasive technique: it extracts distinctive features from an LLM and uses them to identify infringement, without modifying the model itself. The authors present a unified framework and a formal taxonomy that categorizes existing methods into white-box and black-box approaches. They then propose LeaFBench, the first systematic benchmark for evaluating LLM fingerprinting techniques in realistic deployment scenarios. LeaFBench is built on foundation models, comprises 149 model instances, and integrates 13 representative post-development techniques, including both methods that modify model parameters and parameter-independent mechanisms. Extensive experiments on LeaFBench reveal the strengths and weaknesses of existing methods and point to future research directions and important open problems in this emerging field. The code is available at https://github.com/shaoshuo-ss/LeaFBench.
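To make the idea of black-box fingerprinting concrete, here is a minimal sketch of the extract-then-compare pattern described above. It is not a method from the paper or from LeaFBench. The `Model` interface, the probe prompts, the exact-match similarity measure, the 0.8 threshold, and the three toy models are all assumptions chosen for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical black-box interface: a model is anything mapping prompt -> response.
Model = Callable[[str], str]


def extract_fingerprint(model: Model, probes: Sequence[str]) -> Tuple[str, ...]:
    """Query the model with a fixed probe set and record normalized responses."""
    return tuple(model(p).strip().lower() for p in probes)


def similarity(fp_a: Sequence[str], fp_b: Sequence[str]) -> float:
    """Fraction of probes on which two fingerprints agree (illustrative metric)."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must come from the same probe set")
    if not fp_a:
        return 0.0
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def is_derived(suspect: Model, protected: Model, probes: Sequence[str],
               threshold: float = 0.8) -> bool:
    """Flag the suspect when its fingerprint is close to the protected model's.

    The threshold is an arbitrary assumption, not a value from the paper.
    """
    score = similarity(extract_fingerprint(suspect, probes),
                       extract_fingerprint(protected, probes))
    return score >= threshold


# Toy stand-ins for real LLMs (assumptions for demonstration only).
def base_model(prompt: str) -> str:
    return f"base:{len(prompt) % 7}"


def wrapped_model(prompt: str) -> str:
    # Mimics a deployment that reuses base_model but alters surface formatting.
    return "  " + base_model(prompt).upper()


def unrelated_model(prompt: str) -> str:
    return f"other:{len(prompt) % 5}"


PROBES = ["alpha", "beta gamma", "delta epsilon zeta", "eta"]

if __name__ == "__main__":
    print(is_derived(wrapped_model, base_model, PROBES))
    print(is_derived(unrelated_model, base_model, PROBES))
```

In this toy setting the wrapped model matches the base model on every probe while the unrelated model matches on none. Real fingerprinting methods need features that stay stable under fine-tuning, quantization, system prompts, and similar modifications, which is the kind of robustness a benchmark such as LeaFBench is meant to measure.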

Takeaways, Limitations

Takeaways:
The paper presents the first comprehensive study of LLM fingerprinting, positioning it as a tool for auditing and identifying copyright infringement.
It provides a unified framework and a formal taxonomy that systematically classify LLM fingerprinting techniques into white-box and black-box approaches.
It introduces LeaFBench, the first systematic benchmark that considers realistic deployment scenarios, to evaluate existing methods and suggest future research directions.
It addresses the uncertain reliability of LLM fingerprinting, which stems from the wide variety of model modifications and the lack of standardized evaluation.
Limitations:
LeaFBench may not cover all possible model modification and deployment scenarios.
As new LLM fingerprinting technologies emerge, continuous updates and maintenance of LeaFBench are required.
Further discussion of legal and ethical issues is needed for the effective application of LLM fingerprinting technology.