Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

PBa-LLM: Privacy- and Bias-aware NLP using Named-Entity Recognition (NER)

Created by
  • Haebom

Author

Gonzalo Mancera, Aythami Morales, Julian Fierrez, Ruben Tolosana, Alejandro Peña, Miguel Lopez-Duran, Francisco Jurado, Alvaro Ortigosa

Outline

This paper proposes a privacy-preserving learning framework that uses Named Entity Recognition (NER) to address privacy and ethical concerns in high-risk AI applications built on large language models (LLMs). The framework is evaluated on an AI-based resume-scoring system for recruitment: six anonymization algorithms (based on Presidio, FLAIR, BERT, and GPT) are applied to 24,000 applicant profiles before training BERT and RoBERTa models. The experiments show that the proposed anonymization preserves the confidentiality of applicant information while maintaining system performance, improving the system's trustworthiness. The authors further combine the framework with an existing gender-bias reduction technique to obtain a privacy- and bias-aware LLM (PBa-LLM), and suggest that the approach can extend beyond resume evaluation to other LLM-based AI applications.
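The paper does not include code, but the anonymization step it describes can be sketched simply: run an NER pass over the text and replace each detected entity span with a placeholder tag before the text reaches the model. The sketch below is a toy stand-in, assuming regex-based detectors for just two entity types; the paper's actual recognizers are Presidio-, FLAIR-, BERT-, and GPT-based models that also cover entities such as person names.

```python
import re

# Toy stand-in for a real NER model: hardcoded patterns for two entity
# types. (The paper's six anonymization algorithms use Presidio, FLAIR,
# BERT, and GPT-based recognizers instead.)
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def anonymize(text: str) -> str:
    """Replace each detected entity span with a placeholder tag."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

resume = "Contact Jane Doe at jane.doe@example.com or 555-123-4567."
print(anonymize(resume))
# → Contact Jane Doe at [EMAIL] or [PHONE].
```

In the paper's pipeline, the anonymized text (not the raw profile) is what the BERT/RoBERTa scoring models are trained and evaluated on, which is why utility can be measured against the non-anonymized baseline. Note the toy patterns above leave the person's name untouched; a real NER-based detector would tag it as well.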

Takeaways, Limitations

Takeaways:
  • Empirically demonstrates that a privacy-preserving learning framework leveraging NER is effective at addressing privacy and ethical issues in LLMs.
  • Shows that privacy protection is achievable without compromising system performance.
  • Presents PBa-LLM, which combines the privacy framework with an existing gender-bias reduction technique.
  • The proposed framework is applicable to other LLM-based AI applications.
Limitations:
  • Evaluation is limited to a single system (AI-based resume assessment); generalizability to other high-stakes AI applications requires further study.
  • A more in-depth comparative analysis of the anonymization algorithms' performance is needed.
  • The optimal balance between privacy protection and performance degradation remains to be determined.
  • Validation across other types of sensitive information and datasets is needed.