Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Understanding Fairness-Accuracy Trade-offs in Machine Learning Models: Does Promoting Fairness Undermine Performance?

Created by
  • Haebom

Author

Junhua Liu, Roy Ka-Wei Lee, Kwan Hui Lim

Outline

This paper compares the fairness of machine learning (ML) models and human evaluators using data from 870 college admissions applicants. Predictions were made with three ML models (XGB, Bi-LSTM, and KNN) built on BERT embeddings, while the human evaluators were experts from diverse backgrounds. To assess individual fairness, the authors introduce a consistency metric that measures the agreement between the ML models' and the human evaluators' decisions. The analysis shows that the ML models outperform the human evaluators in fairness consistency by 14.08% to 18.79%. This demonstrates the potential of ML to improve fairness in the admissions process while maintaining high accuracy, and the authors propose a hybrid approach that combines human judgment with ML models.
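The paper does not include implementation details, but an individual-fairness consistency metric of the kind described is commonly computed as the fraction of each applicant's nearest neighbors (in feature or embedding space) that receive the same decision. The sketch below is an illustrative assumption, not the authors' actual code; the function name and the toy data are hypothetical.

```python
import numpy as np

def consistency(X, y_pred, k=5):
    """Individual-fairness consistency: for each applicant, the fraction of
    its k nearest neighbours (by Euclidean distance in feature/embedding
    space) that received the same decision. 1.0 means perfectly consistent."""
    X = np.asarray(X, dtype=float)
    y_pred = np.asarray(y_pred)
    # pairwise Euclidean distances; exclude self-matches
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    scores = []
    for i in range(len(X)):
        nbrs = np.argsort(d[i])[:k]  # indices of the k nearest neighbours
        scores.append(np.mean(y_pred[nbrs] == y_pred[i]))
    return float(np.mean(scores))

# Toy example: two well-separated clusters with cluster-consistent decisions
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
y = np.array([1, 1, 1, 0, 0, 0])
print(consistency(X, y, k=2))  # 1.0: neighbours always share the decision
```

Comparing this score between the ML models' decisions and the human evaluators' decisions on the same applicants would yield the kind of percentage gap the paper reports.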

Takeaways, Limitations

Takeaways:
We demonstrate that ML models can make fairer decisions in the admissions process than human evaluators.
A hybrid approach that combines human judgment and ML models can improve the fairness of the admissions process.
We present a new evaluation metric (the consistency metric) for assessing the fairness of ML models.
Limitations:
The dataset used is limited to applicant data for admission to a specific university, which may limit generalizability.
Not all types of bias (algorithmic, data-driven, cognitive, subjective, etc.) may have been comprehensively considered.
In addition to consistency metrics, other fairness evaluation metrics need to be considered.
The types of ML models used are limited, and results may vary when using different models.