Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

HSFN: Hierarchical Selection for Fake News Detection building Heterogeneous Ensemble

Created by
  • Haebom

Authors

Sara B. Coutinho, Rafael M. O. Cruz, Francimaria R. S. Nascimento, George D. C. Cavalcanti

Outline

Psychological biases such as confirmation bias make people vulnerable to the spread of fake news on social media. This paper focuses on machine learning-based fact-checking systems that address this problem, specifically ensemble methods that combine diverse classifiers. The performance of an ensemble relies heavily on the diversity of its constituent classifiers, but classifiers tend to learn overlapping patterns, which makes it difficult to select genuinely diverse models. To address this, the authors propose HierarchySelect, a novel automatic classifier selection method that prioritizes diversity among classifiers while also taking performance into account. HierarchySelect computes the pairwise diversity between classifiers and applies hierarchical clustering to group them at different levels of granularity. At each level it selects one pool of classifiers, so that each pool exhibits a different degree of diversity, and it then chooses the most diverse pool to form the ensemble. The selection also incorporates an evaluation metric that reflects each classifier's performance, so that the resulting ensemble generalizes well. The method is compared with existing methods in experiments using six diverse datasets and 40 heterogeneous classifiers.
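The selection procedure described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The diversity measure (pairwise disagreement rate), the average-linkage clustering, the rule of taking the most accurate classifier from each cluster, and all function names, classifier names, and toy predictions are assumptions made for illustration only.

```python
from itertools import combinations

def disagreement(a, b):
    """Fraction of samples on which two prediction vectors differ (assumed diversity measure)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def accuracy(pred, y_true):
    """Fraction of correct predictions (assumed performance metric)."""
    return sum(p == t for p, t in zip(pred, y_true)) / len(y_true)

def mean_pairwise(names, dist):
    """Average pairwise diversity inside a pool of at least two classifiers."""
    pairs = list(combinations(names, 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

def hierarchical_levels(names, dist):
    """Average-linkage agglomerative clustering.

    Yields the partition at every level, from n singleton clusters down to 2 clusters.
    """
    clusters = [[n] for n in names]

    def linkage(i, j):
        total = sum(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
        return total / (len(clusters[i]) * len(clusters[j]))

    while len(clusters) >= 2:
        yield [list(c) for c in clusters]
        # Merge the two clusters whose members disagree the least.
        i, j = min(combinations(range(len(clusters)), 2), key=lambda ij: linkage(*ij))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

def select_pool(preds, y_true):
    """Build one pool per hierarchy level and return the most diverse one."""
    names = sorted(preds)
    dist = {frozenset(p): disagreement(preds[p[0]], preds[p[1]])
            for p in combinations(names, 2)}
    acc = {n: accuracy(preds[n], y_true) for n in names}
    best_pool, best_div = None, -1.0
    for partition in hierarchical_levels(names, dist):
        # One representative per cluster: its most accurate member.
        pool = [max(cluster, key=acc.get) for cluster in partition]
        div = mean_pairwise(pool, dist)
        if div > best_div:
            best_pool, best_div = pool, div
    return sorted(best_pool), best_div

# Toy validation labels and predictions (hypothetical classifiers and data).
y_true = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
preds = {
    "nb_multinomial": [1, 0, 0, 1, 0, 1, 0, 1, 0, 1],
    "nb_bernoulli":   [1, 0, 1, 1, 0, 1, 0, 1, 0, 1],
    "svm_linear":     [0, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    "svm_rbf":        [0, 1, 1, 1, 0, 1, 0, 1, 1, 0],
}
pool, div = select_pool(preds, y_true)
print(pool, div)
```

In this toy run the two Naive Bayes variants and the two SVM variants each form a cluster of near-duplicates, and the two-cluster level yields the most diverse pool: one representative from each family. A real setup would compute the predictions on a held-out validation split and would then combine the selected pool with a voting or stacking rule.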

Takeaways, Limitations

Takeaways:
The paper presents a novel automatic classifier selection method aimed at the problem of fake news on social media.
Selecting diversity-oriented classifiers through hierarchical clustering can improve ensemble performance.
The method is validated in experiments with six datasets and 40 heterogeneous classifiers.
The code is released as open source, which improves accessibility.
Limitations:
The proposed method does not achieve the best performance on every dataset (it is best on 2 of the 6 datasets).
Further research is needed to optimally balance diversity and performance.
Further experiments are needed on a wider variety of fake news types and datasets.