Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Benchmarking Pretrained Molecular Embedding Models For Molecular Representation Learning

Created by
  • Haebom

Authors

Mateusz Praski, Jakub Adamczyk, Wojciech Czech

Outline

This paper benchmarks 25 pretrained neural network models widely used in chemistry and small-molecule drug design across 25 datasets. Models spanning various modalities, architectures, and pretraining strategies are evaluated within a fair comparison framework. Using a hierarchical Bayesian statistical testing model, the analysis shows that nearly all neural models yield no significant improvement over the baseline ECFP molecular fingerprint. Only CLAMP, a model that is itself based on molecular fingerprints, performs statistically significantly better than the alternatives. These results raise concerns about the evaluation rigor of previous studies; the authors discuss potential causes, propose solutions, and offer practical recommendations.
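To make the idea of comparing a model against a baseline across several datasets concrete, here is a minimal sketch in Python. It is not the hierarchical Bayesian test described in the paper; it swaps in a much simpler exact sign-flip permutation test on per-dataset score differences. The function name `paired_sign_flip_pvalue` and all scores below are hypothetical placeholders chosen for illustration, not results from the paper.

```python
from itertools import product


def paired_sign_flip_pvalue(model_scores, baseline_scores):
    """Two-sided exact sign-flip test on paired per-dataset scores.

    Simplified stand-in (an assumption of this sketch), not the
    hierarchical Bayesian model used in the paper.
    """
    if len(model_scores) != len(baseline_scores):
        raise ValueError("score lists must have the same length")

    # One difference per dataset: positive means the model beat the baseline.
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    observed = abs(sum(diffs))

    # Under the null hypothesis, each difference is equally likely to have
    # either sign, so enumerate every sign pattern (2**n of them).
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical scores (e.g. AUROC) on six datasets; made up for illustration.
baseline = [0.80, 0.75, 0.70, 0.85, 0.90, 0.65]
model = [0.81, 0.74, 0.71, 0.84, 0.91, 0.64]

p_value = paired_sign_flip_pvalue(model, baseline)
print(f"p-value: {p_value:.3f}")
```

In this made-up example the model wins on three datasets and loses on three by similar margins, so the test finds no evidence of a difference. Exact enumeration is only practical for a small number of datasets; a hierarchical Bayesian approach, as used in the paper, additionally models uncertainty within each dataset.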

Takeaways, Limitations

Takeaways: Raises concerns about the evaluation rigor of previous studies of pretrained neural network models and suggests ways to improve it. Reaffirms the effectiveness of molecular fingerprint-based approaches such as ECFP. Provides practical recommendations for future research.
Limitations: The 25 models and 25 datasets evaluated in this study may not represent the full range of available models and tasks. Results may differ under other evaluation metrics or datasets. Further research is needed to determine how well model performance generalizes to specific types of molecules or specific chemical properties.