Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

EDBench: Large-Scale Electron Density Data for Molecular Modeling

Created by
  • Haebom

Authors

Hongxin Xiang, Ke Li, Mingquan Liu, Zhixiang Cheng, Bin Yao, Wenjie Du, Jun Xia, Li Zeng, Xin Jin, Xiangxiang Zeng

Outline

This paper introduces EDBench, a large-scale, high-quality electron density dataset, to address the fact that existing molecular machine learning force fields (MLFFs) largely overlook electron density (ED). Built on PCQM4Mv2, EDBench provides accurate ED data for 3.3 million molecules, together with a set of ED-centric benchmark tasks (prediction, search, and generation) that evaluate how well models can exploit electron density information. The evaluation shows that learning-based methods trained on EDBench can compute ED with accuracy comparable to conventional DFT calculations at a substantially lower computational cost. The EDBench data and benchmarks are freely available and are expected to support ED-based drug discovery and materials science research.
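To make "electron density data" and "ED prediction" more concrete, here is a minimal sketch. The summary does not describe EDBench's file format or the metrics its prediction task uses, so everything below is an assumption for illustration: the density is stored as values on a uniform 3D grid (atomic units), the toy molecule is a single hypothetical Gaussian blob, and the relative L1 error is one common way to compare densities, not necessarily EDBench's metric. The one physical fact used is that integrating the density over space gives the number of electrons, N = ∫ρ(r) dr.

```python
import numpy as np


def electron_count(rho: np.ndarray, voxel_volume: float) -> float:
    # Riemann-sum approximation of N = integral of rho(r) dr on a uniform grid
    return float(rho.sum() * voxel_volume)


def relative_l1_error(rho_pred: np.ndarray, rho_ref: np.ndarray, voxel_volume: float) -> float:
    # integral |rho_pred - rho_ref| dr divided by integral |rho_ref| dr
    # (the voxel volume cancels, but is kept so both terms are real integrals)
    diff = np.abs(rho_pred - rho_ref).sum() * voxel_volume
    ref = np.abs(rho_ref).sum() * voxel_volume
    return float(diff / ref)


def toy_density(n_electrons: float, spacing: float = 0.1,
                half_width: float = 3.0, sigma: float = 0.5):
    # Hypothetical stand-in for one molecule's density: a single Gaussian blob
    axis = np.arange(-half_width, half_width + spacing / 2, spacing)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    g = np.exp(-(x**2 + y**2 + z**2) / (2 * sigma**2))
    dv = spacing**3
    # Rescale so that the grid integrates exactly to n_electrons
    rho = n_electrons * g / (g.sum() * dv)
    return rho, dv


rho_ref, dv = toy_density(n_electrons=2.0)
rho_pred = 1.05 * rho_ref  # a "prediction" that overshoots everywhere by 5%
print(round(electron_count(rho_ref, dv), 6))                  # 2.0
print(round(relative_l1_error(rho_pred, rho_ref, dv), 6))     # 0.05
```

The electron-count check is a cheap sanity test that any predicted density should roughly pass, while the integrated error captures how far a learned model's output deviates from a DFT reference on the same grid.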

Takeaways, Limitations

Takeaways:
Builds EDBench, a large-scale, high-quality electron density dataset that opens new possibilities for MLFF research.
Demonstrates that learning-based models trained on EDBench can compute electron density far more efficiently than DFT while maintaining comparable accuracy.
Provides an important foundation for ED-based drug development and materials science research.
Model performance can be evaluated and improved through various ED-centric benchmark tasks.
Limitations:
The size and chemical diversity of EDBench warrant further examination.
Further research is needed to evaluate how well models trained on EDBench generalize to a wider range of molecular systems.
The accuracy and efficiency of learning-based methods can depend heavily on the quality of the dataset.