Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery

Created by
  • Haebom

Authors

Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Nga-Chun Ng, Gerald WY Cheng, Zongxi Li, Jing Cai, Liang-ting Lin, Jung Sun Yoo

Outline

This paper presents a novel pretraining method that accounts for solvent-dependent structural changes and jointly learns multiple correlated tasks to improve the accuracy of protein-ligand interaction prediction. Ensembles of ligand structures generated under various solvent conditions serve as augmented input, so that the model learns from both structural flexibility and environmental context. Molecular reconstruction, interatomic distance prediction, and contrastive learning are combined to build a solvent-invariant molecular representation. The method improves performance in binding affinity prediction (3.7% improvement), on the PoseBusters Astex docking benchmark (82% success rate), and in virtual screening (AUC 97.1%), and achieves a root mean square deviation (RMSD) of 0.157 angstroms, providing insight into atomic-level binding mechanisms.
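
As a rough illustration of how the three objectives named in the outline could be combined, the following is a minimal sketch, not the authors' implementation. It assumes an InfoNCE-style contrastive term in which embeddings of the same ligand under two solvent conditions form a positive pair, and it uses mean squared error as a stand-in for both the reconstruction and distance-prediction losses. The function names, the temperature, and the loss weights are all hypothetical; the summary does not state them.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    view_a[i] and view_b[i] are embeddings of the same ligand under two
    different solvent conditions; all other rows of view_b act as negatives.
    """
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [dot(a_i, b_j) / temperature for b_j in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(a)


def mse(pred, target):
    """Mean squared error, used here as a placeholder regression loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(recon_pred, recon_true, dist_pred, dist_true,
                   view_a, view_b, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (weights are hypothetical)."""
    w_rec, w_dist, w_con = weights
    return (w_rec * mse(recon_pred, recon_true)
            + w_dist * mse(dist_pred, dist_true)
            + w_con * info_nce(view_a, view_b))
```

With this form, the contrastive term is small when each ligand's two solvent views lie closer to each other than to the views of other ligands, which is one way to encourage a solvent-invariant representation.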

Takeaways, Limitations

Takeaways:
Improving the accuracy of protein-ligand interaction predictions by considering solvent effects.
Performance improvement through integrated learning of multiple related tasks (e.g., molecular reconstruction, interatomic distance prediction).
Providing docking results with atomic-level accuracy and insights into binding mechanisms.
Contributing to the advancement of structure-based drug design.
Limitations:
Further studies are needed to determine the generality of the presented method and its applicability to various protein-ligand systems.
The computational cost and the dependence on training data size need to be evaluated.
The results are reported on specific benchmark datasets; their generalizability needs to be verified.