Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PLAME: Leveraging Pretrained Language Models to Generate Enhanced Protein Multiple Sequence Alignments

Created by
  • Haebom

Authors

Hanqun Cao, Xinyi Zhou, Zijun Gao, Chenyu Wang, Xin Gao, Zhi Zhang, Chunbin Gu, Ge Liu, Pheng-Ann Heng

Outline

This paper proposes PLAME, a novel MSA (multiple sequence alignment) design model that aims to improve structure prediction accuracy for low-similarity and orphan proteins. Unlike existing methods, PLAME enriches evolutionary information by using evolutionary embeddings from pretrained protein language models, and it improves generation quality through a conservation-diversity loss function. The authors also propose an MSA selection method for picking out high-quality MSAs and a new sequence quality assessment metric for evaluating MSA quality. On the AlphaFold2 benchmark for low-similarity and orphan proteins, PLAME achieves state-of-the-art performance, and the improvements carry over consistently to AlphaFold3. Ablation studies verify the effectiveness of the MSA selection method, and extensive case studies across various protein types give insight into how AlphaFold's prediction quality relates to MSA properties. Finally, the authors show that PLAME can serve as an adapter that achieves AlphaFold2-level accuracy at the inference speed of ESMFold.
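
To make the terms "conservation" and "diversity" concrete, the sketch below computes two simple, generic MSA statistics: an entropy-based per-column conservation score and the average pairwise Hamming distance between sequences. This is an illustrative assumption only. It is not PLAME's conservation-diversity loss, its selection method, or its quality metric, none of which are specified in this summary. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_conservation(msa):
    """Average per-column conservation, scaled to [0, 1] (1 = fully conserved)."""
    max_entropy = math.log2(21)  # assumption: 20 amino acids plus the gap symbol
    columns = list(zip(*msa))    # transpose rows (sequences) into columns
    return sum(1.0 - column_entropy(c) / max_entropy for c in columns) / len(columns)


def mean_diversity(msa):
    """Average normalized Hamming distance over all pairs of aligned sequences."""
    pairs = list(combinations(msa, 2))
    if not pairs:
        return 0.0
    length = len(msa[0])
    return sum(sum(a != b for a, b in zip(s, t)) / length for s, t in pairs) / len(pairs)


# Hypothetical toy alignment: equal-length rows, '-' marks a gap.
toy_msa = ["MKTAY", "MKTAY", "MRTGY", "MK-AY"]
print(f"conservation = {mean_conservation(toy_msa):.3f}")
print(f"diversity    = {mean_diversity(toy_msa):.3f}")
```

An alignment of identical sequences scores maximal conservation and zero diversity, so the two statistics pull in opposite directions. That tension is the intuition behind balancing them, whatever form the paper's actual loss takes.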

Takeaways, Limitations

Takeaways:
Improves protein structure prediction accuracy for low-similarity and orphan proteins.
Introduces a novel approach to MSA design that uses pretrained protein language models.
Presents an effective MSA selection method and a sequence quality assessment metric.
Suggests that AlphaFold2-level accuracy is achievable at ESMFold inference speed.
Limitations:
PLAME's performance improvements may be limited to specific benchmark datasets.
Further research is needed on the generalizability of the proposed MSA selection method and quality assessment metric.
Further validation of PLAME's utility in real-world applications is needed.