Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Bidirectional Hierarchical Protein Multi-Modal Representation Learning

Created by
  • Haebom

Authors

Xuefeng Liu, Songhao Jiang, Chih-chan Tien, Jinbo Xu, Rick Stevens

Outline

This paper proposes a multimodal protein representation learning framework that uses both protein sequence and 3D structure. It combines a Transformer-based protein language model (pLM), pre-trained on large-scale protein sequence data, with a graph neural network (GNN) that encodes 3D structural information. Attention and gating mechanisms let the two modalities exchange information, and a bi-hierarchical fusion approach integrates sequence and structure at both the local and global levels. The authors report that the method outperforms existing methods on several protein representation learning benchmarks (enzyme EC classification, model quality assessment, protein-ligand binding affinity prediction, protein-protein binding site prediction, and B-cell epitope prediction), setting a new state of the art in multimodal protein representation learning.
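To make the idea of attention plus gating more concrete, here is a minimal sketch in plain Python. It is not the paper's architecture: the function names (`cross_attend`, `gated_fuse`, `bidirectional_fusion`), the scalar per-residue gate, the shared gate parameters, and the mean pooling used for the global vector are all assumptions chosen for illustration. It also assumes that the pLM and the GNN each produce one embedding per residue, with the same dimension.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query attends over the other modality."""
    scale = math.sqrt(len(queries[0]))
    dim = len(keys_values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(dim)])
    return out

def gated_fuse(own, other, gate_w, gate_b):
    """Hypothetical scalar gate g in (0, 1): fused = g * own + (1 - g) * other."""
    fused = []
    for a, b in zip(own, other):
        g = sigmoid(dot(gate_w, a + b) + gate_b)  # a + b concatenates the two lists
        fused.append([g * x + (1.0 - g) * y for x, y in zip(a, b)])
    return fused

def bidirectional_fusion(seq_emb, struct_emb, gate_w, gate_b):
    """Local level: per-residue fusion in both directions. Global level: mean pool."""
    seq_ctx = cross_attend(seq_emb, struct_emb)     # sequence queries structure
    struct_ctx = cross_attend(struct_emb, seq_emb)  # structure queries sequence
    seq_fused = gated_fuse(seq_emb, seq_ctx, gate_w, gate_b)
    struct_fused = gated_fuse(struct_emb, struct_ctx, gate_w, gate_b)
    local = [s + t for s, t in zip(seq_fused, struct_fused)]  # concat per residue
    n = len(local)
    global_vec = [sum(r[j] for r in local) / n for j in range(len(local[0]))]
    return local, global_vec

# Toy input: 2 residues, embedding dimension 2 (values are made up).
seq_emb = [[1.0, 0.0], [0.0, 1.0]]
struct_emb = [[0.5, 0.5], [0.5, 0.5]]
local, global_vec = bidirectional_fusion(seq_emb, struct_emb, [0.0] * 4, 0.0)
```

With zero gate weights the gate is exactly 0.5, so each fused vector is the average of a modality's own embedding and what it attended to in the other modality. In a trained model the gate parameters would be learned, and the two directions would likely use separate projections.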

Takeaways, Limitations

Takeaways:
Presents a novel Bi-Hierarchical Fusion framework for fusing protein sequence and structure information.
Improves on existing methods across a range of protein-related prediction tasks.
Reports a new state of the art in multimodal protein representation learning.
Uses attention and gating mechanisms so that the two modalities exchange information and reinforce each other.
Limitations:
The paper does not explicitly discuss its limitations. Future research may need to evaluate generalization across diverse protein structure datasets and to reduce computational cost.