Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

UniGenX: a unified generative foundation model that couples sequence, structure and function to accelerate scientific design across proteins, molecules and materials

Created by
  • Haebom

Author

Gongbo Zhang, Yanting Li, Renqian Luo, Pipi Hu, Yang Yang, Zeru Zhao, Lingbo Li, Guoqing Liu, Zun Wang, Ran Bi, Kaiyuan Gao, Liya Guo, Yu Xie, Chang Liu, Jia Zhang, Tian Xie, Robert Pinsler, Claudio Zeni, Ziheng Lu, Hongxia Hao, Yingce Wen-Bin Zhang, Zhijun Zeng, Yi Zhu, Li Dong, Xiuyuan Hu, Li Yuan, Lei Chen, Haiguang Liu, Tao Qin

Outline

UniGenX is a unified generative framework that co-generates one-dimensional sequences and three-dimensional coordinates while directly targeting functions and properties across diverse domains, including proteins, molecules, and materials. Existing generative models have three limitations: they do not target function directly, they optimize discrete sequences and continuous coordinates independently, and they model conformational ensembles poorly. To address these, UniGenX represents heterogeneous inputs as a mixed stream of symbolic and numeric tokens, uses a decoder-only autoregressive transformer to provide global context, and generates numeric fields through a conditional diffusion head steered by task-specific tokens. In addition to setting a new state of the art on structure prediction tasks, it achieves state-of-the-art or competitive performance for function-aware generation in materials, chemistry, and biology. Specifically, in materials it generated 436 crystal candidates satisfying three constraints simultaneously, 11 of which have novel compositions; in chemistry it sets new benchmarks on five property targets and on conformer ensemble generation on GEOM; and in biology it improves the success rate of protein induced-fit modeling by more than 23 times. Overall, the experimental results demonstrate the benefits of discrete-continuous joint training and significant advances in controllability and function-aware generation through cross-domain transfer learning.
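To make the "mixed stream of symbolic and numeric tokens" idea concrete, here is a minimal Python sketch of how one structure might be serialized and how each position could be routed to a categorical (next-token) head or a diffusion head. It is an illustration under stated assumptions, not the paper's implementation: the token names `<mol_conformer>` and `<coords>`, the serialization order, and the helpers `build_stream` and `head_assignment` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Symbol:
    """Discrete token, e.g. a task marker or an element symbol."""
    text: str


@dataclass(frozen=True)
class Numeric:
    """Continuous token, e.g. one atom's 3D coordinates."""
    values: Tuple[float, ...]


Token = Union[Symbol, Numeric]


def build_stream(task: str, elements: Sequence[str],
                 coords: Sequence[Sequence[float]]) -> List[Token]:
    """Serialize one structure as [task] + elements + <coords> + coordinates.

    The layout is an assumption made for illustration only.
    """
    if len(elements) != len(coords):
        raise ValueError("need exactly one coordinate triple per element")
    stream: List[Token] = [Symbol(f"<{task}>")]
    stream += [Symbol(e) for e in elements]
    stream.append(Symbol("<coords>"))  # hypothetical separator token
    stream += [Numeric(tuple(float(v) for v in xyz)) for xyz in coords]
    return stream


def head_assignment(stream: Sequence[Token]) -> List[str]:
    """Label each position with the head that would generate it."""
    return ["diffusion" if isinstance(t, Numeric) else "categorical"
            for t in stream]


# Water as a toy example; coordinates (in angstroms) are approximate.
water = build_stream(
    "mol_conformer",
    ["O", "H", "H"],
    [(0.000, 0.000, 0.117), (0.000, 0.757, -0.469), (0.000, -0.757, -0.469)],
)
heads = head_assignment(water)
```

In this sketch the five symbolic positions would be predicted as ordinary discrete tokens, while the three coordinate positions would be filled in by the diffusion head, conditioned on the context produced by the transformer.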

Takeaways, Limitations

Takeaways:
Presents a unified generative model that directly targets functions and properties in various fields such as proteins, molecules, and materials.
Addresses three gaps in existing models: the lack of direct function targeting, independent optimization of discrete sequences and continuous coordinates, and insufficient modeling of conformational ensembles.
Achieves state-of-the-art or competitive performance in structure prediction and function-aware generation tasks.
Shows significant performance improvements in materials, chemistry, and biology (e.g., a more than 23x improvement in the success rate of protein induced-fit modeling).
Experimentally demonstrates the effectiveness of discrete-continuous joint training.
Demonstrates the potential of cross-domain transfer learning.
Limitations:
The paper does not explicitly discuss limitations. Future research is expected to improve the model's performance and broaden its scope of application.