Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model

Created by
  • Haebom

Authors

Changze Lv, Jiang Zhou, Siyu Long, Lihao Wang, Jiangtao Feng, Dongyu Xue, Wei-Ying Ma, Bowen Zhou, Hao Zhou

Outline

AMix-1 is a powerful protein foundation model built on Bayesian Flow Networks and developed through a systematic training methodology spanning pretraining scaling laws, emergent-capability analysis, an in-context learning mechanism, and a test-time scaling algorithm. Predictive scaling laws ensure robust scalability, and analysis through the lens of loss reveals the gradual emergence of structural understanding, culminating in a strong 1.7-billion-parameter model. An in-context learning strategy built on multiple sequence alignments (MSAs) unifies protein design within a single framework: AMix-1 picks up deep evolutionary signals from an MSA and consistently generates structurally and functionally coherent proteins. This framework enabled the design of AmeR variants with up to 50-fold activity improvement over the wild type. AMix-1 is further augmented with a test-time scaling algorithm for in silico directed evolution, delivering substantial, scalable performance gains as the validation budget increases and laying the groundwork for next-generation in-lab protein design.
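The test-time scaling idea above — spending a larger in silico validation budget to screen more model-proposed candidates per round of directed evolution — can be sketched as a generic propose-and-select loop. This is an illustrative sketch under assumed names, not the paper's actual algorithm: `propose` stands in for conditional sampling from the generative model (which in AMix-1 would be conditioned on an MSA), and `fitness` stands in for the in silico validator.

```python
import random

def propose(parent, rng):
    """Stand-in for the generative model: mutate one residue.
    In AMix-1 this would be conditional sampling given an MSA prompt."""
    amino_acids = "ACDEFGHIKLMNPQRSTVWY"
    pos = rng.randrange(len(parent))
    return parent[:pos] + rng.choice(amino_acids) + parent[pos + 1:]

def directed_evolution(seed, fitness, budget, rounds=5,
                       proposals_per_round=8, rng=None):
    """Propose-and-select loop: a larger validation budget lets more
    candidates be scored per round, so the best achievable fitness is
    monotone in the budget (the test-time scaling effect)."""
    rng = rng or random.Random(0)
    best = seed
    used = 0
    for _ in range(rounds):
        candidates = [propose(best, rng) for _ in range(proposals_per_round)]
        scored = []
        for c in candidates:
            if used >= budget:  # validation budget exhausted
                break
            scored.append((fitness(c), c))
            used += 1
        if scored:
            top_score, top_seq = max(scored)
            if top_score > fitness(best):
                best = top_seq
    return best
```

With a fixed random seed, a run with a larger budget scores a superset of the candidates seen by a smaller-budget run, so its result is never worse — mirroring the paper's observation that performance improves as the validation budget grows.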

Takeaways, Limitations

Takeaways:
Successful development of a powerful protein foundation model, AMix-1, based on Bayesian Flow Networks.
A strong 1.7-billion-parameter model built through a systematic training methodology.
A general protein design framework built on an MSA-based in-context learning strategy.
A successful design example: AmeR variants with up to 50-fold activity improvement over the wild type.
Demonstration of in silico directed evolution enabled by a test-time scaling algorithm.
Laying the foundation for next-generation laboratory protein design.
Limitations:
No specific performance metrics for AMix-1 or comparisons against competing models are given.
Limited discussion of the generalizability and limitations of the MSA-based in-context learning strategy.
No analysis of the computational cost and efficiency of the test-time scaling algorithm.
Experimental validation results are limited or absent.