Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A standard transformer and attention with linear biases for molecular conformer generation

Created by
  • Haebom

Author

Viatcheslav Gurev, Timothy Rumbell

Outline

This paper addresses sampling of low-energy molecular conformers (the spatial arrangements of atoms in a molecule), a critical task for many computations in drug discovery and optimization. While many specialized equivariant networks have been designed to generate molecular conformers from 2D molecular graphs, non-equivariant transformer models have recently emerged as an alternative thanks to their scalability and improved generalization. However, there has been a concern that non-equivariant models require large model sizes to compensate for the lack of equivariance as an inductive bias. This paper shows that an appropriately chosen positional encoding effectively addresses this size limitation. A standard transformer incorporating relative positional encoding for molecular graphs, scaled to 25 million parameters, outperforms a state-of-the-art non-equivariant baseline with 64 million parameters on the GEOM-DRUGS benchmark. The relative positional encoding is implemented as a negative attention bias that increases linearly with the shortest-path distance between graph nodes, with a different slope per attention head, similar to the ALiBi technique widely used in NLP. This architecture has the potential to serve as a foundation for a new class of molecular conformer generation models.
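The ALiBi-style graph bias described above can be sketched in a few lines: compute shortest-path (bond-hop) distances on the molecular graph, then subtract a per-head slope times that distance from the attention logits. This is a minimal NumPy sketch, not the authors' implementation; the geometric slope schedule follows the common ALiBi convention and is an assumption here.

```python
import numpy as np
from collections import deque

def shortest_path_distances(adjacency):
    """All-pairs shortest-path distance (in bond hops) via BFS on an
    unweighted, connected molecular graph given as a 0/1 adjacency matrix."""
    n = len(adjacency)
    dist = np.full((n, n), np.inf)
    for src in range(n):
        dist[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in np.nonzero(adjacency[u])[0]:
                if dist[src, v] == np.inf:
                    dist[src, v] = dist[src, u] + 1
                    queue.append(v)
    return dist

def alibi_graph_bias(dist, num_heads):
    """Negative attention bias growing linearly with graph distance,
    with one slope per head (geometric decay, as in ALiBi for NLP)."""
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    return -slopes[:, None, None] * dist[None, :, :]  # shape (heads, n, n)

# Toy 4-atom chain 0-1-2-3 (e.g., a butane backbone).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
dist = shortest_path_distances(adj)
bias = alibi_graph_bias(dist, num_heads=4)
```

In use, `bias` would simply be added to the pre-softmax attention logits (e.g., via the additive attention-mask argument of a standard attention implementation), so atoms far apart in the graph are attended to less.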

Takeaways, Limitations

Takeaways: A non-equivariant transformer with relative positional encoding can generate low-energy molecular conformers more efficiently than both conventional equivariant networks and larger non-equivariant models. This can help overcome model-size constraints and reduce computational cost, and it suggests a new direction for molecular generation models.
Limitations: The reported performance is limited to the GEOM-DRUGS benchmark, and generalization to other datasets or more complex molecular structures requires further study. There is also little discussion of how the hyperparameters of the relative positional encoding (e.g., the attention-bias slopes) are chosen. Further verification of robustness across molecule sizes and types is needed.