Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

A Transformer Model for Predicting Chemical Products from Generic SMARTS Templates with Data Augmentation

Created by
  • Haebom

Author

Derin Ozer, Sylvain Lamprier, Thomas Cauchy, Nicolas Gutowski, Benoit Da Mota

Outline

To address the challenges of predicting chemical reaction outcomes in computational chemistry, this paper proposes the Broad Reaction Set (BRS), which contains 20 common reaction templates based on SMARTS, and ProPreT5, the first language model capable of handling these templates. ProPreT5 is a T5-based model that improves generalization performance through a novel augmentation strategy for SMARTS templates. Trained with augmented templates, ProPreT5 demonstrates stronger prediction performance and generalization to novel reactions than existing methods.
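The paper's actual augmentation method for SMARTS templates is not detailed in this summary, but the general idea of producing semantically equivalent template variants can be sketched with a simple, hypothetical transformation: consistently renumbering the atom-map labels in a reaction SMARTS. Since atom maps only establish reactant-to-product correspondence, a consistent relabeling preserves the template's meaning while changing its token sequence. The template string and function below are illustrative, not taken from the paper.

```python
import random
import re

def augment_smarts_template(template: str, seed: int = 0) -> str:
    """Return a variant of a reaction SMARTS template with consistently
    shuffled atom-map numbers. Illustrative sketch only: the paper's
    actual augmentation strategy may differ."""
    rng = random.Random(seed)
    # Collect all atom-map numbers, e.g. the "1" in "[C:1]"
    maps = sorted({int(m) for m in re.findall(r":(\d+)\]", template)})
    shuffled = maps[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(maps, shuffled))
    # Apply the same relabeling on both sides of the ">>" arrow,
    # so reactant-product atom correspondence is preserved.
    return re.sub(r":(\d+)\]",
                  lambda m: f":{relabel[int(m.group(1))]}]",
                  template)

# Hypothetical generic amide-formation template (not from the BRS set)
tpl = "[C:1](=[O:2])[OH].[N:3]>>[C:1](=[O:2])[N:3]"
print(augment_smarts_template(tpl, seed=1))
```

Because the relabeling is applied uniformly, the augmented string describes the same transformation while exposing the model to varied surface forms, which is the intuition behind augmenting templates rather than only molecules.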

Takeaways, Limitations

Takeaways:
We present BRS, a set of generic reaction templates based on SMARTS, reducing dependence on reaction-specific templates.
We develop ProPreT5, the first language model capable of directly processing and applying SMARTS templates.
We introduce the first augmentation strategy for SMARTS, improving the model's generalization performance.
ProPreT5 achieves improved reaction prediction and generalization performance compared to existing methods.
Limitations:
The 20 templates included in BRS may not cover all chemical reactions.
The performance of ProPreT5 may depend on the dataset used.
The effectiveness of the SMARTS augmentation strategy may vary across different types of reactions or datasets.
Further research may be needed on the interpretability of the model.