Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning

Created by
  • Haebom

Authors

Shreyas Vinaya Sathyanarayana, Rahil Shah, Sharanabasava D. Hiremath, Rishikesh Panda, Rahul Jana, Riya Singh, Rida Irfan, Ashwin Murali, Bharath Ramsundar

Outline

In this paper, we present DeepRetro, an open-source, iterative hybrid retrosynthesis framework that combines existing template-based/Monte Carlo tree search tools with the generative capabilities of large language models (LLMs) to address retrosynthesis, a problem central to the synthesis of complex molecules. DeepRetro first attempts to plan a synthesis with a template-based engine; if that fails, the LLM proposes a single-step retrosynthetic disconnection. The proposed disconnection is then checked for validity, stability, and hallucination, and the resulting precursors are recursively fed back into the pipeline for further evaluation. This iterative refinement allows pathways to be explored and corrected dynamically. Through benchmark evaluations and case studies, we demonstrate that the framework can identify feasible and novel retrosynthetic pathways for complex natural products. We also illustrate the potential of LLM reasoning by developing an interactive graphical user interface that supports human-in-the-loop feedback from expert chemists.
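The loop described above (template engine first, LLM fallback, checks, recursion on the precursors) can be sketched roughly as follows. This is a minimal illustration and not the DeepRetro implementation: the function `plan_route`, its callback parameters, the `max_depth` cutoff, and the toy "molecules" written as letter strings are all assumptions made for this example. In particular, the single `is_valid` callback stands in for the validity, stability, and hallucination checks, whose details are not given in this summary.

```python
from typing import Callable, Dict, List, Optional

# A route maps each molecule to the precursors chosen for it.
# Purchasable starting materials map to an empty list.
Route = Dict[str, List[str]]


def plan_route(
    target: str,
    template_engine: Callable[[str], Optional[Route]],
    llm_propose: Callable[[str], List[List[str]]],
    is_valid: Callable[[str, List[str]], bool],
    is_purchasable: Callable[[str], bool],
    max_depth: int = 5,
) -> Optional[Route]:
    """Return a route for `target`, or None if none is found within max_depth."""
    if is_purchasable(target):
        return {target: []}
    if max_depth == 0:
        return None

    # Step 1: try the template-based engine first.
    route = template_engine(target)
    if route is not None:
        return route

    # Step 2: fall back to LLM-proposed single-step disconnections.
    for precursors in llm_propose(target):
        # Step 3: discard proposals that fail the checks.
        if not is_valid(target, precursors):
            continue
        # Step 4: feed every precursor back into the same pipeline.
        merged: Route = {target: precursors}
        for precursor in precursors:
            sub_route = plan_route(
                precursor, template_engine, llm_propose,
                is_valid, is_purchasable, max_depth - 1,
            )
            if sub_route is None:
                break  # this proposal is a dead end; try the next one
            merged.update(sub_route)
        else:
            return merged
    return None


# --- Hypothetical toy stand-ins (letters play the role of atoms) ---

def toy_template_engine(mol: str) -> Optional[Route]:
    known = {"AB": ["A", "B"], "CD": ["C", "D"]}
    if mol not in known:
        return None
    route: Route = {mol: known[mol]}
    for precursor in known[mol]:
        route[precursor] = []
    return route


def toy_llm_propose(mol: str) -> List[List[str]]:
    # The first proposal is deliberately "hallucinated": it invents atoms X and Y.
    if mol == "ABCD":
        return [["AB", "XY"], ["AB", "CD"]]
    return []


def toy_is_valid(mol: str, precursors: List[str]) -> bool:
    # Toy atom-conservation check: precursors must use exactly the target's letters.
    return sorted("".join(precursors)) == sorted(mol)


def toy_is_purchasable(mol: str) -> bool:
    return len(mol) == 1


route = plan_route(
    "ABCD", toy_template_engine, toy_llm_propose, toy_is_valid, toy_is_purchasable
)
print(route)
```

In this toy run the template engine has no entry for `ABCD`, so the stand-in LLM is consulted; its first proposal is rejected by the check, and the second is accepted and then completed by the template engine for `AB` and `CD`. A real system would replace the callbacks with an actual template-based planner, an LLM call, and chemistry-aware checks.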

Takeaways, Limitations

Takeaways:
Presents a novel approach to retrosynthesis that combines the generative power of LLMs with the strengths of existing template-based methods.
The iterative, feedback-driven design allows pathways to be explored and corrected dynamically, suggesting potential for more effective multi-step planning.
Demonstrates potential for discovering novel synthetic routes to complex natural products.
Incorporating human-in-the-loop feedback from expert chemists can improve the accuracy and efficiency of the algorithm.
Released as open source, which makes it accessible and extensible.
Limitations:
Further research is needed to determine how effective and accurate the validation process is at catching LLM hallucinations.
Generalization performance across diverse compounds and levels of complexity still needs to be evaluated and improved.
Further research is needed on efficient interaction and information transfer mechanisms between template-based engines and LLMs.
Training and evaluation using large datasets are required, and performance can be greatly affected by the quality and quantity of data.