Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Auxiliary Discriminator Sequence Generative Adversarial Networks (ADSeqGAN) for Few Sample Molecule Generation

Created by
  • Haebom

Authors

Haocheng Tang, Jing Long, Beihong Ji, Junmei Wang

Outline

We propose the Auxiliary Discriminator Sequence Generative Adversarial Network (ADSeqGAN), a novel approach for molecule generation from small datasets. Existing generative models struggle with limited training data, particularly in drug discovery, where molecular datasets for specific therapeutic targets, such as nucleic acid binders or central nervous system (CNS) drugs, are scarce. ADSeqGAN significantly improves the quality and class specificity of generated molecules by incorporating a random forest classifier into the GAN framework as an additional discriminator. Training stability and diversity are further enhanced by a pretrained generator and the Wasserstein distance. We evaluated ADSeqGAN on three use cases: nucleic acid- and protein-targeting molecules, CNS drugs, and CB1 ligand design. It outperformed the baseline model in generating nucleic acid binders and, with oversampling, achieved higher yields than existing de novo drug design models in generating CNS drugs. In CB1 ligand design, it generated novel drug-like molecules of which 32.8% were predicted to be active by the target-specific LRIP-SF scoring function, surpassing both CB1-focused and general-purpose libraries. Overall, ADSeqGAN provides a versatile framework for molecular design in data-poor scenarios.
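To make the idea of an "additional discriminator" more concrete, here is a minimal sketch of how a critic score under the Wasserstein formulation might be blended with an auxiliary classifier's class probability to form a generator reward. This is an illustration under stated assumptions, not the authors' implementation: the function names, the mixing weight `lam`, and the toy scores are all hypothetical, and `class_prob` simply stands in for the kind of per-class probability a random forest classifier would output.

```python
from statistics import mean


def wasserstein_critic_loss(real_scores, fake_scores):
    """Critic loss in the Wasserstein formulation: E[D(fake)] - E[D(real)].

    Minimizing it pushes critic scores up on real samples and down on
    generated ones.
    """
    return mean(fake_scores) - mean(real_scores)


def generator_reward(critic_score, class_prob, lam=0.5):
    """Blend the critic score with the auxiliary classifier's probability
    that a generated molecule belongs to the requested class.

    lam is a hypothetical mixing weight in [0, 1] (an assumption, not a
    value from the paper); lam=1 ignores the classifier entirely.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * critic_score + (1.0 - lam) * class_prob


# Toy numbers standing in for model outputs (not real data).
real_scores = [0.9, 0.8, 0.7]
fake_scores = [0.2, 0.4, 0.3]
loss = wasserstein_critic_loss(real_scores, fake_scores)
reward = generator_reward(critic_score=0.4, class_prob=0.8, lam=0.5)
```

In this sketch a molecule that fools the critic but is assigned to the wrong class by the classifier still receives a reduced reward, which is one plausible way class specificity could be encouraged.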

Takeaways, Limitations

Takeaways:
Presents a novel method that effectively addresses molecule generation from small datasets.
Demonstrates applicability across several areas: nucleic acid binders, CNS drugs, and CB1 ligands.
Improves performance and diversity over existing models.
Improves the yield of generated CNS drug candidates through oversampling.
Limitations:
Generalizability to other fields beyond the three cases presented needs to be verified.
The performance of ADSeqGAN may be affected by the performance of the random forest classifier.
Performance evaluation is constrained by the limitations of the LRIP-SF scoring function.
Comparative analysis of alternatives to the pretrained generator and Wasserstein distance is lacking.
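The summary does not say how the oversampling for CNS drugs is carried out. As a rough illustration only, the sketch below shows plain random oversampling with replacement, a common way to balance a scarce class against a larger one. The function name `oversample`, the toy SMILES strings, and the labels are all assumptions made for this example.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class has as
    many examples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing examples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy SMILES strings; the labels are placeholders, not real annotations.
smiles = ["CCO", "CCN", "CCC", "CCCl", "c1ccccc1"]
labels = ["other", "other", "other", "other", "cns"]
balanced_smiles, balanced_labels = oversample(smiles, labels)
counts = Counter(balanced_labels)
```

After balancing, both classes contribute the same number of training examples, so the scarce class is seen as often as the abundant one during training.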