Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation

Created by
  • Haebom

Authors

Feiyang Cai, Jiahui Bai, Tao Tang, Guijuan He, Joshua Luo, Tianyu Zhu, Srikanth Pilla, Gang Li, Ling Liu, Feng Luo

Outline

MolLangBench is a comprehensive benchmark for evaluating three fundamental molecule-language interface tasks: language-prompted molecular structure recognition, editing, and generation. To ensure high-quality, unambiguous, and deterministic outputs, the recognition tasks were constructed with automated cheminformatics tools, while the editing and generation tasks were curated through rigorous expert annotation and validation. MolLangBench supports evaluating models that pair language with different molecular representations, including linear strings, molecular images, and molecular graphs. The strongest model evaluated (GPT-5) reached 86.2% and 85.5% accuracy on the recognition and editing tasks, respectively, but only 43.0% on the generation task, which makes generation its clearest weakness.
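Recognition questions can have deterministic answers because they are computed from the molecular structure itself rather than judged by annotators. The sketch below illustrates that idea on a molecular graph; it is not MolLangBench's code, and the chosen questions (ring and element counts), the hand-written example graphs, and all function names are hypothetical. A real pipeline would more likely rely on an established cheminformatics toolkit than on hand-rolled graph code.

```python
from collections import Counter

# Hypothetical heavy-atom graphs (hydrogens omitted). Atoms are listed by element
# symbol; each bond is an unordered pair of atom indices. Bond orders are ignored
# because they do not affect ring or element counts.
BENZENE_ATOMS = ["C"] * 6
BENZENE_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

NAPHTHALENE_ATOMS = ["C"] * 10
NAPHTHALENE_BONDS = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),  # first ring
    (4, 6), (6, 7), (7, 8), (8, 9), (9, 5),          # second ring, fused at 4-5
]

ETHANOL_ATOMS = ["C", "C", "O"]
ETHANOL_BONDS = [(0, 1), (1, 2)]


def count_components(n_atoms, bonds):
    """Count connected components (separate fragments) with union-find."""
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return len({find(i) for i in range(n_atoms)})


def ring_count(atoms, bonds):
    """Number of independent rings: bonds - atoms + components (cyclomatic number)."""
    return len(bonds) - len(atoms) + count_components(len(atoms), bonds)


def element_counts(atoms):
    """Deterministic answer to a question such as 'How many O atoms are there?'."""
    return dict(Counter(atoms))


if __name__ == "__main__":
    for name, atoms, bonds in [
        ("benzene", BENZENE_ATOMS, BENZENE_BONDS),
        ("naphthalene", NAPHTHALENE_ATOMS, NAPHTHALENE_BONDS),
        ("ethanol", ETHANOL_ATOMS, ETHANOL_BONDS),
    ]:
        print(name, ring_count(atoms, bonds), element_counts(atoms))
```

Running it reports 1, 2, and 0 rings for benzene, naphthalene, and ethanol. The cyclomatic number equals the size of the smallest set of smallest rings, so it answers "how many rings?" without enumerating them, and the same graph always yields the same answer.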

Takeaways, Limitations

Takeaways:
MolLangBench provides a standardized benchmark for evaluating the performance of AI systems on molecular-language interface tasks.
Current AI systems show significant limitations even on molecular recognition and manipulation, tasks that are intuitive for humans.
The benchmark can support research toward more effective and reliable AI systems for chemistry.
Limitations:
The best-performing model (GPT-5) reached 86.2% and 85.5% accuracy on the recognition and editing tasks, respectively, but only 43.0% on the generation task.
Current AI systems struggle even with basic molecular recognition and manipulation tasks.