Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Doc2SAR: A Synergistic Framework for High-Fidelity Extraction of Structure-Activity Relationships from Scientific Documents

Created by
  • Haebom

Authors

Jiaxi Zhuang, Kangning Li, Jue Hou, Mingjun Xu, Zhifeng Gao, Hengxing Cai

Outline

We propose Doc2SAR, a novel framework for extracting molecular structure-activity relationships (SARs) from scientific papers and patents, and evaluate it on DocSAR-200, a benchmark for SAR extraction. By integrating domain-specific tools with a multimodal large language model (MLLM) enhanced through supervised fine-tuning, Doc2SAR achieves state-of-the-art performance across a wide range of document types and significantly outperforms existing end-to-end models.

Takeaways, Limitations

Doc2SAR overcomes the limitations of existing methods in SAR extraction tasks and achieves state-of-the-art performance.
We present the DocSAR-200 benchmark as an evaluation standard for SAR extraction methods.
We demonstrate the framework's practicality through efficient inference and a web app.
The paper does not explicitly state its limitations.
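Evaluating SAR extraction against an annotated benchmark means comparing extracted structure-activity entries with ground-truth entries. The sketch below is a minimal illustration of that idea, not the DocSAR-200 protocol: the `SARRecord` fields (SMILES, target, activity, unit), the `record_recall` function, and the matching rule (exact string match on structure, target and unit, plus a relative tolerance on the activity value) are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SARRecord:
    """Hypothetical SAR entry: one compound, one target, one measured activity."""
    smiles: str      # molecular structure as a SMILES string (assumed representation)
    target: str      # biological target or assay name
    activity: float  # activity value, e.g. an IC50
    unit: str        # unit of the activity value, e.g. "nM"


def record_recall(predicted, gold, rel_tol=1e-6):
    """Return the fraction of gold records matched by at least one predicted record.

    Illustrative criterion only: structure, target and unit must match exactly,
    and activity values must agree within a relative tolerance.
    """
    if not gold:
        raise ValueError("gold set must not be empty")
    matched = 0
    for g in gold:
        for p in predicted:
            if (p.smiles == g.smiles and p.target == g.target and p.unit == g.unit
                    and math.isclose(p.activity, g.activity, rel_tol=rel_tol)):
                matched += 1
                break  # count each gold record at most once
    return matched / len(gold)


gold = [SARRecord("CCO", "EGFR", 12.0, "nM"),
        SARRecord("c1ccccc1", "EGFR", 340.0, "nM")]
pred = [SARRecord("CCO", "EGFR", 12.0, "nM"),
        SARRecord("c1ccccc1", "EGFR", 34.0, "nM")]  # activity misread by 10x
print(record_recall(pred, gold))  # 0.5
```

Comparing SMILES strings directly is a simplification, since one molecule can be written as many different SMILES; a more faithful comparison would first canonicalize structures (for example with RDKit). This sketch also lets a single prediction match several gold records, which a stricter one-to-one matching would forbid.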