Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

TASER: Table Agents for Schema-guided Extraction and Recommendation

Created by
  • Haebom

Authors

Nicole Cho, Kirsty Fielding, William Watson, Sumitra Ganesh, Manuela Veloso

Outline

This paper proposes TASER (Table Agents for Schema-guided Extraction and Recommendation), an agent-based system for extracting unstructured, multi-page tables from real-world financial documents. Its agents perform table detection, classification, extraction, and schema recommendation, turning unstructured tables into normalized, schema-compliant output. TASER refines its schema through continuous learning, and the paper highlights the effectiveness of large-scale batch learning and reports a 10.1% performance improvement over existing models such as Table Transformer. The paper also presents TASERTab, a new financial table dataset comprising 22,584 pages (28,150,449 tokens), 3,213 tables, and asset data totaling $731,685,511,687.
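As a rough illustration of what schema-guided extraction with schema recommendation could look like, here is a minimal Python sketch. It is not the authors' implementation: the names `Schema`, `ExtractionResult`, `extract`, `recommend`, and `parse_amount`, the sample rows, and the counting threshold are all assumptions made for this example, and the detection and classification steps are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Schema:
    """Hypothetical target schema: column name -> parser for the raw cell text."""
    columns: dict[str, Callable[[str], object]]


@dataclass
class ExtractionResult:
    rows: list[dict] = field(default_factory=list)          # schema-compliant rows
    unmapped: dict[str, int] = field(default_factory=dict)  # unknown column -> times seen


def parse_amount(text: str) -> float:
    """Parse a currency-like string such as '$1,200.50' into a float."""
    return float(text.replace("$", "").replace(",", ""))


def extract(raw_rows: list[dict[str, str]], schema: Schema) -> ExtractionResult:
    """Coerce raw rows to the schema and count columns the schema does not cover."""
    result = ExtractionResult()
    for raw in raw_rows:
        for name in raw:
            if name not in schema.columns:
                result.unmapped[name] = result.unmapped.get(name, 0) + 1
        try:
            row = {name: parse(raw[name]) for name, parse in schema.columns.items()}
        except (KeyError, ValueError):
            continue  # the row does not fit the schema, so it is skipped
        result.rows.append(row)
    return result


def recommend(result: ExtractionResult, min_count: int = 2) -> list[str]:
    """Suggest adding columns that appeared at least min_count times in the batch."""
    return sorted(name for name, count in result.unmapped.items() if count >= min_count)


# Made-up sample rows standing in for cells read from a financial table.
schema = Schema(columns={"name": str, "value": parse_amount})
batch = [
    {"name": "Bond A", "value": "$1,200.50", "currency": "USD"},
    {"name": "Bond B", "value": "300", "currency": "EUR"},
    {"name": "Total", "value": "n/a"},
]
result = extract(batch, schema)
suggestions = recommend(result)
```

In this toy version, a larger batch gives an unknown column more chances to cross the threshold, which is one plausible reading of why batch size could matter for schema recommendations. The mechanism in the paper may differ.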

Takeaways, Limitations

Takeaways:
Provides an effective solution to the problem of extracting complex, unstructured table data from real-world financial documents.
Demonstrates the effectiveness of an agent-based, schema-guided extraction system.
Emphasizes the importance of improving performance and the schema through continuous learning.
Enables further research by releasing TASERTab, a large-scale dataset of real-world financial data.
Achieves a 10.1% performance improvement over Table Transformer.
Improves schema recommendations and increases asset extraction through large-scale batch learning (9.8%).
Limitations:
The available information is insufficient to describe the specific architecture and algorithms of the TASER system in detail.
Further analysis of the quality and bias of the TASERTab dataset is needed.
Generalization to other types of financial documents and table structures still needs to be evaluated.
A comparative analysis with other agent-based systems is lacking.