Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

ConTextTab: A Semantics-Aware Tabular In-Context Learner

Created by
  • Haebom

Author

Marco Spinaci, Marek Polewczyk, Maximilian Schambach, Sam Thelin

Outline

In this paper, we present ConTextTab, a model that achieves state-of-the-art performance in in-context learning (ICL) for tabular data. Existing tabular ICL models are either trained only on synthetic data, which fails to leverage the rich semantics and world knowledge present in real data, or are built on pre-trained large language models, which limits the amount of context they can process. ConTextTab addresses both issues: it preserves an architecture suited to the structure of tabular data while providing embeddings specialized for different data types and training on large-scale real-world data. Experiments show that ConTextTab reaches state-of-the-art performance across a range of benchmarks and, in particular, sets a new state of the art on the semantically rich CARTE benchmark. The source code and trained models are available on GitHub.
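To make the ICL setting concrete, here is a minimal, hypothetical sketch of the tabular in-context learning interface: a labeled context table and unlabeled query rows are consumed in a single inference call, with no gradient updates. The nearest-neighbor rule below is only a stand-in for illustration; ConTextTab itself uses a transformer over semantically embedded cells, but the calling convention is the same.

```python
import numpy as np

def icl_predict(context_X, context_y, query_X, k=3):
    """Toy stand-in for tabular in-context learning: predict each query
    row's label from its k most similar context rows. A real tabular ICL
    model (e.g. ConTextTab) replaces this similarity rule with a learned
    network, but likewise takes the labeled context at inference time."""
    preds = []
    for q in query_X:
        # Euclidean distance from the query row to every context row
        dists = np.linalg.norm(context_X - q, axis=1)
        nearest = np.argsort(dists)[:k]
        # Majority vote among the k nearest context labels
        vals, counts = np.unique(context_y[nearest], return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Context table: two small clusters labeled 0 and 1
context_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
context_y = np.array([0, 0, 1, 1])
query_X = np.array([[0.05, 0.1], [1.0, 0.9]])
print(icl_predict(context_X, context_y, query_X, k=3))  # → [0 1]
```

Note that the "model" here never trains on the context; it only conditions on it, which is what lets ICL models adapt to a new table at inference time.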

Takeaways, Limitations

Takeaways:
Training a tabular ICL model on real-world data, rather than synthetic data alone, yields measurable performance gains.
The model combines semantic understanding with awareness of the structural characteristics of tabular data.
It achieves state-of-the-art performance across a variety of benchmarks and sets a new state of the art on the semantically rich CARTE benchmark.
The source code and trained models are released publicly to support reproducibility and further research.
Limitations:
The paper does not explicitly discuss specific limitations of the ConTextTab model; these remain to be identified through future research.