Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Exploring the Robustness of Language Models for Tabular Question Answering via Attention Analysis

Created by
  • Haebom

Author

Kushal Raj Bhandari, Sixue Xing, Soham Dan, Jianxi Gao

Outline

This paper builds on the finding that large language models (LLMs), which have shown strong performance on a variety of unstructured text understanding tasks, can also handle tabular (structured) understanding tasks without task-specific training. The authors evaluate LLMs on datasets from several domains, including Wikipedia-based WTQ, financial TAT-QA, and scientific SCITAB, under a range of augmentations and perturbations, and investigate how in-context learning (ICL), model size, instruction tuning, and domain bias affect the robustness of tabular question answering (TQA). While instruction tuning and larger, more recent LLMs yield stronger and more robust TQA performance, data contamination and reliability issues remain, particularly on WTQ. An in-depth attention analysis reveals a strong correlation between perturbation-induced shifts in attention distribution and performance degradation, with sensitivity peaking in the model's intermediate layers. These findings highlight the need for structure-aware self-attention mechanisms and domain-adaptive processing techniques to improve the transparency, generalization, and real-world reliability of LLMs on tabular data.
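The kind of layer-wise attention comparison described above can be sketched with a simple divergence measure. The snippet below is an illustrative approximation, not the paper's actual method: it computes the mean Jensen-Shannon divergence between attention distributions from an original and a perturbed table input, per layer (function names and the choice of JS divergence are assumptions).

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two attention rows (distributions)."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def layerwise_attention_shift(attn_orig, attn_pert):
    """Mean attention-distribution shift per layer.

    attn_orig, attn_pert: lists with one array per layer,
    each of shape (num_heads, seq_len, seq_len), e.g. the
    attention tensors returned by a transformer for the
    original and the perturbed table input.
    """
    shifts = []
    for a, b in zip(attn_orig, attn_pert):
        per_row = [js_divergence(a[h, i], b[h, i])
                   for h in range(a.shape[0])
                   for i in range(a.shape[1])]
        shifts.append(float(np.mean(per_row)))
    return shifts
```

Plotting the resulting per-layer shift against the drop in TQA accuracy is one way to reproduce the kind of correlation the paper reports, with the largest shifts expected in the intermediate layers.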

Takeaways, Limitations

Takeaways:
Instruction tuning and larger, more recent LLMs contribute to improved tabular question answering (TQA) performance and increased robustness.
A strong correlation is shown between perturbation-induced changes in attention distribution and performance degradation, with sensitivity peaking in the model's intermediate layers.
The results motivate the development of structure-aware self-attention mechanisms and domain-adaptive processing techniques.
Limitations:
Data contamination and reliability issues still exist in some datasets, including WTQ.
More advanced interpretability methods are needed to improve the reliability of LLMs on tabular data.