This paper builds on the finding that large language models (LLMs), which have demonstrated outstanding performance on a wide range of unstructured text understanding tasks, can also perform tabular (structured data) understanding tasks without specialized training. We evaluated LLMs on benchmarks from multiple domains, including Wikipedia-based WTQ, financial TAT-QA, and scientific SCITAB, under a range of augmentations and perturbations, and investigated the effects of in-context learning (ICL), model size, instruction tuning, and domain bias on the robustness of tabular question answering (TQA). While instruction tuning and larger, more recent LLMs yield stronger and more robust TQA performance, data contamination and reliability issues remain, particularly for WTQ. An in-depth attention analysis revealed a strong correlation between perturbation-induced shifts in attention distributions and performance degradation, with sensitivity peaking in the intermediate layers of the model. These findings highlight the need for structure-aware self-attention mechanisms and domain-adaptive processing techniques to improve the transparency, generalization, and real-world reliability of LLMs on tabular data.
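The notion of "perturbation-induced shifts in attention distributions" can be made concrete with a small measurement sketch. The snippet below is an illustrative approximation, not the paper's actual analysis pipeline: the `gpt2` checkpoint, the row-swap perturbation, and the use of Jensen-Shannon divergence are all assumptions chosen for demonstration. It loads a Hugging Face causal LM and reports, layer by layer, the mean divergence between attention maps produced for an original and a perturbed table prompt.

```python
# Illustrative sketch (not the paper's code): quantify how much a table
# perturbation shifts a model's attention, layer by layer. Assumes a Hugging
# Face causal LM and that both prompts tokenize to the same length.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder checkpoint; the models studied may differ

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_attentions=True)
model.eval()


def layerwise_attention(prompt: str) -> list[torch.Tensor]:
    """Return one (heads, seq, seq) attention tensor per layer."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return [a.squeeze(0) for a in out.attentions]


def js_divergence(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Jensen-Shannon divergence between attention rows (last dim sums to 1)."""
    p, q = p.clamp_min(eps), q.clamp_min(eps)
    m = 0.5 * (p + q)

    def kl(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return (a * (a / b).log()).sum(-1)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def attention_shift(original_prompt: str, perturbed_prompt: str) -> list[float]:
    """Mean per-layer JS divergence between original and perturbed attention maps."""
    att_o = layerwise_attention(original_prompt)
    att_p = layerwise_attention(perturbed_prompt)
    assert att_o[0].shape == att_p[0].shape, "sketch assumes equal token lengths"
    return [js_divergence(o, p).mean().item() for o, p in zip(att_o, att_p)]


if __name__ == "__main__":
    # Toy example: a row-order perturbation of a small linearized table.
    original = "Table: | Year | Sales |\n| 2020 | 10 |\n| 2021 | 12 |\nQ: Sales in 2021?"
    perturbed = "Table: | Year | Sales |\n| 2021 | 12 |\n| 2020 | 10 |\nQ: Sales in 2021?"
    for layer, shift in enumerate(attention_shift(original, perturbed)):
        print(f"layer {layer:2d}: mean JS divergence = {shift:.4f}")
```

Plotting the per-layer values from a sketch like this against the corresponding accuracy drop is one simple way to visualize the intermediate-layer sensitivity the abstract describes.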