This paper presents a vision-language framework for understanding and interacting with automotive infotainment systems, enabling seamless adaptation across diverse UI designs. To support this, we release the open-source dataset AutomotiveUI-Bench-4K, consisting of 998 images with 4,208 annotations, and present a data pipeline for generating training data. We fine-tune a Molmo-7B-based model using LoRA (Low-Rank Adaptation) and develop an Evaluative Large Action Model (ELAM) by integrating vision-based interaction and evaluation capabilities. ELAM achieves strong performance on AutomotiveUI-Bench-4K and, notably, outperforms its baseline model by 5.6% on ScreenSpot (80.8% average accuracy). It matches or exceeds specialized models for desktop, mobile, and web platforms, demonstrating strong domain generalization despite being trained primarily on the automotive domain. This study shows how targeted data collection and fine-tuning can advance AI-based understanding of and interaction with automotive UIs, yielding a model that can be deployed cost-effectively on consumer-grade GPUs.