This paper highlights the challenge of achieving consistent performance across languages, especially when incorporating cultural knowledge, in real-world applications of multimodal large language models (MLLMs). To assess this challenge, we introduce two new benchmarks: KnowRecall, a visual question answering benchmark focusing on cultural and historical questions in 15 languages, and VisRecall, which evaluates visual memory consistency by asking models to describe landmark appearances in 9 languages without access to images. Experimental results show that even state-of-the-art MLLMs struggle to achieve cross-lingual consistency, underscoring the need for more robust approaches to building truly multilingual and culturally aware models.