This paper presents a system that analyzes a large-scale database of tens of millions of images to discover temporal patterns of change. It identifies co-occurring changes (trends) across a city by leveraging images captured at different points in time. Unlike conventional visual analysis methods, the system can answer open-ended questions (e.g., "What types of changes frequently occur in a city?") without a predefined target topic or training labels; this requirement makes existing learning-based and unsupervised visual analysis tools unsuitable. We therefore adopt a multimodal large language model (MLLM), which possesses open-ended semantic understanding, as a novel analysis tool. However, because the dataset size far exceeds the processing capacity of an MLLM, we introduce a bottom-up procedure that decomposes the large-scale visual analysis problem into smaller, tractable subproblems and design an MLLM-based solution for each. Through experiments and ablation studies, we demonstrate that our system outperforms existing methods and can identify interesting trends (e.g., "outdoor restaurants added," "overpasses painted blue") in metropolitan imagery.