This paper addresses the effective modeling of multimodal longitudinal data, an important task in many application areas, especially biomedicine. To address the limitation of previous studies that do not sufficiently consider multimodality, we develop several configurations of Longitudinal Ensemble Integration (LEI), a novel multimodal longitudinal learning framework for sequential classification. We evaluate LEI against existing methods on the task of early dementia diagnosis and show that it outperforms them. It does so by improving integration over time through intermediate baseline predictions generated from the individual data modalities. LEI is also designed to identify features that are consistently important for predicting dementia-related diagnoses. Overall, this study demonstrates the potential of LEI for sequential classification of multimodal longitudinal data.
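To make the idea of integrating per-modality intermediate predictions over time more concrete, here is a minimal stacking-style sketch in Python with NumPy. It illustrates the general pattern only and is not LEI itself. It assumes the following, none of which the abstract specifies: one logistic-regression base model per (modality, visit) pair, a logistic-regression meta-learner standing in for the sequential integration model, a single diagnosis label per subject, and synthetic toy data. All function names (`fit_base_models`, `intermediate_predictions`, `fit_longitudinal_stack`, `predict_longitudinal_stack`) are hypothetical.

```python
import numpy as np

# Illustrative sketch only: a generic stacking pipeline, NOT the LEI implementation.
# All function names below are hypothetical.

def fit_logreg(X, y, lr=0.5, epochs=300):
    """Batch-gradient-descent logistic regression; returns weights incl. bias."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def fit_base_models(modalities, y, idx):
    """Assumed design: one base model per (modality, visit), trained on subjects idx."""
    return {(name, t): fit_logreg(X[idx, t], y[idx])
            for name, X in modalities.items()      # X: (subjects, visits, features)
            for t in range(X.shape[1])}

def intermediate_predictions(models, modalities, idx):
    """Stack base-model probabilities into shape (len(idx), n_modalities * n_visits)."""
    return np.column_stack([predict_proba(w, modalities[name][idx, t])
                            for (name, t), w in models.items()])

def fit_longitudinal_stack(modalities, y, seed=0):
    n = len(y)
    perm = np.random.default_rng(seed).permutation(n)
    folds = [perm[: n // 2], perm[n // 2:]]
    # Out-of-fold intermediate predictions: each subject is scored by base
    # models that never saw it, so the meta-learner does not fit leaked outputs.
    Z = None
    for k in (0, 1):
        models = fit_base_models(modalities, y, folds[1 - k])
        Zk = intermediate_predictions(models, modalities, folds[k])
        if Z is None:
            Z = np.zeros((n, Zk.shape[1]))
        Z[folds[k]] = Zk
    meta_w = fit_logreg(Z, y)  # combines predictions across modalities and visits
    base_models = fit_base_models(modalities, y, np.arange(n))  # refit on all data
    return base_models, meta_w

def predict_longitudinal_stack(base_models, meta_w, modalities):
    n = len(next(iter(modalities.values())))
    Z = intermediate_predictions(base_models, modalities, np.arange(n))
    return predict_proba(meta_w, Z)

# Synthetic toy data (not real cohort data): two modalities, three visits.
rng = np.random.default_rng(42)

def make_data(n):
    y = rng.integers(0, 2, n).astype(float)
    mods = {"imaging": rng.normal(size=(n, 3, 4)) + 1.5 * y[:, None, None],
            "clinical": rng.normal(size=(n, 3, 2)) + 1.0 * y[:, None, None]}
    return mods, y

train_mods, y_train = make_data(200)
test_mods, y_test = make_data(100)
base_models, meta_w = fit_longitudinal_stack(train_mods, y_train)
probs = predict_longitudinal_stack(base_models, meta_w, test_mods)
acc = np.mean((probs >= 0.5) == y_test)
print(f"test accuracy: {acc:.2f}")
```

The out-of-fold step is the key design choice in any stacking setup. If the meta-learner were trained on base-model outputs for the same subjects those models were fit on, it would learn to over-trust overfit predictions. A real longitudinal model would likely replace the flat meta-learner with one that respects visit order.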