This paper presents a framework that extends the existing multimodal ENSO forecasting (MEF) model to improve prediction of the El Niño–Southern Oscillation (ENSO), a phenomenon that is challenging to predict over long time horizons. The existing MEF model draws on an ensemble of 80 forecasts produced by two independent deep learning modules (a 3D-CNN and a time series module), but it prioritizes module selection based on global performance and neither weights nor evaluates individual ensemble members. In this study, we directly model the similarity among the 80 ensemble members using graph-based analysis to cluster forecasts that are structurally similar and accurate. We then apply a community detection technique to select an optimized subset of 20 members, and obtain the final forecast by averaging this subset. By removing noisy members and emphasizing ensemble consistency, the method improves forecast performance and yields more stable results, especially in complex long-term forecasting situations. Furthermore, because the approach is model-independent, it can be applied to other forecasting models.
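The selection pipeline described above (similarity graph, community detection, subset averaging) can be illustrated with a minimal sketch. It is not the implementation used in this study; it rests on several assumptions: similarity is taken to be the Pearson correlation between member forecasts, the graph is built by thresholding that correlation, connected components of the thresholded graph serve as a simple stand-in for a community detection algorithm, and members are ranked by correlation with a reference series. All function names, the threshold of 0.9, and the synthetic ensemble are hypothetical.

```python
import math
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def similarity_graph(members, threshold):
    """Adjacency sets linking members whose correlation meets the threshold (assumed rule)."""
    adj = {i: set() for i in range(len(members))}
    for i in range(len(members)):
        for j in range(i + 1, len(members)):
            if pearson(members[i], members[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def communities(adj):
    """Connected components, used here as a stand-in for community detection."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


def select_subset(members, reference, k=20, threshold=0.9):
    """Pick k members, filling from the most skillful communities first (assumed rule)."""
    skill = [pearson(m, reference) for m in members]
    groups = communities(similarity_graph(members, threshold))
    groups.sort(key=lambda g: mean(skill[i] for i in g), reverse=True)
    chosen = []
    for group in groups:
        for i in sorted(group, key=lambda i: skill[i], reverse=True):
            if len(chosen) < k:
                chosen.append(i)
    return chosen


def ensemble_mean(members, indices):
    """Average the selected members at each time step."""
    return [mean(members[i][t] for i in indices) for t in range(len(members[0]))]


def rmse(forecast, reference):
    return math.sqrt(mean((f - r) ** 2 for f, r in zip(forecast, reference)))


def make_synthetic_ensemble(seed=0, n_good=60, n_bad=20, length=48):
    """Hypothetical data: 'good' members track a sine signal, 'bad' ones are noise."""
    rng = random.Random(seed)
    signal = [math.sin(2 * math.pi * t / 12) for t in range(length)]
    good = [[s + rng.gauss(0, 0.1) for s in signal] for _ in range(n_good)]
    bad = [[rng.gauss(0, 1.0) for _ in range(length)] for _ in range(n_bad)]
    return good + bad, signal


members, reference = make_synthetic_ensemble()
chosen = select_subset(members, reference, k=20)
print("RMSE, all 80 members:", rmse(ensemble_mean(members, range(len(members))), reference))
print("RMSE, selected 20:   ", rmse(ensemble_mean(members, chosen), reference))
```

In practice the reference series would be a held-out validation target rather than the known signal, and a modularity-based or similar community detection method could replace the connected-components step without changing the rest of the sketch.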