This paper surveys progress in electroencephalography (EEG)-based multimodal emotion recognition (EMER) and aims to address three major challenges facing the field: the absence of open-source implementations, the lack of standardized benchmarks, and insufficient in-depth discussion of key challenges and promising research directions. To this end, we develop LibEMER, a unified evaluation framework that provides fully reproducible PyTorch implementations of deep learning methods and standardized protocols for data preprocessing, model implementation, and experimental setup. The framework enables fair performance comparison on three widely used public datasets across two learning tasks.
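To illustrate the idea of a unified evaluation protocol, the sketch below shows, in plain Python, how fixing the preprocessing and the train/test split lets different models be compared fairly. Every name here (`Protocol`, `run_benchmark`, `zscore`, `MajorityClass`) is hypothetical and only illustrative; it is not LibEMER's actual API, and the real framework operates on EEG datasets with PyTorch models rather than toy lists.

```python
# Hypothetical sketch of a unified benchmark protocol; all names are
# illustrative and do NOT reflect LibEMER's real interface.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Protocol:
    """Fixes the preprocessing step and the train/test split so that
    every model is evaluated under identical conditions."""
    preprocess: Callable[[List[float]], List[float]]
    train_ratio: float = 0.8


def zscore(signal: List[float]) -> List[float]:
    # Standardized preprocessing: zero-mean, unit-variance scaling.
    mean = sum(signal) / len(signal)
    var = sum((x - mean) ** 2 for x in signal) / len(signal)
    std = var ** 0.5 or 1.0  # avoid division by zero for flat signals
    return [(x - mean) / std for x in signal]


def run_benchmark(data, labels, model, protocol):
    """Applies the shared protocol, trains the model on the fixed
    split, and reports test accuracy."""
    processed = [protocol.preprocess(trial) for trial in data]
    split = int(len(processed) * protocol.train_ratio)
    model.fit(processed[:split], labels[:split])
    preds = [model.predict(x) for x in processed[split:]]
    correct = sum(p == y for p, y in zip(preds, labels[split:]))
    return correct / max(len(preds), 1)


class MajorityClass:
    """Trivial baseline: always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)

    def predict(self, x):
        return self.label
```

Because the split and preprocessing live in the shared `Protocol` rather than in each model, every method is scored under exactly the same conditions, which is the core of what a standardized benchmark provides.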