This paper proposes MMoE, a multimodal network for spoiler detection on online movie review websites. Unlike existing methods, which rely solely on the textual content of reviews, MMoE draws on three sources of information: it extracts graph features from the user-movie network, text features from the review content, and meta features from the review metadata. To cope with the genre-specific language of spoilers, MMoE adopts a Mixture-of-Experts architecture that improves robustness, and an expert fusion layer integrates the features from these different perspectives to make the final prediction. Experiments on two widely used spoiler detection datasets show that MMoE outperforms state-of-the-art methods by 2.56% in accuracy and 8.41% in F1 score, indicating superior robustness and generalization. The code is available on GitHub.
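The general idea of soft-gated Mixture-of-Experts routing followed by fusion of per-modality features can be illustrated with a minimal Python sketch. It is not the authors' released implementation. Each expert is assumed to be a linear map followed by ReLU, and a linear softmax gate weights the experts. Plain concatenation stands in for the expert fusion layer, and a logistic classifier produces the spoiler probability. All names and dimensions (`MixtureOfExperts`, `fuse`, `spoiler_probability`, the feature sizes) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, x):
    """Multiply a (rows x len(x)) weight matrix by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MixtureOfExperts:
    """Hypothetical soft-gated MoE: each expert is a linear map + ReLU,
    and a linear gate yields softmax weights over the experts."""

    def __init__(self, in_dim, out_dim, num_experts, seed=0):
        rng = random.Random(seed)
        self.experts = [random_matrix(out_dim, in_dim, rng) for _ in range(num_experts)]
        self.gate = random_matrix(num_experts, in_dim, rng)

    def __call__(self, x):
        # The gate decides how much each expert contributes for this input.
        gate_weights = softmax(matvec(self.gate, x))
        expert_outputs = [[max(0.0, v) for v in matvec(w, x)] for w in self.experts]
        out_dim = len(expert_outputs[0])
        # Gate-weighted sum of the expert outputs.
        mixed = [
            sum(g * out[i] for g, out in zip(gate_weights, expert_outputs))
            for i in range(out_dim)
        ]
        return mixed, gate_weights


def fuse(features):
    """Placeholder fusion (assumption): concatenate per-modality outputs."""
    fused = []
    for f in features:
        fused.extend(f)
    return fused


def spoiler_probability(fused, weights, bias=0.0):
    """Hypothetical logistic classifier over the fused feature vector."""
    z = sum(w * f for w, f in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical feature sizes for the graph, text, and meta modalities.
    dims = {"graph": 8, "text": 16, "meta": 4}
    feats = {name: [rng.uniform(-1, 1) for _ in range(d)] for name, d in dims.items()}
    moes = {
        name: MixtureOfExperts(d, out_dim=4, num_experts=3, seed=i)
        for i, (name, d) in enumerate(dims.items())
    }
    outputs = [moes[name](feats[name])[0] for name in dims]
    fused = fuse(outputs)
    clf = [rng.uniform(-0.5, 0.5) for _ in range(len(fused))]
    print(f"fused dim = {len(fused)}, P(spoiler) = {spoiler_probability(fused, clf):.3f}")
```

The soft gate lets different experts dominate for different inputs, which is the intuition behind using MoE to handle reviews whose spoiler language varies by genre. A trained model would learn the expert, gate, and classifier weights rather than drawing them at random.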