Previous studies on multimodal fake news detection have mainly focused on aligning and integrating cross-modal features and on exploiting text-image consistency. However, they have overlooked the semantic enhancement that large-scale multimodal models can provide and have paid little attention to the emotional features of news. Motivated by the observation that fake news is more likely than genuine news to carry negative sentiment, we propose a novel semantic enhancement and sentiment inference (SEER) network for multimodal fake news detection. SEER generates summarized captions to support image semantic understanding and enhances their semantics using the outputs of large-scale multimodal models. To exploit the relationship between news authenticity and emotional tendency, we propose an expert sentiment inference module that optimizes the emotional features and infers the authenticity of news by simulating real-world scenarios. Extensive experiments on two real-world datasets demonstrate that SEER outperforms state-of-the-art baseline models.