Although AI-based weather forecasting models generally outperform conventional numerical weather prediction systems, this paper demonstrates that they remain limited in predicting unprecedented extreme weather events. The high-resolution forecast model (HRES) of the European Centre for Medium-Range Weather Forecasts consistently outperforms state-of-the-art AI models, including GraphCast, Pangu-Weather, and FuXi, in predicting record-breaking extreme weather events. The AI models show larger forecast errors than HRES for record-breaking heatwaves, cold waves, and strong winds, and their errors tend to increase with the number of record-breaking events. In particular, they tend to underestimate record-breaking heatwaves and overestimate record-breaking cold waves, that is, they forecast temperatures that are too low during record heat and too high during record cold, so the extreme is understated in both cases. These results indicate that AI weather models struggle to extrapolate beyond the domain of their training data and to predict potentially high-impact record-breaking events. More rigorous validation and further development are needed before AI models can serve as the sole basis for high-risk applications such as early warning systems and disaster management.
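The direction of the errors described above can be illustrated with a minimal sketch of a signed-error check restricted to record-breaking heat events. This is not the evaluation procedure of the paper: the function name `record_breaking_bias`, the per-location record definition (an observation exceeding the maximum of a reference period), and all numbers are assumptions made for illustration only.

```python
from statistics import mean


def record_breaking_bias(history_max, observed, forecast):
    """Mean signed error (forecast - observed) over record-breaking heat events.

    history_max: per-location maximum in the reference period (hypothetical)
    observed:    per-location observed value at verification time
    forecast:    per-location forecast value for the same time
    Returns None when no location sets a new record.
    """
    errors = [
        f - o
        for h, o, f in zip(history_max, observed, forecast)
        if o > h  # observation exceeds every value in the reference period
    ]
    return mean(errors) if errors else None


# Toy temperatures in degrees Celsius, invented for illustration only.
history_max = [38.0, 40.5, 35.2, 41.0]
observed = [39.5, 40.0, 36.0, 43.0]
forecast = [38.2, 40.1, 35.5, 41.4]

bias = record_breaking_bias(history_max, observed, forecast)
print(bias)  # a negative value means the record heat was underestimated
```

For cold records, the comparison would flip to `o < h` against a reference-period minimum, and a positive mean error would correspond to the overestimation described above.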