This paper evaluates the forecasting ability of state-of-the-art large language models (LLMs). Using 464 forecasting questions from Metaculus, we compare the performance of LLMs with that of top human forecasters and expert groups. The results show that while state-of-the-art models achieve better Brier scores than the general pool of human forecasters, they still lag significantly behind the expert groups. Although LLMs were unable to approach human accuracy until last year, recent models demonstrate substantial progress.
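For reference, the Brier score used throughout is the mean squared error between probabilistic forecasts and binary outcomes (lower is better; a constant forecast of 0.5 scores 0.25). The sketch below, with hypothetical numbers rather than data from this study, illustrates the computation.

```python
# Minimal sketch of the Brier score computation (not the paper's
# evaluation code; the forecasts and outcomes here are hypothetical).
import numpy as np

def brier_score(forecasts: np.ndarray, outcomes: np.ndarray) -> float:
    """Mean squared error between probabilistic forecasts (in [0, 1])
    and binary outcomes (0 or 1). Lower is better."""
    return float(np.mean((forecasts - outcomes) ** 2))

# Hypothetical example: three resolved binary questions.
model_probs = np.array([0.7, 0.2, 0.9])   # model's forecasts
expert_probs = np.array([0.8, 0.1, 0.95]) # expert group's forecasts
resolved = np.array([1, 0, 1])            # 1 = resolved Yes, 0 = resolved No

print(brier_score(model_probs, resolved))   # ~0.047
print(brier_score(expert_probs, resolved))  # ~0.018 (better, i.e. lower)
```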