This page curates AI-related papers published worldwide. All content is summarized using Google Gemini, and the site operates on a non-profit basis. Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.
Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle
Created by
Haebom
Author
Hui Dai, Ryan Teehan, Mengye Ren
Outline
In this paper, we propose a continuous evaluation method that predicts future events from daily news, addressing __T4768__ of large language model (LLM) evaluation benchmarks. Using automatically generated question-answer (QA) pairs on a benchmark called 'Daily Oracle', we evaluate the temporal generalization and forecasting ability of LLMs. Our results show that LLM performance degrades as the pre-training data ages, and that the degradation persists even when retrieval-augmented generation (RAG) is used, underscoring the need for continuous model updates. The code and data can be found at __T4767_____ .
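The core of such a continuous evaluation is bookkeeping: score each QA pair against the gold answer and bucket accuracy by date, so that degradation over time becomes visible. The sketch below is a minimal, hypothetical illustration of that bookkeeping only; the record format, the toy data, and the `accuracy_by_period` helper are assumptions, not the paper's actual pipeline (which auto-generates QA pairs from daily news and queries real models).

```python
from collections import defaultdict

# Hypothetical record format: (period "YYYY-MM", question, gold answer, model answer).
# The toy data below is invented purely to show the grouping logic.
records = [
    ("2023-01", "Will event X occur this month?", "yes", "yes"),
    ("2023-01", "Will event Y occur this month?", "no", "no"),
    ("2023-06", "Will event Z occur this month?", "yes", "no"),
    ("2023-06", "Will event W occur this month?", "no", "no"),
]

def accuracy_by_period(records):
    """Group QA pairs by period and return per-period accuracy."""
    hits, totals = defaultdict(int), defaultdict(int)
    for period, _question, gold, pred in records:
        totals[period] += 1
        hits[period] += int(gold == pred)
    return {p: hits[p] / totals[p] for p in sorted(totals)}

print(accuracy_by_period(records))
```

Plotting these per-period accuracies against the model's pre-training cutoff is what reveals the temporal degradation the paper reports.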