Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

DRAGON: Dynamic RAG Benchmark On News

Created by
  • Haebom

Authors

Fedor Chernogorskii, Sergei Averkiev, Liliya Kudraleeva, Zaven Martirosian, Maria Tikhonova, Valentin Malykh, Alena Fenogenova

Outline

In this paper, we present DRAGON (Dynamic RAG Benchmark On News), the first dynamic Retrieval-Augmented Generation (RAG) benchmark for the Russian language. DRAGON is built on a regularly updated corpus of Russian news and public documents, and it evaluates both the retrieval and the generation components of a RAG system. Questions are generated automatically from a knowledge graph constructed from the corpus, with four core question types extracted according to subgraph patterns. We publish the complete evaluation framework, including the automatic question generation pipeline, evaluation scripts (reusable across languages and in multilingual settings), and the benchmark data, along with a public leaderboard to encourage community participation and comparison. DRAGON addresses the limitations of existing English-centric, static RAG benchmarks and provides a resource for evaluating Russian RAG systems that reflects the dynamic nature of real-world environments.
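To make the idea of pattern-based question generation more concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the toy triples, the two pattern names (`single_edge`, `two_hop`), the question templates, and the function names are all assumptions made for illustration, and they are not meant to correspond to the four question types defined in the paper. The sketch only shows how matching a subgraph shape in a knowledge graph can yield a question together with its answer.

```python
from collections import defaultdict

# Toy knowledge graph as (subject, relation, object) triples.
# Every entity and relation below is invented for illustration.
TRIPLES = [
    ("Acme Corp", "headquartered_in", "Kazan"),
    ("Acme Corp", "led_by", "Ivan Petrov"),
    ("Kazan", "located_in", "Tatarstan"),
]


def single_edge_questions(triples):
    """Hypothetical pattern 1: a single edge (s, r, o); ask for o given s and r."""
    return [
        {
            "type": "single_edge",
            "question": f"What is the '{r}' of {s}?",
            "answer": o,
        }
        for s, r, o in triples
    ]


def two_hop_questions(triples):
    """Hypothetical pattern 2: a chain (s, r1, m), (m, r2, o); ask for o via m."""
    # Index outgoing edges by subject so chains can be followed quickly.
    by_subject = defaultdict(list)
    for s, r, o in triples:
        by_subject[s].append((r, o))

    questions = []
    for s, r1, m in triples:
        # The object m of the first edge must be the subject of a second edge.
        for r2, o in by_subject.get(m, []):
            questions.append(
                {
                    "type": "two_hop",
                    "question": f"What is the '{r2}' of the '{r1}' of {s}?",
                    "answer": o,
                    "path": [s, m, o],
                }
            )
    return questions


if __name__ == "__main__":
    for q in single_edge_questions(TRIPLES) + two_hop_questions(TRIPLES):
        print(q["type"], "|", q["question"], "->", q["answer"])
```

In a real system the templates would presumably be replaced by a language model or hand-written Russian phrasing, and the graph would be rebuilt whenever the news corpus is refreshed, so the question set changes over time along with the corpus.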

Takeaways, Limitations

Takeaways:
Provides the first dynamic benchmark for evaluating Russian RAG systems.
Reflects real-world conditions by building on a regularly updated news corpus.
Supports comprehensive evaluation of both the retrieval and generation components.
Offers reusability and extensibility by publishing the automatic question generation pipeline and evaluation scripts.
Encourages community engagement and comparison through a public leaderboard.
Limitations:
The benchmark currently covers Russian only; extending it to other languages requires further research.
The news corpus may contain biased data.
The accuracy and diversity of the automatically generated questions need additional validation.
A plan for the ongoing maintenance and updating of DRAGON is needed.