Daily Arxiv

This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation

Created by
  • Haebom

Authors

Md Toufique Hasan, Muhammad Waseem, Kai-Kristian Kemell, Ayman Asad Khan, Mika Saari, Pekka Abrahamsson

Outline

This paper presents five Retrieval-Augmented Generation (RAG) applications built for real-world use cases in five domains: governance, cybersecurity, agriculture, industrial research, and medical diagnostics. Each system integrates multilingual OCR, semantic retrieval via vector embeddings, and domain-adapted LLMs, and is deployed via a local server or cloud API to meet user requirements. In a web-based evaluation, 100 participants rated the systems on six dimensions: usability, relevance, transparency, responsiveness, accuracy, and recommendability. Drawing on user feedback and development experience, the authors document 12 key lessons learned that highlight the technical, operational, and ethical challenges affecting the practical application of RAG systems. The paper aims to address the lack of empirical research on the development and evaluation of RAG systems based on real-world use cases.
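The retrieval step mentioned above (semantic retrieval via vector embeddings, followed by LLM generation) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not name an embedding model, vector store, or prompt format, so `embed` here is a toy bag-of-words stand-in for a real embedding model, and `retrieve`, `build_prompt`, and `CHUNKS` are hypothetical names introduced only for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term-count vector.
    A real system would call a learned embedding model here (assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks, k=2):
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Hypothetical document chunks, e.g. text produced by an OCR step.
CHUNKS = [
    "crop rotation improves soil health",
    "phishing emails often contain malicious links",
    "municipal budgets are approved by the city council",
]

if __name__ == "__main__":
    print(build_prompt("how to detect phishing emails", CHUNKS, k=1))
```

In a deployed system the prompt returned by `build_prompt` would be sent to the LLM, and the in-memory list would typically be replaced by a vector index; both are outside the scope of this sketch.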

Takeaways, Limitations

Takeaways:
Demonstrates the practical applicability of RAG systems in a range of real-world domains.
Offers insight into the performance and usability of RAG systems through evaluation results gathered from real users.
Provides valuable lessons on the technical, operational, and ethical challenges that arise during the development and deployment of RAG systems.
Presents the key technical elements required to build a RAG system, including multilingual OCR, vector embeddings, and domain-adapted LLMs.
Limitations:
The evaluation is limited to 100 participants.
Further research is needed to determine whether the findings generalize to domains beyond the five presented.
Additional monitoring and evaluation of the long-term performance and stability of the developed RAG systems is needed.
Performance may degrade when a system optimized for one domain is applied to another.