Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Retrieval-augmented reasoning with lean language models

Created by
  • Haebom

Authors

Ryan Sze-Yin Chan, Federico Nanni, Tomas Lazauskas, Rosie Wood, Penelope Yong, Lionel Tarassenko, Mark Girolami, James Geddes, Andrew Duncan

Outline

This paper presents a novel approach to combining reasoning and retrieval-augmented generation (RAG) in a system that is efficient, privacy-preserving, and deployable in resource-constrained or secure environments. Unlike existing RAG systems, which typically rely on large-scale models and external APIs, this study builds on recent advances in test-time scaling and small-scale reasoning models to develop a retrieval-augmented conversational agent that interprets complex, domain-specific queries using a lightweight backbone model. The system integrates a dense retriever with a fine-tuned Qwen2.5-Instruct model and uses synthetic query generation and reasoning traces derived from frontier models (e.g., DeepSeek-R1) over a curated corpus, in this case the NHS A-to-Z condition pages. The authors investigate the impact of summarization-based document compression, synthetic data design, and reasoning-aware fine-tuning. Evaluations against non-reasoning and general-purpose lean models show that domain-specific fine-tuning substantially improves answer accuracy and consistency, approaching frontier-level performance while remaining feasible for local deployment. All implementation details and code are publicly available to support reproducibility and adaptation to other domains.
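
To make the retrieve-then-answer flow described above more concrete, here is a minimal sketch of top-k retrieval by cosine similarity followed by prompt assembly. It is not the authors' implementation: the passages in `DOCS` are invented placeholders rather than NHS text, the function names (`embed`, `retrieve`, `build_prompt`) are hypothetical, and a toy bag-of-words vector stands in for the learned dense encoder a real system would use. The resulting prompt is what would be handed to a locally hosted fine-tuned model, a step that is omitted here.

```python
import math

# Invented placeholder passages (assumption: NOT actual NHS A-to-Z content).
DOCS = [
    "Asthma is a lung condition that causes breathing difficulties.",
    "Migraine is a headache that often comes with nausea.",
    "Eczema is a skin condition that causes itchy, dry skin.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (t.strip(".,?!:;").lower() for t in text.split())
    return [t for t in tokens if t]


def build_vocab(docs: list[str]) -> dict[str, int]:
    """Map every corpus token to a vector index."""
    vocab: dict[str, int] = {}
    for doc in docs:
        for token in tokenize(doc):
            vocab.setdefault(token, len(vocab))
    return vocab


def embed(text: str, vocab: dict[str, int]) -> list[float]:
    """Toy stand-in for a dense encoder: L2-normalised term counts."""
    vec = [0.0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def retrieve(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k passages with the highest cosine similarity to the query."""
    vocab = build_vocab(docs)
    q = embed(query, vocab)
    scored = [
        (sum(a * b for a, b in zip(q, embed(doc, vocab))), doc) for doc in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble retrieved passages and the question into a single prompt."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


question = "What causes breathing difficulties in the lung?"
hits = retrieve(question, DOCS, k=2)
print(build_prompt(question, [doc for _, doc in hits]))
```

In a real deployment the `embed` function would be replaced by a trained embedding model and the passages by the (optionally summarized) corpus documents; the ranking and prompt-assembly steps would keep the same shape.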

Takeaways, Limitations

Takeaways:
Shows that an efficient, privacy-preserving RAG system can be built even for resource-constrained environments.
Interprets complex, domain-specific queries using a lightweight model.
Improves answer accuracy and consistency through domain-specific fine-tuning.
Approaches frontier-level performance while remaining locally deployable.
Supports reproducibility and adaptation to other domains through publicly released code.
Limitations:
The results cover a single domain, the NHS A-to-Z condition pages, so further experiments are needed to establish scalability and generalization to other domains.
The quality and design of the synthetic data are not described in detail.
The metrics and evaluation methods used in comparisons with state-of-the-art models need a more detailed description.