Daily Arxiv

This is a page that curates AI-related papers published worldwide.
Summaries are generated with Google Gemini, and the site is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

A Comparative Study of Specialized LLMs as Dense Retrievers

Created by
  • Haebom

Authors

Hengran Zhang, Keping Bi, Jiafeng Guo

Outline

This paper systematically investigates how domain specialization affects retrieval effectiveness when large language models (LLMs) are used as dense retrievers. As a step toward a unified retriever capable of handling text, code, images, and multimodal content, the authors experimentally analyze how task-specific adaptation of LLMs impacts retrieval performance. They conduct extensive experiments with eight Qwen2.5 7B LLMs (base, instruction-tuned, code/math-specialized, long-reasoning, and vision-language models) in both zero-shot and supervised settings. Zero-shot evaluation covers text retrieval on the BEIR benchmark and code retrieval on the CoIR benchmark; for the supervised setting, all LLMs are fine-tuned on the MS MARCO dataset. Math specialization and long reasoning consistently degrade performance across all three evaluation settings, suggesting a trade-off between mathematical reasoning and semantic matching. The vision-language and code-specialized LLMs achieve superior zero-shot performance compared to the other LLMs, outperforming BM25 on code retrieval and remaining comparable to the base LLM in the supervised setting. These results point to promising directions for unified retrieval that leverages cross-domain and cross-modal fusion.
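For readers unfamiliar with the setup, the sketch below illustrates how a decoder-only LLM can serve as a dense retriever: texts are embedded into vectors, documents are ranked by cosine similarity to the query, and supervised fine-tuning typically uses a contrastive objective. The pooling strategy (last-token), the absence of an instruction prompt, and the InfoNCE loss are illustrative assumptions here, not the paper's exact configuration.

```python
# Minimal sketch: a decoder-only LLM as a dense retriever.
# Assumptions (not from the paper): last-token pooling, no instruction
# prompt, and an InfoNCE fine-tuning objective with in-batch negatives.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B"  # one of the studied model families

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # so the last real token sits at sum(mask) - 1
model = AutoModel.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()


def embed(texts: list[str]) -> torch.Tensor:
    """Encode texts into unit-norm dense vectors via last-token pooling."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # [batch, seq, dim]
    # Index the hidden state of each sequence's last non-padding token.
    last = batch["attention_mask"].sum(dim=1) - 1
    vecs = hidden[torch.arange(hidden.size(0)), last]
    return F.normalize(vecs.float(), dim=-1)


# Zero-shot retrieval: rank documents by cosine similarity to the query.
query = embed(["binary search implementation in python"])
docs = embed(["def bisect(arr, x): ...",
              "BM25 is a lexical ranking function."])
print(query @ docs.T)  # higher score = better match


def infonce_loss(q: torch.Tensor, d_pos: torch.Tensor,
                 temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss with in-batch negatives, a common recipe for
    fine-tuning dense retrievers on MS MARCO (assumed, not confirmed)."""
    logits = (q @ d_pos.T) / temperature  # [batch, batch] similarity matrix
    labels = torch.arange(q.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```

For a quick local test, a smaller checkpoint (e.g., Qwen/Qwen2.5-0.5B) keeps memory requirements modest while exercising the same code path.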

Takeaways, Limitations

Takeaways:
Code-specialized and vision-language LLMs achieve the strongest zero-shot retrieval performance, notably outperforming BM25 on code retrieval.
The results point toward unified retrieval systems that leverage cross-domain and cross-modal fusion.
Mathematical reasoning specialization trades off against semantic matching ability.
Limitations:
Only a limited set of LLMs (all Qwen2.5 7B variants) and datasets was evaluated; further research with a wider variety of models and benchmarks is needed.
Settings beyond zero-shot retrieval and supervised fine-tuning remain unexplored.
How task-specialized LLMs generalize as retrievers warrants further study.