This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval
Created by: Haebom
Authors: Meixiu Long, Duolin Sun, Dan Yang, Junjie Wang, Yue Shen, Jian Wang, Peng Wei, Jinjie Gu, Jiahai Wang
Outline
DIVER is a novel retrieval pipeline for reasoning-intensive information retrieval. It consists of four components: (1) document preprocessing, including noise removal and segmentation of long documents; (2) iterative query expansion driven by a large language model (LLM); (3) a retriever fine-tuned on synthetic medical and mathematical data with hard negatives; and (4) a reranking step that combines pointwise and listwise strategies. On the BRIGHT benchmark, DIVER outperforms existing reasoning-aware models, reaching an overall nDCG@10 of 45.8 and an nDCG@10 of 28.9 with the original queries. These results demonstrate the effectiveness of reasoning-aware retrieval strategies on complex real-world tasks.
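The benchmark figures above are nDCG@10 scores. As a minimal sketch, assuming the common trec_eval-style convention (linear gains, a log2 position discount, and an ideal ranking built from all judged documents for the query), the metric for one query and its mean over queries can be computed as below. The function names and the `run`/`qrels` dictionary layout are illustrative assumptions, not part of DIVER or of BRIGHT's evaluation code; published scores such as 45.8 appear to be this value scaled by 100.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions.

    The document at rank 1 (index 0) is discounted by log2(2) = 1,
    rank 2 by log2(3), and so on.
    """
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: List[str], qrels: Dict[str, float], k: int = 10) -> float:
    """nDCG@k for a single query (hypothetical helper).

    ranked_doc_ids: retrieved document ids, best first (assumed unique).
    qrels: judged relevance per document id; unjudged ids count as 0.
    """
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_doc_ids]
    # Ideal ranking: all judged documents sorted by relevance, highest first.
    ideal_gains = sorted(qrels.values(), reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def mean_ndcg_at_k(
    run: Dict[str, List[str]],
    qrels: Dict[str, Dict[str, float]],
    k: int = 10,
) -> float:
    """Average nDCG@k over every query that has judgments.

    A judged query missing from the run scores 0.
    """
    scores = [ndcg_at_k(run.get(qid, []), judged, k) for qid, judged in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    run = {"q1": ["d3", "d1", "d7"]}
    qrels = {"q1": {"d1": 1, "d2": 1}}
    print(round(mean_ndcg_at_k(run, qrels), 3))
```

For example, with judged relevant documents d1 and d2 and a ranking of d3, d1, d7, only d1 is retrieved (at rank 2), so nDCG@10 = (1/log2 3) / (1 + 1/log2 3) ≈ 0.387. With binary relevance, as in this toy case, linear and exponential (2^rel − 1) gains give identical results.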
Takeaways, Limitations
• Takeaways: DIVER offers an effective approach to reasoning-intensive information retrieval. It uses large language models and synthetic data to improve performance on complex real-world queries, achieves state-of-the-art performance on the BRIGHT benchmark, and highlights the importance of reasoning-aware retrieval strategies.
• Limitations: The generalizability of the synthetic training data needs further examination, and additional evaluation on real-world data is needed. Because the study focused on specific domains (medical and mathematical), further research is needed to determine whether the approach generalizes to other domains. The model's interpretability may also warrant further analysis.