Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning

Created by
  • Haebom

Authors

Chuzhan Hao, Wenfeng Feng, Yuewei Zhang, Hao Wang

Outline

Multi-step agent search systems based on large language models (LLMs) have shown strong performance on complex information retrieval tasks, but they tend to generate inconsistent intermediate queries and follow inefficient search trajectories. To address these issues, this paper proposes DynaSearcher, a search agent that combines dynamic knowledge graphs with multi-reward reinforcement learning (RL). DynaSearcher uses the knowledge graph as external structured knowledge to explicitly model entity relationships and guide the search process, keeping intermediate queries factually consistent and mitigating bias from irrelevant information. A multi-reward RL framework then provides fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality. Experiments on six multi-hop question-answering datasets show that DynaSearcher achieves state-of-the-art answer accuracy, rivaling state-of-the-art LLMs while using a small model and limited computational resources. It also generalizes robustly across diverse retrieval environments and larger model scales, indicating broad applicability.
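The abstract does not spell out how the individual rewards are combined, but a common pattern in multi-reward RL is to collapse per-objective signals (answer accuracy, retrieval efficiency, response quality) into a single scalar used for policy optimization. The sketch below illustrates that pattern only; the reward names, weights, token-F1 metric, and search-call budget are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch only: reward terms, weights, and budget are assumptions,
# not DynaSearcher's published reward design.

from dataclasses import dataclass

@dataclass
class SearchRollout:
    """One agent trajectory: the final answer plus its search-step statistics."""
    predicted_answer: str
    gold_answer: str
    num_search_calls: int       # how many retrieval queries the agent issued
    max_search_calls: int = 8   # hypothetical budget for the efficiency term

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between predicted and gold answers (a common QA metric)."""
    pred_tokens, ref_tokens = prediction.lower().split(), reference.lower().split()
    common = sum(min(pred_tokens.count(t), ref_tokens.count(t)) for t in set(pred_tokens))
    if not pred_tokens or not ref_tokens or common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def multi_reward(rollout: SearchRollout,
                 w_accuracy: float = 0.6,
                 w_efficiency: float = 0.2,
                 w_quality: float = 0.2) -> float:
    """Weighted combination of per-objective rewards; weights are hypothetical."""
    r_accuracy = token_f1(rollout.predicted_answer, rollout.gold_answer)
    # Fewer search calls within the budget -> higher efficiency reward.
    r_efficiency = max(0.0, 1.0 - rollout.num_search_calls / rollout.max_search_calls)
    # Placeholder quality term: favor non-empty, reasonably concise answers.
    r_quality = 1.0 if 0 < len(rollout.predicted_answer.split()) <= 30 else 0.5
    return w_accuracy * r_accuracy + w_efficiency * r_efficiency + w_quality * r_quality

if __name__ == "__main__":
    rollout = SearchRollout(predicted_answer="Paris", gold_answer="Paris", num_search_calls=3)
    print(f"scalar reward: {multi_reward(rollout):.3f}")
```

The scalar returned by such a function would feed a standard policy-gradient update; weighting the terms is what gives the fine-grained control over competing objectives described in the abstract.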

Takeaways, Limitations

Takeaways:
Leverages knowledge graphs to keep intermediate queries factually consistent and reduce bias from irrelevant information.
Improves retrieval efficiency, accuracy, and response quality through a multi-reward RL framework.
Achieves state-of-the-art performance with a small model and limited computational resources.
Demonstrates strong generalization across diverse retrieval environments and model scales.
Limitations:
The abstract does not discuss specific limitations.