In this paper, we propose SPAR, a multi-agent framework that integrates RefChain-based query decomposition and query evolution to overcome the limitations of academic literature retrieval systems built on large language models (LLMs). SPAR replaces the rigid pipelines and limited reasoning capabilities of existing systems, enabling more flexible and effective retrieval. In addition, we construct SPARBench, a challenging benchmark with expert-annotated relevance labels, to support systematic evaluation. Experimental results show that SPAR outperforms state-of-the-art baselines, achieving up to +56% F1 improvement on AutoScholar and +23% F1 improvement on SPARBench. Together, SPAR and SPARBench provide a scalable, interpretable, and high-performing foundation for advancing academic retrieval research.