Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search

Created by
  • Haebom

Author

Ziqi Wang, Boqin Yuan

Outline

L-MARS is a system that reduces hallucination and uncertainty in legal question answering through multi-agent reasoning and retrieval. Unlike single-pass Retrieval-Augmented Generation (RAG), L-MARS decomposes a question into subquestions, performs targeted searches across heterogeneous sources (Serper web search, local RAG, CourtListener case law), and uses a judge agent to verify sufficiency, jurisdiction, and temporal validity before synthesizing the answer. This iterative reasoning-search-verification loop maintains consistency, filters out noisy evidence, and grounds answers in authoritative law. L-MARS was evaluated on LegalSearchQA, a new benchmark of 200 up-to-date multiple-choice legal questions from 2025. The results show that L-MARS substantially improves factual accuracy, reduces uncertainty, and achieves higher preference scores from both human experts and LLM-based judges. The study demonstrates that multi-agent reasoning with agentic search offers a scalable and reproducible blueprint for deploying LLMs in high-stakes domains that require precise legal retrieval and deliberation.
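To make the reasoning-search-verification loop concrete, below is a minimal Python sketch of how such a workflow might be wired together. All function names (decompose, search, judge, synthesize), data structures, source labels, and stopping criteria are illustrative assumptions for this summary, not the authors' actual implementation.

```python
# Minimal sketch of an iterative reasoning-search-verification loop in the
# spirit of L-MARS. Every name and rule here is an illustrative assumption.
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str       # e.g. "serper_web", "local_rag", "courtlistener"
    text: str
    jurisdiction: str
    year: int


@dataclass
class Verdict:
    sufficient: bool          # enough evidence to answer?
    jurisdiction_ok: bool     # evidence matches the question's jurisdiction?
    temporally_valid: bool    # law still in force / recent enough?
    missing: list = field(default_factory=list)  # follow-up subquestions


def decompose(question: str) -> list[str]:
    """Split the legal question into targeted subquestions (stubbed)."""
    return [question]


def search(subquestion: str) -> list[Evidence]:
    """Query heterogeneous sources for one subquestion (stubbed)."""
    return [Evidence("local_rag", f"passage about: {subquestion}", "US-CA", 2025)]


def judge(question: str, evidence: list[Evidence]) -> Verdict:
    """Judge agent: check sufficiency, jurisdiction, temporal validity (stubbed)."""
    ok = len(evidence) > 0
    return Verdict(sufficient=ok, jurisdiction_ok=ok, temporally_valid=ok)


def synthesize(question: str, evidence: list[Evidence]) -> str:
    """Compose the final answer grounded in the accepted evidence (stubbed)."""
    return f"Answer to '{question}' grounded in {len(evidence)} passages."


def answer(question: str, max_rounds: int = 3) -> str:
    evidence: list[Evidence] = []
    subqs = decompose(question)
    for _ in range(max_rounds):
        for sq in subqs:
            evidence.extend(search(sq))
        verdict = judge(question, evidence)
        if verdict.sufficient and verdict.jurisdiction_ok and verdict.temporally_valid:
            break
        # Otherwise refine: search again on whatever the judge flagged as missing.
        subqs = verdict.missing or subqs
    return synthesize(question, evidence)


if __name__ == "__main__":
    print(answer("Is a verbal employment contract enforceable in California in 2025?"))
```

The point of the loop is that the judge agent, not the generator, decides when retrieval stops: answers are only synthesized once sufficiency, jurisdiction, and temporal validity checks pass or the round budget is exhausted.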

Takeaways, Limitations

Takeaways:
Improves the accuracy and reliability of legal question answering through multi-agent reasoning and agentic search.
Provides a scalable and reproducible framework for applying LLMs to high-stakes domains such as law.
Enables effective information retrieval and evidence filtering across heterogeneous data sources.
Achieves high preference scores from both human experts and LLM-based judges.
Limitations:
The LegalSearchQA benchmark is relatively small (200 questions).
No comparative analysis against other legal question-answering systems.
L-MARS has not yet been applied and validated in real-world legal settings.
Limited detail on the judge agent's decision criteria and algorithm.