Daily Arxiv

This page organizes papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; when sharing, simply cite the source.

Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm

Created by
  • Haebom

Authors

Kaisen Yang, Lixuan He, Rushi Shah, Kaicheng Yang, Qinwei Ma, Dianbo Liu, Alex Lamb

Explore-Execute Chain ($E^2C$)

Outline

$E^2C$ is a framework designed to improve the reasoning capability of large language models (LLMs). It separates reasoning into an exploration phase, which generates a high-level strategic plan, and an execution phase, which carries out the selected plan. The framework uses a two-stage training approach that combines supervised fine-tuning (SFT) and reinforcement learning (RL), and integrates a novel data-generation algorithm into SFT to enhance plan adherence. $E^2C$ achieves high accuracy on AIME'2024 while using fewer tokens than competing methods, and exploration-focused SFT (EF-SFT) improves cross-domain adaptability, outperforming standard SFT on medical benchmarks.
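To make the two-phase decomposition concrete, below is a minimal sketch of the explore-then-execute inference pattern in Python. The `generate` callable, the prompt wording, and the function names are illustrative assumptions for this summary, not the paper's actual prompts, training setup, or code.

```python
# Minimal sketch of the explore -> execute inference split described above.
# `generate` stands in for any LLM completion call (an API client or a local
# model); it and the prompt text are illustrative assumptions, not the
# authors' implementation.

from typing import Callable


def explore_execute_chain(question: str, generate: Callable[[str], str]) -> str:
    # Exploration phase: produce only a short, high-level strategic plan,
    # without carrying out any detailed computation.
    explore_prompt = (
        "Problem:\n" + question + "\n\n"
        "Write a concise high-level plan (numbered steps) for solving this "
        "problem. Do not solve it yet."
    )
    plan = generate(explore_prompt)

    # Execution phase: condition on the chosen plan and carry it out step by
    # step. Keeping this phase tied to the plan is what the paper's SFT
    # data-generation algorithm is meant to reinforce (plan adherence).
    execute_prompt = (
        "Problem:\n" + question + "\n\n"
        "Plan:\n" + plan + "\n\n"
        "Follow the plan step by step and give the final answer."
    )
    return generate(execute_prompt)
```

In this pattern the exploration output is typically short, so most of the token budget is spent in the execution phase, which is one intuition behind the reported efficiency gains.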

Takeaways, Limitations

Takeaways:
  • Improved computational efficiency by separating planning from execution.
  • Improved reasoning-path exploration and generalization through the dedicated exploration phase.
  • Higher accuracy than SFT on medical benchmarks and improved cross-domain adaptability.
  • Increased interpretability.
  • Improved test-time efficiency (reduced token usage compared to Forest-of-Thought).
Limitations:
  • No limitations are explicitly specified in the paper.