Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks

Created by
  • Haebom

Authors

Ping Yu, Jack Lanchantin, Tianlu Wang, Weizhe Yuan, Olga Golovneva, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Jing Xu

Outline

This paper proposes CoT-Self-Instruct, a method for generating high-quality synthetic data for large language models (LLMs). Given seed tasks, CoT-Self-Instruct first instructs an LLM to reason and plan via Chain-of-Thought (CoT), and then to generate new synthetic prompts of similar quality and complexity. A filtering step then selects high-quality data using automatic evaluation metrics, and the selected data is used for LLM training. Experimental results show that on verifiable reasoning tasks (MATH500, AMC23, AIME24, GPQA-Diamond), CoT-Self-Instruct outperforms existing training datasets such as s1k and OpenMathReasoning, and on non-verifiable instruction-following tasks (AlpacaEval 2.0, Arena-Hard) it surpasses both human-written data and standard Self-Instruct training data.
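To make the two-stage recipe above concrete, here is a minimal sketch of the generation step in Python. This is an illustration under stated assumptions, not the paper's implementation: the `llm.complete(prompt)` client, the prompt wording, and the function names are all hypothetical.

```python
# Minimal sketch of the CoT-Self-Instruct generation step.
# ASSUMPTIONS: `llm.complete(prompt) -> str` is a hypothetical text-completion
# client; the prompt template is illustrative, not the paper's exact wording.

COT_SELF_INSTRUCT_PROMPT = (
    "Here are some example tasks:\n{seeds}\n\n"
    "Step 1: Reason step by step (Chain-of-Thought) about what these tasks "
    "have in common and what makes them high quality.\n"
    "Step 2: Plan, then write ONE new task of similar quality and complexity.\n"
    "New task:"
)

def generate_candidates(llm, seed_tasks, n_candidates):
    """Sample new synthetic prompts conditioned on CoT over the seed tasks."""
    prompt = COT_SELF_INSTRUCT_PROMPT.format(seeds="\n".join(seed_tasks))
    # Each completion reasons and plans first, then emits a new synthetic prompt.
    return [llm.complete(prompt) for _ in range(n_candidates)]
```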

Takeaways, Limitations

Takeaways:
CoT-Self-Instruct can improve LLM performance by generating synthetic data of higher quality than existing datasets.
It performs well on both verifiable reasoning and non-verifiable instruction-following tasks.
It presents a method for efficiently selecting high-quality data using automatic evaluation metrics (a rough code sketch follows the Limitations list below).
Limitations:
Further validation of the generalization and reliability of the proposed automatic evaluation metrics is needed.
The bias and safety of the generated synthetic data are not analyzed.
Since performance was evaluated only on specific types of tasks, generalization to other task types remains to be verified.
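As a rough illustration of what filtering with "automatic evaluation metrics" might look like, the sketch below scores a candidate prompt by answer agreement across repeated samples, a common self-consistency-style filter for verifiable tasks. The sampling count, threshold, and `llm.complete` client are assumptions for illustration; the paper's actual filters may differ.

```python
from collections import Counter

# ASSUMPTION: answer-agreement (self-consistency) filtering, one plausible
# automatic quality metric for verifiable tasks; not necessarily the paper's
# exact filter. `llm.complete(prompt) -> str` is the same hypothetical client.

def answer_consistency(llm, task, n_samples=8):
    """Fraction of sampled solutions that agree on the most common answer."""
    answers = [
        llm.complete(f"Solve the task and give only the final answer.\n{task}")
        for _ in range(n_samples)
    ]
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / n_samples

def filter_candidates(llm, candidates, threshold=0.75):
    """Keep only candidates whose sampled answers mostly agree."""
    return [c for c in candidates if answer_consistency(llm, c) >= threshold]
```

A high agreement score suggests the generated task is well-posed and solvable, so low-agreement candidates are discarded before training.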