⛓️

Chain-of-Thought (CoT)

Chain-of-Thought is a technique introduced in 2022 by Jason Wei, who had previously worked on Zero-shot prompting. As the name suggests, it is related to chain prompting; the key difference is that intermediate reasoning steps are deliberately added to the prompt to lead the model to better results. This is especially effective for complex tasks that require a detailed thought process.
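To make the idea concrete, here is a minimal sketch of the difference between a standard few-shot prompt and a CoT prompt, loosely following the arithmetic examples used in the original Chain-of-Thought paper. The `call_llm` function is a hypothetical placeholder for whatever model API you actually use, not a real library call.

```python
# A minimal sketch of few-shot CoT prompting.
# `call_llm` is a hypothetical placeholder for your model API of choice.

def call_llm(prompt: str) -> str:
    """Send the prompt to a language model and return its completion (placeholder)."""
    raise NotImplementedError("Wire this up to the LLM API you actually use.")

# Standard few-shot prompt: the worked example shows only the final answer.
standard_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. How many tennis balls does he have now?
A: 11

Q: The cafeteria had 23 apples. They used 20 and bought 6 more. How many apples do they have?
A:"""

# Chain-of-Thought prompt: the same example, but the intermediate reasoning
# steps are written out before the answer.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. How many tennis balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 and bought 6 more. How many apples do they have?
A:"""

# With a sufficiently large model, the CoT version tends to produce the
# reasoning "23 - 20 = 3, 3 + 6 = 9" before giving the answer 9.
# print(call_llm(cot_prompt))
```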
Advantages of CoT Prompting
Multi-step problem decomposition: CoT lets a model break multi-step problems into intermediate steps, so additional computation can be allocated to problems that require more reasoning steps.
Interpretability of model behavior: CoT provides an interpretable window into how the model arrives at a particular answer and an opportunity to debug where the reasoning path went wrong.
Applicability to a variety of tasks: CoT reasoning can be used for tasks such as math word problems, commonsense reasoning, and symbolic manipulation, and in principle for any task that humans can solve through language.
Easy to elicit in large language models: CoT reasoning can be readily elicited from sufficiently large existing language models simply by including examples of CoT sequences in the prompt.
Shall we look at an example? This is the example shown earlier in the section on arguments.
💡
Let's distinguish between odd and even numbers and add the odd numbers in order:
Odd numbers: 343, 1, 423, 3, 433, 21, 51
Sum of the odd numbers: 343 + 1 + 423 + 3 + 433 + 21 + 51 = 1275
Therefore, if you add up all odd numbers among the given numbers, you get 1275.
Here, the part that says 'Distinguish between odd and even numbers, then add up all the odd numbers. Please proceed in order.' is the instruction that separates the task into a chain of steps. When this method first received attention, the phrasing used was 'step by step'; asking the model to work step by step, in order, has been reported to produce better results.
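As a rough sketch, that instruction could be turned into a prompt like the one below. The `call_llm` helper is the same hypothetical placeholder as in the earlier sketch, and the even numbers in the list are made up for illustration, since the full original list is not repeated in this section.

```python
# The odd-number example above, written out as prompts.
# `call_llm` is the hypothetical placeholder defined in the earlier sketch.
# The even numbers (8, 12, 96) are invented for illustration.

numbers = [343, 1, 423, 8, 3, 433, 12, 21, 96, 51]

# Explicit chained instruction: separate first, then add, proceeding in order.
chained_prompt = (
    f"Numbers: {', '.join(map(str, numbers))}\n"
    "Distinguish between odd and even numbers, then add up all the odd numbers. "
    "Please proceed in order."
)

# Zero-shot "step by step" variant: the phrasing that first drew attention
# to this technique.
step_by_step_prompt = (
    f"Numbers: {', '.join(map(str, numbers))}\n"
    "Add up all the odd numbers. Let's think step by step."
)

# Either prompt nudges a large model to list the odd numbers
# (343, 1, 423, 3, 433, 21, 51) and sum them to 1275 before answering.
# print(call_llm(chained_prompt))
```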
Of course, the CoT method also has clear limitations. Put simply, it is only effective on sufficiently large models. In other words, CoT does not work well at all on models with a small number of parameters, so-called sLMs; in that case, the Few-shot or One-shot method gives better results.
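For comparison, a Few-shot prompt for a small model simply shows input-to-answer pairs without any written-out reasoning. A minimal sketch, again using the placeholder `call_llm`:

```python
# For a small model (sLM), plain Few-shot examples without reasoning steps
# are often the better choice. `call_llm` is the same placeholder as above.

few_shot_prompt = """Numbers: 2, 7, 10, 5 -> Sum of odd numbers: 12
Numbers: 4, 9, 9, 6 -> Sum of odd numbers: 18
Numbers: 343, 1, 423, 3, 433, 21, 51 -> Sum of odd numbers:"""

# The examples show only input -> answer pairs, with no chain of thought.
# print(call_llm(few_shot_prompt))
```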
Limitations of CoT
Dependence on model size: CoT prompting shows positive performance gains mainly for large models (roughly 70B parameters and up). For small models, CoT may be ineffective or even perform worse than standard prompting.
Other limitations: Although CoT mimics the thought process of a human reasoner, it is still unknown whether the neural network is actually 'reasoning'. In addition, while manually extending a prompt example with a chain of thought is cheap, the cost of annotating data for CoT fine-tuning can grow significantly. CoT does not guarantee a correct reasoning path, and the high cost of running large models in real applications must also be taken into account.
Nonetheless, CoT prompting is an effective way to improve reasoning ability across a variety of tasks that use language models. And since the models currently served to users are around the 100B scale, the small-model limitation is not a major concern in practice.
ⓒ 2023. Haebom, all rights reserved.
Attribution of the source is required, and the material may be used commercially with the copyright holder's permission.