Directional Stimulus Prompting: Techniques for Reaching Goals More Effectively

Directional Stimulus Prompting (DSP) is a method proposed by Zekun Li and colleagues in their 2023 paper, "Guiding Large Language Models via Directional Stimulus Prompting."


Directional stimulus prompting is a novel framework that guides large language models (LLMs) toward a specific desired result. Instead of directly fine-tuning the LLM, it uses a small, tunable policy model (such as T5) to generate additional directional prompts for each input instance. These prompts act as fine-grained hints and cues, guiding the LLM to produce the desired output for each case.

Practical example

• Input text: Bob Barker returned to "The Price Is Right" after eight years away.

• Reference summary: Bob Barker hosted "The Price Is Right" on Wednesday. He retired as host in 2007.

• Directional stimulus prompt: Question: Based on the article above, briefly summarize it in 2-3 sentences using the hints. Hint: Bob Barker; TV; April 1; "The Price Is Right"; 2007; 91.

• Model output: On April 1, Bob Barker returned to the TV show "The Price Is Right" after eight years. At 91, Barker energetically hosted the first price guessing game. He stepped down as host in 2007 but hasn't missed a beat since returning.
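The prompt in this example can be assembled mechanically once the hints are known. Below is a minimal sketch, assuming the hints arrive as a list of keywords; the function name `build_dsp_prompt` and the "Article:" label are illustrative choices, not taken from the paper.

```python
def build_dsp_prompt(article: str, hints: list[str]) -> str:
    """Attach a keyword hint line to a summarization prompt.

    Hypothetical helper: the layout mirrors the example above,
    but the exact template used in the paper may differ.
    """
    # Drop empty hints and join the rest with "; " as in the example.
    hint_line = "; ".join(h.strip() for h in hints if h.strip())
    return (
        f"Article: {article}\n"
        "Question: Based on the article above, briefly summarize it "
        "in 2-3 sentences using the hints.\n"
        f"Hint: {hint_line}."
    )


article = 'Bob Barker returned to "The Price Is Right" after eight years away.'
hints = ["Bob Barker", "TV", "April 1", '"The Price Is Right"', "2007", "91"]
prompt = build_dsp_prompt(article, hints)
print(prompt)
```

In a real setup the `hints` list would come from the policy model rather than being written by hand.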

How does it work?

This method first optimizes the policy model with supervised fine-tuning (SFT) on labeled data, then refines it with reinforcement learning (RL), using online or offline rewards computed from the LLM's outputs.

• DSP uses small, tunable language models (such as those trained through reinforcement learning) to generate hints or clues that steer the LLM's responses toward the intended outcomes.

• It offers more control compared to conventional fine-tuning methods, steering the model's responses while still maintaining its overall capabilities.
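To make the reward step concrete, here is a rough sketch of how an LLM output could be scored against a reference so that the score can serve as an RL reward for the policy model. It assumes a simple unigram-overlap F1 (in the spirit of ROUGE-1); this metric and the names `unigram_f1` and `candidate_outputs` are illustrative stand-ins, and the reward used in the paper may differ.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two texts (a simplified stand-in metric)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = 'Bob Barker hosted "The Price Is Right" on Wednesday. He retired as host in 2007.'

# Hypothetical LLM outputs produced under two different hint sets.
candidate_outputs = {
    "good hints": 'Bob Barker hosted "The Price Is Right" again. He retired as host in 2007.',
    "poor hints": "A television program aired an episode this week.",
}

# The hint set whose output scores higher would receive the larger reward.
rewards = {name: unigram_f1(out, reference) for name, out in candidate_outputs.items()}
for name, reward in rewards.items():
    print(f"{name}: reward = {reward:.2f}")
```

Only the policy model would be updated from these rewards; the LLM itself stays frozen.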

Practical use case:

In customer service, DSP can be used to analyze complaints and deliver fast, accurate solutions, boosting efficiency and supporting human agents.

For example, if the input is "A customer complains about a delivery delay," the policy model would provide hints such as "apologize, reassure, offer a solution," guiding the LLM to craft a kind and helpful reply, such as, "We apologize for the shipping delay. Your order has been prioritized and we are ensuring it arrives soon."
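The two-stage flow in the customer-support example (the policy model produces hints, then the LLM answers with those hints in the prompt) can be sketched as follows. Everything here is a placeholder under stated assumptions: `stub_policy` is a keyword rule standing in for a trained policy model, and `echo_llm` simply returns the prompt it receives instead of calling a real LLM.

```python
from typing import Callable

PolicyFn = Callable[[str], list[str]]  # message -> hints
LlmFn = Callable[[str], str]           # prompt -> reply


def respond_with_hints(message: str, policy: PolicyFn, llm: LlmFn) -> str:
    """Generate hints for the message, then query the LLM with them."""
    hints = policy(message)
    prompt = (
        f"Customer message: {message}\n"
        "Write a short, polite reply.\n"
        f"Hint: {'; '.join(hints)}"
    )
    return llm(prompt)


def stub_policy(message: str) -> list[str]:
    # Placeholder rule; a real policy model would be learned, not hand-written.
    if "delay" in message.lower():
        return ["apologize", "reassure", "offer a solution"]
    return ["acknowledge", "ask for details"]


def echo_llm(prompt: str) -> str:
    # Stand-in for an LLM call: returns the prompt so it can be inspected.
    return prompt


reply = respond_with_hints(
    "A customer complains about a delivery delay", stub_policy, echo_llm
)
print(reply)
```

Passing the policy and the LLM in as plain callables keeps the sketch independent of any particular model or API.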

