
Guided Stimulation: Techniques to Better Reach Your Goals

Directional Stimulus Prompting (DSP) is a method proposed by Zekun Li and colleagues in the 2023 paper "Guiding Large Language Models via Directional Stimulus Prompting".
DSP is a framework for guiding large language models (LLMs) toward a specific desired outcome. Instead of tuning the LLM directly, it uses a small, tunable policy model (e.g., T5) to generate an auxiliary directional stimulus prompt for each input instance. These prompts act as fine-grained, instance-specific hints and cues that steer the LLM toward the desired output.
Real-world examples
Input text: Bob Barker returns to "The Price Is Right" after eight years away.
Reference summary: Bob Barker returned to host "The Price Is Right" on Wednesday. Barker retired from hosting in 2007.
Directional Stimulus Prompt: Question: Based on the article above, summarize it briefly in 2-3 sentences, following the hints. Hint: Bob Barker; TV; April 1; “The Price Is Right”; 2007; 91.
Model Output: On April 1, Bob Barker returned to the TV show "The Price Is Right" after an eight-year hiatus. At 91, Barker handled the first price-guessing game with gusto. Barker stepped down from hosting in 2007, but he still hasn't missed a beat.
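The example above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the helper names `build_dsp_prompt` and `hint_coverage` are hypothetical, and the coverage check is only an assumed, informal way to see whether the output actually picked up the hints.

```python
def build_dsp_prompt(article: str, hints: list[str]) -> str:
    """Append the summarization question and the keyword hints to the article."""
    hint_line = "; ".join(hints)
    return (
        f"Article: {article}\n"
        "Question: Based on the article above, summarize it briefly "
        "in 2-3 sentences, following the hints.\n"
        f"Hint: {hint_line}."
    )


def hint_coverage(output: str, hints: list[str]) -> float:
    """Fraction of hints that appear (case-insensitively) in the output."""
    if not hints:
        return 0.0
    text = output.lower()
    found = sum(1 for hint in hints if hint.lower() in text)
    return found / len(hints)


# Data taken from the example above; in DSP the hints would come from the policy model.
article = 'Bob Barker returns to "The Price Is Right" after eight years away.'
hints = ["Bob Barker", "TV", "April 1", "The Price Is Right", "2007", "91"]
prompt = build_dsp_prompt(article, hints)

model_output = (
    'On April 1, Bob Barker returned to the TV show "The Price Is Right" '
    "after an eight-year hiatus. At 91, Barker handled the first price "
    "guessing game with gusto. Barker stepped down from hosting in 2007."
)
coverage = hint_coverage(model_output, hints)
print(prompt)
print(coverage)
```

Here the hints are hard-coded; the point of DSP is that a trained policy model produces them separately for every input.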
Where do you use it?
The policy model is first optimized with supervised fine-tuning (SFT) on labeled data, then further optimized with reinforcement learning (RL), using online or offline rewards computed from the LLM's output.
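To make the reward idea concrete, here is a rough sketch of how a reward could be computed from the LLM's output. It assumes a simple unigram-overlap F1 against a reference text as a stand-in metric, and `echo_llm` is a hypothetical fake LLM used only so the snippet runs; neither is claimed to be what the authors use.

```python
from collections import Counter
from typing import Callable


def unigram_f1(candidate: str, reference: str) -> float:
    """Word-overlap F1 between two texts (an assumed stand-in for a real metric)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def reward_for_hints(
    hints: list[str], article: str, reference: str, llm: Callable[[str], str]
) -> float:
    """Score the hints by how good the frozen LLM's output is when given them."""
    prompt = f"{article}\nSummarize briefly. Hint: {'; '.join(hints)}."
    return unigram_f1(llm(prompt), reference)


def echo_llm(prompt: str) -> str:
    """Hypothetical fake LLM: returns the hint words so the example is runnable."""
    return prompt.split("Hint: ")[-1].rstrip(".").replace(";", "")


article = 'Bob Barker returns to "The Price Is Right" after eight years away.'
reference = "Bob Barker returned to host The Price Is Right"

good = reward_for_hints(["Bob Barker", "The Price Is Right"], article, reference, echo_llm)
bad = reward_for_hints(["weather", "sports"], article, reference, echo_llm)
print(good, bad)
```

In an RL loop, a score like this would be fed back to update the policy model only; the LLM that produced the output stays untouched.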
DSP uses a small, tunable language model (the policy model, trained with SFT and RL) to create hints or clues that guide the LLM's responses toward the desired outcome.
Because only the small policy model is trained and the LLM itself is left unchanged, DSP steers the model's responses while preserving its general capabilities, giving finer-grained control than fine-tuning the LLM directly.
Real-world examples:
In customer service, DSP can be used to analyze customer complaints and provide quick, accurate solutions, improving efficiency and supporting human agents.
For example, for an input like “Customer complains about a delayed shipment,” the policy model would generate hints such as “Apologize, reassure, offer a solution,” guiding the LLM to construct an empathetic, helpful response like “We apologize for the delay in shipment. Your order has now been prioritized, and we are taking steps to ensure it arrives soon.”
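The customer service flow can be sketched end to end as follows. Everything here is illustrative: `toy_policy` is a hypothetical keyword rule standing in for a trained policy model, and the LLM is passed in as a plain callable, so no real model or API is assumed.

```python
from typing import Callable

HintPolicy = Callable[[str], list[str]]
LLM = Callable[[str], str]


def toy_policy(message: str) -> list[str]:
    """Hypothetical stand-in for the tuned policy model: keyword rules to hints."""
    if "delay" in message.lower():
        return ["Apologize", "reassure", "offer a solution"]
    return ["Acknowledge", "ask for details"]


def respond(message: str, policy: HintPolicy, llm: LLM) -> str:
    """Generate hints for this message, build the prompt, and query the LLM."""
    hints = policy(message)
    prompt = (
        f"Customer message: {message}\n"
        "Write a short, empathetic reply, following the hints.\n"
        f"Hint: {'; '.join(hints)}."
    )
    return llm(prompt)


# With an identity function as the "LLM", respond() returns the prompt it built,
# which makes it easy to inspect what the real model would receive.
built = respond("Customer complains about a delayed shipment", toy_policy, lambda p: p)
print(built)
```

Swapping `toy_policy` for a trained model and the identity function for a real LLM call would leave `respond` unchanged, which reflects the separation DSP relies on: hints come from the small model, text from the large one.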
ⓒ 2023. Haebom, all rights reserved.
It may be used for commercial purposes with permission from the copyright holder, provided the source is cited.