📝

Understanding LLM Settings for Effective Prompting

When working with LLMs (Large Language Models), adjusting a few settings can make a big difference in the responses you get. Here is an overview of those settings and how to use them effectively. At first it may be hard to see why a temperature is shown or what a stop sequence is, but just treat them as terms to learn. If you only use ChatGPT, there is nothing but an input box, so it is difficult to see how these settings work; if you open the Playground, however, you can see how models such as GPT-3.5 are configured.
1. Temperature: Balancing Determinism and Creativity
Description: Think of Temperature as a dial that controls how predictable the model is. The lower the Temperature, the more predictable and consistent the response. The higher the Temperature, the more creativity and variability you allow.
Practical use:
For factual questions, use lower Temperature values to get concise, accurate answers.
For creative tasks like poetry, increasing the Temperature can lead to more imaginative answers.
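As a rough sketch of what this looks like in practice (using the OpenAI Python SDK purely as an example; the model name, prompts, and values are illustrative, and an API key is assumed to be set in the environment):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Low temperature: deterministic and consistent, good for factual questions
factual = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model only
    messages=[{"role": "user", "content": "What year was the transistor invented?"}],
    temperature=0.1,
)

# High temperature: more variability, good for creative tasks
creative = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short poem about autumn rain."}],
    temperature=1.2,
)

print(factual.choices[0].message.content)
print(creative.choices[0].message.content)
```

Running the low-temperature call several times tends to return nearly identical answers, while the high-temperature call varies from run to run.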
2. Top_p (Nucleus Sampling): Fine-Tuning Response Variability
Description: Top_p, like temperature, manages the variability of the response. A lower top_p increases the accuracy of the response but reduces its diversity; a higher top_p leads to more diverse output.
Example:
Keep top_p low to increase the accuracy of answers.
Increase top_p to explore different ideas and styles.
There is also top_k, but it is usually left at 0 and not touched. (If you are curious why, you could read the GPT-2 paper... or rather, skip the paper; there is a document from the Naver Clova team that explains it well - Link)
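A minimal sketch of the same idea with the top_p parameter (again using the OpenAI Python SDK as an illustrative example; prompts and values are made up):

```python
from openai import OpenAI

client = OpenAI()

# Low top_p: sample only from the most probable tokens (more accurate, less diverse)
precise = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "List the planets of the solar system."}],
    top_p=0.1,
)

# High top_p: sample from a wider slice of the distribution (more diverse output)
exploratory = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest names for a coffee shop."}],
    top_p=0.95,
)

print(precise.choices[0].message.content)
print(exploratory.choices[0].message.content)
```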
3. Max Length: Controlling Response Length
Description: Set a limit on the length of the response by adjusting Max Length. This helps prevent overly long or off-topic responses, and capping the number of tokens keeps answers concise and cost-effective.
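In the API, this Playground setting corresponds to the max_tokens parameter. A small sketch (illustrative prompt and limit):

```python
from openai import OpenAI

client = OpenAI()

# Cap the response at 100 tokens to keep the answer short and control cost
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize what nucleus sampling is."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```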
4. Stop Sequences: Defining Endpoints
Description: A specific string that tells the model when to stop generating text.
Example: To limit the list to 10 items, add "11" as a stop sequence.
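A sketch of that example in code (the prompt is made up; the stop sequence "11." cuts generation as soon as the model starts the eleventh item):

```python
from openai import OpenAI

client = OpenAI()

# Generation halts the moment the model emits "11.", so the list ends at 10 items
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Give me a numbered list of healthy snacks."}],
    stop=["11."],
)
print(response.choices[0].message.content)
```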
5. Frequency Penalty: Reducing Repetition
Description: This setting discourages the model from repeating the same word or phrase; the penalty grows with how often a token has already appeared.
Example: Increasing the frequency penalty adds diversity to the model's language and reduces redundancy.
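A minimal sketch (illustrative prompt and value; in the OpenAI API the frequency_penalty parameter ranges from -2.0 to 2.0):

```python
from openai import OpenAI

client = OpenAI()

# A positive frequency_penalty lowers the chance of tokens reappearing
# in proportion to how often they have already been used in the output.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Describe a forest in the morning."}],
    frequency_penalty=0.8,
)
print(response.choices[0].message.content)
```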
6. Presence Penalty: Encouraging New Content
Description: Penalizes all repeated tokens equally, regardless of how often they appear, encouraging more varied answers.
Adjust settings:
Increase if you want more varied and creative text.
Lower it if you want more focused content.
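A short sketch of raising this setting (illustrative prompt and value; presence_penalty also ranges from -2.0 to 2.0 in the OpenAI API):

```python
from openai import OpenAI

client = OpenAI()

# A positive presence_penalty applies a flat penalty to any token that has
# already appeared, nudging the model toward new words and topics.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Brainstorm ideas for a weekend project."}],
    presence_penalty=1.0,
)
print(response.choices[0].message.content)
```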
🕵️
NOTE! In general, to get clear results, it is recommended to adjust only one of Temperature, top_p, and the frequency and presence penalties at a time, and leave the others alone.
💡
Variability across model versions
Please keep in mind that results may vary depending on the LLM version and model you are using. It is always recommended to experiment to find the settings that best suit your specific needs.
ⓒ 2023. Haebom, all rights reserved.
It may be used for commercial purposes with permission from the copyright holder, provided the source is cited.