📝

Understanding LLM Settings for Effective Prompting

When working with LLMs (Large Language Models), adjusting certain settings can significantly change the responses you get. Here's a look at these settings and how to use them effectively. At first, terms like temperature or stop sequence might sound intimidating, but they are just technical terms for a few simple dials. If you only use ChatGPT, all you see is the input box, so it's hard to grasp what is happening behind the scenes; open the Playground, however, and you can see exactly how models like GPT-3.5 are configured.
1. Temperature: Striking a Balance between Determinism and Creativity
Description: Think of temperature as a dial that controls how predictable the model is. Lower values sharpen the probability distribution so the most likely next token is chosen almost every time, giving predictable and consistent responses; higher values flatten the distribution, letting less likely tokens through and producing more creativity and variation.
How to use:
For factual questions, use a lower temperature value to get concise and accurate answers.
For creative tasks like writing poetry, raise the temperature to encourage more imaginative responses.
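As a rough sketch of what this looks like in code (assuming the OpenAI Python SDK and a GPT-3.5 model; the prompts and values are purely illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str, temperature: float) -> str:
    # Same prompt, different temperature: low values give stable, near-deterministic
    # answers; higher values give more varied, creative ones.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

print(ask("What is the capital of France?", temperature=0.0))        # factual: keep it low
print(ask("Write a two-line poem about autumn.", temperature=1.2))   # creative: turn it up
```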
2. Top_p (Nucleus Sampling): Tweaking Response Variability
Description: Top_p (nucleus sampling) is used alongside temperature to manage the variability of responses: the model samples only from the smallest set of tokens whose combined probability reaches top_p. The lower the top_p, the more exact and factual but less diverse the responses; the higher the top_p, the greater the diversity of output.
Example:
To make answers more accurate, keep top_p low.
If you want to explore a wider range of ideas or styles, raise the top_p.
There is also a top_k setting, but it is generally set to 0 (i.e. disabled) and left alone. (If you're curious why, you could read the GPT-2 paper, but there is also a well-explained write-up from the Naver Clova team. - Link)
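As a rough sketch (again assuming the OpenAI Python SDK, whose Chat Completions API exposes top_p but not top_k; prompts and values are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Low top_p: sample only from the few most probable tokens -> exact, factual answers.
factual = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "List the planets of the solar system."}],
    top_p=0.1,
)

# High top_p: allow a much wider pool of tokens -> more diverse wording and ideas.
creative = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest five names for a coffee shop."}],
    top_p=0.95,
)

print(factual.choices[0].message.content)
print(creative.choices[0].message.content)
```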
3. Max Length: Adjusting the Response Length
Description: By adjusting 'Max Length', you can cap the number of tokens the model generates. This helps avoid overly long or off-topic answers, and since usage is billed per token, setting a specific limit also keeps responses concise and cost-effective.
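In the API this setting corresponds to the max_tokens parameter. A minimal sketch (same assumptions as above):

```python
from openai import OpenAI

client = OpenAI()

# Cap the reply at 100 tokens; generation stops once the limit is reached,
# even mid-sentence (finish_reason will then be "length" instead of "stop").
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize the history of the internet."}],
    max_tokens=100,
)

print(response.choices[0].finish_reason)
print(response.choices[0].message.content)
```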
4. Stop Sequences: Defining the End Point
Description: A stop sequence is a specific string that tells the model to stop generating text as soon as it would produce it.
Example: To limit a list to 10 items, add "11" as a stop sequence; the output ends before an eleventh item appears.
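A minimal sketch of the same idea (OpenAI Python SDK; the stop string is written here as "11." with a trailing period, assuming the model numbers its list that way):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Give me a numbered list of productivity tips."}],
    # Generation stops the moment the model is about to emit "11.",
    # so the list contains at most 10 items.
    stop=["11."],
)

print(response.choices[0].message.content)
```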
5. Frequency Penalty: Penalizing Frequency
Description: This setting discourages the model from repeating the same words or phrases: each token is penalized in proportion to how many times it has already appeared, so the more often a word has been used, the less likely it is to show up again.
Example: By increasing the frequency penalty, you can diversify the model's language and reduce repetitiveness.
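A minimal sketch (same assumptions; in the OpenAI API, frequency_penalty ranges roughly from -2.0 to 2.0, with 0 as the default):

```python
from openai import OpenAI

client = OpenAI()

# A positive frequency_penalty makes tokens that have already appeared many
# times progressively less likely, reducing word-for-word repetition.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Describe the ocean in one paragraph."}],
    frequency_penalty=1.0,
)

print(response.choices[0].message.content)
```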
6. Presence Penalty: Penalizing Repeated Presence
Description: Like the frequency penalty, this discourages repetition, but the penalty is flat: it applies equally to every token that has already appeared at least once, no matter how often, which nudges the model toward new words and topics and gives you more varied answers.
Adjusting settings:
Increase if you want more variety and creativity in your text.
Lower it if you want the content to be more focused.
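A minimal sketch (same assumptions; presence_penalty also ranges roughly from -2.0 to 2.0, default 0):

```python
from openai import OpenAI

client = OpenAI()

# A positive presence_penalty applies a flat, one-time penalty to any token
# that has already appeared, nudging the model toward new words and topics.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Brainstorm blog post ideas about remote work."}],
    presence_penalty=1.0,
)

print(response.choices[0].message.content)
```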
🕵️
Tip! In general, for clear results it's best to adjust only one of temperature, top_p, frequency penalty, or presence penalty at a time, and leave the others at their defaults.
💡
Variability Depending on Model Version
Remember that your results may vary based on the LLM version and model you're using. It's always a good idea to experiment and find the best settings for your specific needs.
ⓒ 2023. Haebom, all rights reserved.
You can use this for commercial purposes with the copyright holder's permission, as long as the source is credited.