How does temperature impact next token prediction in LLMs?

by Ankur Manikandan
Introduction
Large Language Models (LLMs) are versatile generative models suited to a wide array of tasks. They can produce consistent, repeatable outputs or generate creative content by combining less likely tokens in novel ways. The "temperature" setting lets users adjust the model's output, controlling how predictable or random its predictions are.

Let’s take a hypothetical example to understand the impact of temperature on the next token prediction.

We asked an LLM to complete the sentence, "This is a wonderful _____." Let's assume the candidate tokens and their logits (the model's raw, unnormalized scores) are:

|   token    | logit |
|------------|-------|
| day        | 5.0   |
| space      | 2.2   |
| furniture  | 2.0   |
| experience | 4.5   |
| problem    | 3.0   |
| challenge  | 2.7   |

The logits are passed through a softmax function, which rescales them so they sum to one. In other words, the softmax function turns the logits into a probability estimate for each token.

Standard softmax function:

$$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$

Let’s calculate the probability estimates in Python.

import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interactive, FloatSlider

def softmax(logits):
    # Exponentiate each logit and normalize so the values sum to one
    exps = np.exp(logits)
    return exps / np.sum(exps)

data = {
    "tokens": ["day", "space", "furniture", "experience", "problem", "challenge"],
    "logits": [5.0, 2.2, 2.0, 4.5, 3.0, 2.7]
}
df = pd.DataFrame(data)
df['probabilities'] = softmax(df['logits'].values)
df

| No. |   tokens   | logits | probabilities |
|-----|------------|--------|---------------|
| 0 | day | 5.0 | 0.512106 |
| 1 | space | 2.2 | 0.031141 |
| 2 | furniture | 2.0 | 0.025496 |
| 3 | experience | 4.5 | 0.310608 |
| 4 | problem | 3.0 | 0.069306 |
| 5 | challenge | 2.7 | 0.051343 |

ax = sns.barplot(x="tokens", y="probabilities", data=df)
ax.set_title('Softmax Probability Estimates')
ax.set_ylabel('Probability')
ax.set_xlabel('Tokens')
plt.xticks(rotation=45)
# Annotate each bar with its probability value
for bar in ax.patches:
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(), f'{bar.get_height():.2f}',
            ha='center', va='bottom', fontsize=10, rotation=0)
plt.show()

The softmax function with temperature is defined as follows:

$$\text{softmax}(x_i, T) = \frac{e^{x_i / T}}{\sum_{j=1}^{n} e^{x_j / T}}$$

where $T$ is the temperature, $x_i$ is the $i$-th component of the input vector (logits), and $n$ is the number of components in the vector.

def softmax_with_temperature(logits, temperature):
    if temperature <= 0:
        temperature = 1e-10  # Guard against division by zero and negative temperatures
    scaled_logits = logits / temperature
    exps = np.exp(scaled_logits - np.max(scaled_logits))  # Subtract the max for numerical stability
    return exps / np.sum(exps)

def plot_interactive_softmax(temperature):
    probabilities = softmax_with_temperature(df['logits'], temperature)
    plt.figure(figsize=(10, 5))
    bars = plt.bar(df['tokens'], probabilities, color='blue')
    plt.ylim(0, 1)
    plt.title(f'Softmax Probabilities at Temperature = {temperature:.2f}')
    plt.ylabel('Probability')
    plt.xlabel('Tokens')
    # Annotate each bar with its probability value
    for bar, probability in zip(bars, probabilities):
        yval = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2, yval, f"{probability:.2f}", ha='center', va='bottom', fontsize=10)
    plt.show()

interactive_plot = interactive(plot_interactive_softmax, temperature=FloatSlider(value=1, min=0, max=2, step=0.01, description='Temperature'))
interactive_plot

At T = 1, the probability values are identical to those produced by the standard softmax function.
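
We can verify this directly, reusing the softmax and softmax_with_temperature functions defined above:

# At T = 1 the scaled logits equal the raw logits, so the two functions agree
p_standard = softmax(df['logits'].values)
p_t1 = softmax_with_temperature(df['logits'].values, temperature=1.0)
print(np.allclose(p_standard, p_t1))  # True
print(np.round(p_t1, 3))  # [0.512 0.031 0.025 0.311 0.069 0.051]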

At T > 1, raising the temperature inflates the probabilities of the less likely tokens, thereby broadening the range of potential candidates (or diversity) for the model's next token prediction.
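
For example, at T = 2 the distribution visibly flattens (values are approximate):

# A higher temperature flattens the distribution
p_t2 = softmax_with_temperature(df['logits'].values, temperature=2.0)
print(np.round(p_t2, 3))  # ~[0.341 0.084 0.076 0.266 0.125 0.108]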

At T < 1, lowering the temperature pushes the probability of the most likely token toward 1.0, boosting the model's confidence. As T approaches 0, sampling effectively becomes greedy decoding: the most probable token is selected almost every time, removing nearly all randomness from the output.
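
Concretely, at T = 0.5 the top token already dominates, and by T = 0.1 it is all but certain (values are approximate):

# A lower temperature concentrates probability on the top token
p_t05 = softmax_with_temperature(df['logits'].values, temperature=0.5)
p_t01 = softmax_with_temperature(df['logits'].values, temperature=0.1)
print(np.round(p_t05, 3))  # ~[0.713 0.003 0.002 0.262 0.013 0.007]
print(np.round(p_t01, 3))  # ~[0.993 0.    0.    0.007 0.    0.   ]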

Conclusion

LLMs expose the temperature parameter to offer flexibility in their predictions. At a temperature of 1, the model samples from the original, unmodified softmax distribution. Increasing the temperature introduces greater diversity by amplifying less likely tokens. Conversely, decreasing the temperature makes the predictions more focused, concentrating probability on the most likely token and reducing the randomness of sampling. This adaptability allows users to tailor LLM outputs to a wide array of tasks, striking a balance between creative exploration and deterministic output.
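
To see what this means in practice, here is a minimal sketch of temperature-based sampling, assuming tokens are drawn at random according to the temperature-scaled probabilities (as samplers commonly do):

# Illustrative sketch: sample next tokens at different temperatures
rng = np.random.default_rng(42)
for T in [0.1, 1.0, 2.0]:
    probs = softmax_with_temperature(df['logits'].values, temperature=T)
    samples = rng.choice(df['tokens'].values, size=10, p=probs)
    print(f"T = {T}:", ' '.join(samples))
# Low T: almost always "day"; high T: a more varied mix of tokens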
