Output Control & Repetition

Control the shape of the answer with max tokens and stop sequences, and curb repetitive, looping text with frequency and presence penalties. These are the settings that make output production-ready.


Max tokens — cap the length

Why: max output tokens is a hard limit on how much the model can generate, which protects you from runaway cost and overlong replies. When: set it deliberately for every production call. Where: in English text a token is roughly 4 characters or 3/4 of a word, so 200 tokens is about 150 words.
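The rule of thumb above can be turned into quick arithmetic when choosing a limit. This is a minimal sketch: the function names are hypothetical, and the ratios are only the rough English-text approximations quoted above, not a real tokenizer.

```python
import math

def estimate_tokens_from_words(words: int) -> int:
    """Rough estimate: one token is about 3/4 of a word."""
    return math.ceil(words * 4 / 3)

def estimate_tokens_from_chars(chars: int) -> int:
    """Rough estimate: one token is about 4 characters."""
    return math.ceil(chars / 4)

def estimate_words_from_tokens(tokens: int) -> int:
    """Inverse of the word rule: how many words fit in a token budget."""
    return round(tokens * 3 / 4)

print(estimate_words_from_tokens(200))  # 150
print(estimate_tokens_from_words(150))  # 200
print(estimate_tokens_from_chars(800))  # 200
```

Actual counts depend on the model's tokenizer and on the language of the text, so treat these numbers as a starting point and leave some headroom.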

Set Max output tokens to 60, then run:

"Summarise the causes of World War I."

The model is forced to be brief. Note: it does NOT make the model
"aim" for 60 tokens — it just stops there, so also ask for brevity
in the prompt: "in 2 sentences".
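To see why the limit cuts rather than shortens, here is a toy sketch of the behaviour. It treats whitespace-separated words as stand-in tokens, and the function name and the "length" / "stop" finish labels are illustrative assumptions, not any particular API's values.

```python
def apply_max_tokens(tokens: list[str], max_tokens: int) -> tuple[list[str], str]:
    """Keep at most max_tokens tokens and report why generation ended.

    Returns (kept_tokens, reason): reason is "length" when the cap cut the
    reply short, "stop" when the reply finished on its own.
    """
    if len(tokens) > max_tokens:
        return tokens[:max_tokens], "length"
    return tokens, "stop"

# Words stand in for tokens here; a real tokenizer splits text differently.
reply = "The war had many causes including alliances militarism and nationalism".split()

kept, reason = apply_max_tokens(reply, 6)
print(" ".join(kept))  # The war had many causes including
print(reason)          # length
```

The truncated reply ends mid-sentence, which is why the prompt should also ask for brevity.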

Stop sequences — end on cue

Why: a stop sequence is text that, when generated, makes the model halt immediately — perfect for cutting off trailing chatter after the part you want. When: use it to stop after one item, one JSON object, or before the model starts a new "Q:". How: set the stop string in the parameters panel.

Stop sequence: "\n\n"   (two newlines)

Prompt:
"List one benefit of cycling, then stop.
Benefit:"

The model writes one benefit and the double newline halts it before
it can ramble into a numbered list.
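The effect of a stop sequence can be imitated on a finished string. This is a sketch under assumptions: the function name is hypothetical, and it assumes the stop text itself is left out of the returned output, which is common but worth checking for your provider.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence.

    The stop sequence itself is not included in the result.
    """
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# What the model might have produced without a stop sequence.
raw = " It improves cardiovascular fitness.\n\n2. It saves money on fuel."

print(repr(truncate_at_stop(raw, ["\n\n"])))
# ' It improves cardiovascular fitness.'
```

In a real call the model halts during generation, so the text after the stop sequence is never produced or billed; the sketch only mirrors the visible result.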

Frequency penalty

Why: frequency penalty lowers a word's chance of being chosen in proportion to how many times it has already appeared, breaking up "the the the" loops and repetitive phrasing. When: nudge it up (0.3-0.8) for long generations that start echoing themselves. Where: 0 is off; too high makes the model avoid words it genuinely needs.

Attributes

frequency_penalty — Range roughly -2 to 2. Positive values discourage reusing words proportionally to how often they have already appeared. Use ~0.5 to reduce verbatim repetition in long text.

Set frequency_penalty to 0 and generate:
"Write a 6-line poem about the sea."

Now set it to 0.8 and regenerate. The second version should repeat
words like "wave" and "blue" far less often.
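One commonly documented way to apply this penalty is to subtract the penalty times the token's count from its score (logit) before sampling. The sketch below assumes that formulation; the function name and the toy scores are made up for illustration, and a given provider may implement it differently.

```python
from collections import Counter

def apply_frequency_penalty(
    logits: dict[str, float],
    generated: list[str],
    frequency_penalty: float,
) -> dict[str, float]:
    """Lower each token's score by penalty * (times it has already appeared)."""
    counts = Counter(generated)  # missing tokens count as 0
    return {tok: logit - frequency_penalty * counts[tok] for tok, logit in logits.items()}

# Toy scores for the next token, and what has been generated so far.
logits = {"wave": 2.0, "blue": 1.5, "gull": 1.0}
generated = ["wave", "blue", "wave", "wave"]

print(apply_frequency_penalty(logits, generated, 0.5))
# {'wave': 0.5, 'blue': 1.0, 'gull': 1.0}
```

"wave" appeared three times and drops the most, "blue" appeared once and drops a little, and the unused "gull" is untouched, so fresh words become relatively more likely.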

Presence penalty

Why: presence penalty discourages any word that has appeared even once, by the same amount no matter how often it has been used, pushing the model toward new topics rather than new phrasings. When: raise it to make brainstorms wander into fresh territory; keep it low when you want the model to stay on one subject. Where: do not crank both penalties high at once, or the output turns incoherent.

Attributes

presence_penalty — Range roughly -2 to 2. Positive values push the model to introduce new tokens/topics it has not used yet. Use a small value (~0.3) to keep a brainstorm from circling the same idea.

Set presence_penalty to 0.6 and run:
"Brainstorm 10 uses for a paperclip."

Compared to 0, the list should range much wider because the model is
penalized for returning to concepts it has already mentioned.
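The difference from frequency penalty is that presence penalty is a flat, one-time deduction. A minimal sketch, assuming the commonly documented formulation in which the penalty is subtracted once from the score of any token that has already appeared; the function name and toy scores are hypothetical.

```python
def apply_presence_penalty(
    logits: dict[str, float],
    generated: list[str],
    presence_penalty: float,
) -> dict[str, float]:
    """Lower the score of every token that has appeared at least once."""
    seen = set(generated)
    return {
        tok: logit - (presence_penalty if tok in seen else 0.0)
        for tok, logit in logits.items()
    }

# Toy scores for the next token, and what has been generated so far.
logits = {"hook": 2.0, "clip": 1.5, "sculpture": 1.0}
generated = ["hook", "hook", "hook", "clip"]

print(apply_presence_penalty(logits, generated, 0.5))
# {'hook': 1.5, 'clip': 1.0, 'sculpture': 1.0}
```

"hook" appeared three times and "clip" once, yet both lose the same 0.5; only the unused "sculpture" keeps its full score, which is what nudges the model toward concepts it has not touched yet.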
