
Completion

The text generated by a language model in response to a prompt, representing the model's continuation of the input sequence according to its learned distribution over tokens.

A completion is generated autoregressively: the model samples one token at a time, appending each to the sequence and using the extended sequence to predict the next token. This continues until a stop condition is met: a maximum token count, a stop sequence, or an end-of-sequence token. The full sequence of generated tokens is the completion.
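The loop below is a minimal sketch of that process. The `model.next_token_logits` method and the `tokenizer` object are hypothetical stand-ins for whatever inference interface is actually in use, not any particular library's API.

```python
def greedy_pick(logits):
    """Placeholder sampling rule: pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def generate(model, tokenizer, prompt_ids,
             max_tokens=256, stop_sequence=None, eos_id=None):
    sequence = list(prompt_ids)  # prompt tokens; the completion grows past them
    completion = []
    for _ in range(max_tokens):                     # stop 1: max token count
        logits = model.next_token_logits(sequence)  # distribution over the next token
        token = greedy_pick(logits)                 # swap in any sampling rule here
        if token == eos_id:                         # stop 2: end-of-sequence token
            break
        sequence.append(token)                      # feed the extended sequence back in
        completion.append(token)
        text = tokenizer.decode(completion)
        if stop_sequence is not None and stop_sequence in text:
            return text.split(stop_sequence)[0]     # stop 3: stop sequence reached
    return tokenizer.decode(completion)
```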

In modern chat APIs, "completion" has been largely replaced by "response" or "message" terminology, but the underlying mechanism is identical. The completions API (as opposed to the chat completions API) treats the conversation history as a single text block rather than structured messages—useful for certain fine-tuning workflows but generally less convenient for conversational applications.
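For concreteness, here is roughly how the two request shapes differ. The field names follow the common OpenAI-style convention (`prompt` for the completions API, `messages` for the chat completions API); the model names and values are placeholders.

```python
# Completions API: conversation history flattened into one text block.
completions_request = {
    "model": "some-base-model",
    "prompt": "User: What is a token?\nAssistant:",
    "max_tokens": 128,
}

# Chat completions API: history carried as structured messages with roles.
chat_completions_request = {
    "model": "some-chat-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a token?"},
    ],
    "max_tokens": 128,
}
```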

Output quality depends jointly on the prompt, the model's capabilities, and the sampling parameters. Greedy decoding (always picking the highest-probability token) is deterministic but can produce flat, repetitive text. Temperature sampling introduces variety at the cost of occasional incoherence. Best-of-N sampling generates multiple completions and selects the best, trading compute for quality.
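A minimal sketch of the three strategies, assuming the raw logits are available as a plain list of scores; the `score_fn` used for best-of-N is a placeholder for whatever quality signal is available (total logprob, a reward model, etc.).

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                            # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Deterministic: always the highest-probability token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_with_temperature(logits, temperature):
    """Stochastic: T < 1 sharpens the distribution, T > 1 flattens it."""
    probs = softmax(logits, temperature)
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

def best_of_n(generate_fn, score_fn, n):
    """Generate n completions and keep the highest-scoring one."""
    candidates = [generate_fn() for _ in range(n)]
    return max(candidates, key=score_fn)
```

As temperature approaches zero, temperature sampling converges to greedy decoding; raising it flattens the distribution toward uniform, which is where the occasional incoherence comes from.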

Related terms
prompt · sampling parameters · temperature · greedy decoding · tokenization · logprobs