Instruction Tuning
A fine-tuning stage in which a pre-trained language model is trained on curated (instruction, response) pairs to teach it to follow natural-language directives rather than merely continue text.
Raw pre-trained models complete text; they do not follow instructions. Instruction tuning bridges this gap by training the model on a dataset of (instruction, desired response) pairs: "Summarize this article in 3 bullet points → [bullet points]", "Fix the bug in this Python function → [corrected code]". After instruction tuning, the model generalizes to follow novel instructions in zero-shot settings.
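The pairing described above can be sketched in code. The template, the whitespace "tokenizer", and the function name below are illustrative assumptions rather than any specific model's format; the key idea shown is that the instruction and response are concatenated into one sequence, with the loss masked so the model only learns to predict the response tokens:

```python
# Minimal sketch: turn an (instruction, response) pair into a training
# example. The prompt template and whitespace tokenizer are stand-ins
# for a real chat template and subword tokenizer.

def build_example(instruction, response):
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_tokens = prompt.split()       # stand-in for a real tokenizer
    response_tokens = response.split()
    tokens = prompt_tokens + response_tokens
    # -100 is the conventional "ignore" label for cross-entropy loss:
    # only the response positions contribute to the training objective.
    labels = [-100] * len(prompt_tokens) + list(response_tokens)
    return tokens, labels

tokens, labels = build_example(
    "Summarize this article in 3 bullet points.",
    "- Point one - Point two - Point three",
)
```

Masking the instruction tokens (rather than training on the full sequence) is a common design choice: it keeps the objective focused on producing responses instead of re-memorizing prompts.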
FLAN (Finetuned Language Net, Wei et al., 2022) demonstrated that instruction tuning transfers across tasks: a model fine-tuned on dozens of datasets spanning diverse task clusters improves on held-out task types it has never seen, and follow-up work scaled the instruction collection to well over a thousand tasks. Beyond a certain threshold, the diversity and quality of the instruction dataset matter more than its raw size.
Modern instruction tuning typically supplements human-written examples with synthetic data generated by stronger models, a technique popularized as "Self-Instruct" and often described as distillation from a teacher model. Anthropic, Meta, and Mistral have all described using synthetic instruction data alongside human-written examples to build the instruction-following capabilities of their production models.
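The synthetic-data loop can be sketched as follows. The `teacher` function here is a deterministic placeholder for a call to a stronger model's API, and the near-duplicate filter is deliberately crude; a real pipeline would add quality filtering and diversity checks on top of this skeleton:

```python
# Hedged sketch of Self-Instruct-style data generation: seed instructions
# prompt a (stub) teacher model to propose new instructions, the teacher
# then answers them, and duplicates are filtered before the pairs join
# the fine-tuning set.

def teacher(prompt):
    # Placeholder: a real pipeline would call a stronger model here.
    return f"[model output for: {prompt}]"

def generate_synthetic_pairs(seed_instructions, n_per_seed=2):
    pairs, seen = [], set()
    for seed in seed_instructions:
        for i in range(n_per_seed):
            new_instruction = teacher(
                f"Write a new task similar to: {seed} (variant {i})"
            )
            if new_instruction in seen:   # crude exact-match dedup
                continue
            seen.add(new_instruction)
            response = teacher(new_instruction)
            pairs.append((new_instruction, response))
    return pairs

pairs = generate_synthetic_pairs(["Summarize the article below."])
```

In practice the filtering stage is where most of the quality comes from: published pipelines discard generations that are too similar to existing data, malformed, or answered poorly by the teacher.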