# AI Glossary
Authoritative definitions for AI agents, MCP, Claude tool-use, and the broader AI engineering ecosystem. Each entry cross-references Wikipedia and Wikidata where the term has an established encyclopedia entry.
- **Agent harness** (agent runtime): An agent harness is a software framework that runs an LLM in a loop with tool access, persistent context, and a stopping criterion, turning a one-shot model call into a multi-step workflow that can plan, execute, observe results, and re-plan until a task is complete.
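The loop described above can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `fake_model` is a scripted function playing the role of an LLM call, and the `add` tool is a toy registry entry, not a real API.

```python
def fake_model(history):
    # Hypothetical stand-in for an LLM call: it plans one tool call,
    # then produces a final answer once it has observed a tool result.
    if not any(msg["role"] == "tool" for msg in history):
        return {"type": "tool_call", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "text": f"The sum is {history[-1]['content']}."}

TOOLS = {"add": lambda a, b: a + b}  # toy tool registry

def run_agent(task, model, tools, max_steps=10):
    history = [{"role": "user", "content": task}]   # persistent context
    for _ in range(max_steps):                      # stopping criterion: step budget
        action = model(history)
        if action["type"] == "final":               # stopping criterion: model is done
            return action["text"]
        result = tools[action["tool"]](**action["args"])   # execute the tool
        history.append({"role": "tool", "content": result})  # observe the result
    return "step budget exhausted"

print(run_agent("What is 2 + 3?", fake_model, TOOLS))  # → The sum is 5.
```

Real harnesses replace `fake_model` with an actual model API call and add error handling, but the plan/execute/observe loop and the two stopping criteria (model signals completion, or the step budget runs out) are the essential shape.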
- **Context window** (inference): The context window is the maximum number of tokens a large language model can process in a single forward pass, including the prompt, all in-context examples, retrieved documents, and the model's generated output; it is measured in thousands or millions of tokens.
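Because the window covers input and output together, applications budget tokens before each call. A back-of-the-envelope check, with all numbers purely illustrative:

```python
# Hypothetical token budget: everything that enters the forward pass,
# including tokens reserved for the model's answer, must fit in the window.
CONTEXT_WINDOW = 8192        # illustrative model limit

prompt_tokens = 5000         # system prompt + conversation so far
retrieved_tokens = 2500      # documents stuffed in by retrieval
reserved_for_output = 1024   # room left for the generated answer

total = prompt_tokens + retrieved_tokens + reserved_for_output
fits = total <= CONTEXT_WINDOW
print(total, fits)  # → 8524 False: something must be truncated or summarized
```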
- **Embedding** (AI architecture): An embedding is a numerical vector representation of an input (typically text, but also images, audio, or code) that places semantically similar inputs near each other in a high-dimensional space, enabling semantic search, retrieval, classification, and clustering.
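"Near each other" is usually measured with cosine similarity. A sketch using tiny hand-made 3-dimensional vectors (real embeddings come from a model and have hundreds or thousands of dimensions; these values are invented for illustration):

```python
import math

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (|u| * |v|); 1.0 means identical direction
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy "embeddings": cat and kitten point in similar directions, car does not.
cat    = [1.0, 0.9, 0.1]
kitten = [0.95, 1.0, 0.05]
car    = [0.1, 0.0, 1.0]

assert cosine_similarity(cat, kitten) > cosine_similarity(cat, car)
```

Semantic search is then just "embed the query, rank documents by cosine similarity to it."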
- **Fine-tuning** (training): Fine-tuning is the process of further training a pre-trained large language model on a smaller, task-specific dataset to specialize its behavior, adjusting either all model weights (full fine-tuning) or a small adapter layer (parameter-efficient fine-tuning, e.g., LoRA or QLoRA).
- **Function calling** (agent runtime): Function calling is OpenAI's name for the LLM tool-use capability, introduced in June 2023, in which the model emits structured JSON describing a function to call and its arguments, conforming to a JSON Schema the developer supplies.
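A minimal sketch of the two halves involved: the developer-supplied JSON Schema describing a function, and the JSON arguments the model emits back. The `get_weather` function is a made-up example, and the exact wrapper fields around the schema vary by API and version; only the schema-in, JSON-arguments-out shape is the point here.

```python
import json

# Developer side: a function described with JSON Schema (hypothetical tool).
tool_definition = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Model side: instead of prose, the model emits arguments as a JSON string
# conforming to the schema above (this string is an invented example).
model_output = '{"city": "Paris"}'

args = json.loads(model_output)   # developer parses the arguments...
print(args["city"])               # ...and dispatches the real function call
```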
- **Large Language Model (LLM)** (AI architecture): A Large Language Model (LLM) is a deep neural network, typically a transformer with billions to trillions of parameters, trained on large text corpora to predict the next token and then fine-tuned for instruction-following, dialogue, and increasingly tool use and reasoning.
- **Model Context Protocol (MCP)** (protocol): Model Context Protocol (MCP) is an open standard introduced by Anthropic in November 2024 for connecting AI assistants to data sources and tools through a JSON-RPC wire protocol over stdio or HTTP transports.
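On the wire this is JSON-RPC 2.0, typically one JSON message per line over stdio. A sketch of a request/response pair; the `tools/list` method name follows the MCP specification, but the payload details here (the `read_file` tool) are invented for illustration:

```python
import json

# Client asks an MCP server what tools it exposes (JSON-RPC 2.0 request).
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}

# Server replies with a result matched to the request by its id.
# The read_file tool here is a made-up example.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{"name": "read_file", "description": "Read a file"}]},
}

wire = json.dumps(request)  # serialized as one line of JSON on stdio
assert json.loads(wire)["method"] == "tools/list"
assert response["id"] == request["id"]
```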
- **Reinforcement Learning from Human Feedback (RLHF)** (training): Reinforcement Learning from Human Feedback (RLHF) is a training technique in which a language model's policy is fine-tuned against a reward model trained on human preference rankings, aligning model output with human-judged quality on dimensions such as helpfulness and harmlessness.
- **Retrieval-Augmented Generation (RAG)** (AI architecture): Retrieval-Augmented Generation (RAG) is a technique introduced by Meta AI in 2020 for grounding large language model outputs in retrieved external documents. It combines a retriever (typically a vector index) with a generator (a language model) so that the model's response is conditioned on relevant source material rather than parametric memory alone.
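The retrieve-then-generate pipeline can be sketched with a toy keyword-overlap retriever standing in for a real vector index (a production system would embed the query and documents and rank by similarity instead; the documents and query here are invented):

```python
docs = [
    "The Transformer architecture was introduced in 2017.",
    "Tokenization splits text into subword units.",
]

def retrieve(query, docs):
    # Toy retriever: score each document by word overlap with the query.
    # A real RAG system would use embeddings and a vector index here.
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

query = "When was the Transformer introduced?"
context = retrieve(query, docs)

# The generator's prompt is conditioned on the retrieved source material.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The final step, not shown, is to send `prompt` to a language model, which now answers from the retrieved passage rather than from its parametric memory alone.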
- **Tokenization** (AI architecture): Tokenization is the process of splitting text into discrete units (tokens) that a language model treats as its atomic input, typically subword fragments chosen so that common words are one token and rare words are several, balancing vocabulary size against representation efficiency.
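A toy greedy longest-match tokenizer over a tiny hand-made vocabulary shows the subword idea. Real tokenizers (e.g., BPE) learn their vocabulary from data; this five-entry vocabulary is invented purely to illustrate whole-word versus fragment splits:

```python
VOCAB = {"token", "ization", "un", "believ", "able"}  # illustrative only

def tokenize(word):
    tokens, i = [], 0
    while i < len(word):
        # Greedily take the longest vocabulary entry matching at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown text falls back to characters
            i += 1
    return tokens

print(tokenize("tokenization"))   # → ['token', 'ization']
print(tokenize("unbelievable"))   # → ['un', 'believ', 'able']
```

The vocabulary-size trade-off is visible even here: a bigger vocabulary makes more words single tokens but costs more embedding rows, while a smaller one splits more words into fragments.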
- **Tool use** (agent runtime): Tool use is a capability in which a large language model is given access to external functions (tools) it can invoke during inference, with the model deciding when to call which tool, generating structured arguments for the call, and incorporating the result into its subsequent generation.
- **Transformer** (AI architecture): The Transformer is a neural network architecture introduced by Vaswani et al. in 2017 that uses self-attention to process sequences in parallel, replacing the recurrence of RNNs and LSTMs and becoming the foundational architecture for nearly every modern large language model.
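The core self-attention operation from the 2017 paper is Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A pure-Python sketch with tiny 2-dimensional matrices (real models use large matrices and many attention heads; these values are illustrative):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)            # how much each position attends
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]                 # one query vector
K = [[1.0, 0.0], [0.0, 1.0]]     # two key vectors
V = [[1.0, 2.0], [3.0, 4.0]]     # two value vectors

row = attention(Q, K, V)[0]      # a weighted mix of the two value rows,
                                 # weighted toward V[0] since Q matches K[0]
```

Because every query attends to every key in one matrix product, the whole sequence is processed in parallel, which is exactly what replaced the step-by-step recurrence of RNNs and LSTMs.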