AI Glossary

Plain explanations of how this actually works.

54 concepts, methods, and questions — covering agents, RAG, models, strategy, and process. No mystification. Search or browse by category.

54 entries

AI Agent

Agents

An AI system that doesn't just answer a question — it takes a sequence of actions to complete a goal. An agent can search the web, query a database, write a file, call an API, check its own output, and iterate — all without a human in the loop for each step. This is the dominant AI paradigm in 2025–2026.

Multi-agent system

Agents

Multiple AI agents working in parallel or sequence — one orchestrating the others. An orchestrator agent breaks a task into subtasks and assigns them to specialist agents. Results are synthesised and returned. Used when a task is too large, too complex, or benefits from independent verification.

Orchestrator

Agents

The top-level agent in a multi-agent system. It receives the goal, plans the approach, dispatches work to subagents, monitors progress, and synthesises the final result. The orchestrator doesn't do the detailed work — it routes, sequences, and verifies.

Tool use

Agents

The ability of an AI model to call external functions — web search, database queries, API calls, file operations. The model decides when to use a tool, what arguments to pass, and how to interpret the result. Tool use is what turns a language model into an agent.

MCP — Model Context Protocol

Agents

An open standard (developed by Anthropic, now widely adopted) that defines how AI models connect to external tools and data sources. Think of it as a universal plug: one protocol, any data source — databases, APIs, filesystems, live feeds. MCP servers expose tools that any compatible AI can call.

Agentic workflow

Agents

A workflow where AI takes actions autonomously across multiple steps — not just answering a prompt. The key properties: the AI can plan ahead, use tools, evaluate intermediate results, and revise its approach before returning a final output.

Human-in-the-loop

Agents

A design pattern where a human reviews or approves at specific points in an agentic workflow — typically before irreversible actions (sending an email, writing to a database, publishing content). Well-designed systems escalate to humans only for genuine exceptions, not every step.

Computer use

Agents

AI that can operate a computer interface directly — move a cursor, click buttons, type in forms, navigate browsers. Useful for automating workflows in legacy software that has no API. Emerging capability in 2025–2026.

Autonomous agent

Agents

An agent that runs a complete task from start to finish without human checkpoints. Distinguished from a 'supervised' agent by the degree of independence. Requires robust guardrails and careful design — autonomy without validation is fragile in production.

RAG — Retrieval-Augmented Generation

Data & RAG

A technique for grounding AI outputs in real data. Instead of relying on what the model learned during training, RAG retrieves the specific documents, rows, or records relevant to the current query and feeds them into the prompt. The model reasons over your data — not its memory of the world.

Vector embedding

Data & RAG

A numeric representation of text (or images, audio) that captures meaning, not just words. Similar concepts end up close together in vector space. Used to find relevant content by semantic similarity — 'what does this mean?' rather than 'what words does this contain?'

Vector database

Data & RAG

A database optimised for storing and searching vector embeddings. Given a query, it returns the most semantically similar entries — across millions of documents in milliseconds. Supabase (pgvector), Pinecone, Weaviate, and Qdrant are common choices.

Semantic search

Data & RAG

Search by meaning rather than keywords. A semantic search for 'revenue decline' finds documents about 'falling sales', 'shrinking margins', and 'EBITDA compression' — even if those exact words aren't in the query. Enabled by vector embeddings.

Hybrid search

Data & RAG

Combining semantic (vector) search with traditional keyword (BM25) search and merging the results. Outperforms either method alone — semantic search catches meaning; keyword search catches exact terms. Best practice for production RAG systems in 2026.

Chunking

Data & RAG

Splitting documents into smaller pieces before embedding them. Chunk size is one of the most impactful parameters in a RAG system — too large and the retrieved context is noisy; too small and it loses context. Sentence-level or paragraph-level chunking with overlap is common.

Reranking

Data & RAG

A second pass after retrieval that reorders results by relevance. The initial vector search retrieves the top-N candidates quickly; the reranker (a slower, more accurate model) rescores them. Significantly improves the quality of what ends up in the prompt.

Knowledge graph

Data & RAG

A structured representation of entities and their relationships — people, organisations, concepts, events, and the links between them. More navigable than flat documents for complex reasoning tasks. AI agents can traverse a knowledge graph to build context incrementally.

Persistent memory

Data & RAG

Information an AI agent retains across sessions — past decisions, established facts, user preferences, accumulated domain knowledge. Without memory, every session starts from zero. Persistent memory is what allows agents to become genuinely more useful over time.

RAG vs fine-tuning

Data & RAG

The two main ways to give AI access to your specific knowledge. Fine-tuning bakes knowledge into model weights — expensive, static, and hard to update. RAG retrieves knowledge at query time — cheap, dynamic, and updatable. For most business applications, RAG is the right choice. Fine-tuning is better for style, tone, and format than for facts.

Grounding

Data & RAG

Anchoring AI outputs to verifiable source data. A grounded system cites its sources and can be audited. Ungrounded systems invent plausible-sounding facts. In finance, healthcare, and legal work, grounding is non-negotiable.

Context window

Data & RAG

The maximum amount of text an AI model can consider at once — its working memory. Modern models have 128K–1M token context windows. Larger windows allow more documents, more history, and more complex instructions to be included in a single call.

Large Language Model (LLM)

Models

A deep learning model trained on massive text corpora to predict and generate language. The core technology behind ChatGPT, Claude, Gemini, and others. LLMs encode statistical patterns about language, facts, and reasoning from their training data.

Claude (Anthropic)

Models

Anthropic's family of AI models — the primary models used in my systems. Claude 4 models (Sonnet, Opus, Haiku) lead on reasoning, long-context, and instruction-following as of mid-2026. Claude has a 1M token context window on certain tiers — critical for processing large documents.

Reasoning model

Models

A model that 'thinks before answering' — generating an internal chain of reasoning (sometimes hidden, sometimes visible) before producing output. Better at complex multi-step problems, maths, and logic than standard generation models. Claude's extended thinking mode is an example.

Multimodal model

Models

An AI model that processes multiple data types — text, images, audio, video — not just text. GPT-4o and Claude are multimodal. Enables use cases like analysing charts from PDFs, transcribing meeting recordings, or extracting data from scanned documents.

Model selection

Models

Choosing the right model for each task in a system. Not every call needs the most powerful (and expensive) model. A common pattern: use a fast, cheap model (Claude Haiku) for classification and routing; use a powerful model (Claude Sonnet or Opus) for reasoning and synthesis.

Tokens

Models

The unit of text AI models process — roughly 0.75 words per token. Costs, speed, and context limits are all measured in tokens. A 1,000-word document is approximately 1,300 tokens. A 100-page report is approximately 80,000–100,000 tokens.

Temperature

Models

A parameter controlling how 'random' or 'creative' model outputs are. Temperature 0 = deterministic, always the same answer. Temperature 1 = more varied and creative. For factual, structured outputs (financial analysis, reports), use low temperature. For creative tasks, higher.

Hallucination

Models

When an AI generates confident-sounding but incorrect information — inventing facts, citations, or figures that don't exist. Not a bug so much as an inherent property of language models. Mitigated by grounding outputs in retrieved data and validating results with a second pass.

Structured output

Models

Constraining AI output to a defined format — JSON, CSV, a specific schema. Critical for production systems where downstream code needs to parse the result. Modern models can be instructed to output valid JSON reliably; dedicated features like JSON mode enforce this at the API level.

Prompt engineering

Methods

Designing the instructions given to an AI model to elicit the best output. In 2026, the most important techniques are: clear role definition, few-shot examples, chain-of-thought instruction, explicit output format specification, and separation of system context from user input.

System prompt

Methods

The persistent instruction given to an AI at the start of a session — defining its role, constraints, output format, and domain knowledge. The system prompt is where most of the engineering work in a production AI system lives.

Chain of thought

Methods

Instructing an AI to reason step-by-step before giving a final answer. 'Think through this carefully before responding' dramatically improves accuracy on complex tasks. The model's intermediate reasoning can be made visible or kept internal.

Few-shot prompting

Methods

Including examples of desired input/output pairs in the prompt. Instead of describing what you want, you show it. Two to five good examples typically outperform detailed written instructions for complex formatting or classification tasks.

Evals (evaluation)

Methods

Systematic testing of an AI system's output quality. A set of inputs with known-good outputs, run against the system to measure accuracy, format compliance, and regression. Without evals, you can't measure improvement or catch regressions when you change the prompt or model.

Guardrails

Methods

Validation layers that check AI outputs before they reach users or downstream systems. In financial and medical systems, a second agent verifies numbers against source data. Content guardrails check for hallucinations, out-of-scope responses, or policy violations.

Function calling

Methods

A model capability where the AI outputs a structured function call — the name of a function and its arguments — instead of prose. The calling application executes the function and returns the result. The model then continues reasoning with the real result. Foundation of tool use.

Streaming

Methods

Delivering model output token-by-token as it's generated, rather than waiting for the full response. Makes AI interfaces feel fast and responsive. Essential for voice interfaces and long-form generation where waiting 30 seconds for a response would kill the experience.

Agentic loop

Methods

The core execution pattern of an agent: receive task → plan → act → observe result → decide next action → repeat until done. Each iteration the agent updates its understanding based on tool results and intermediate outputs.

Context compression

Methods

Techniques to fit more relevant information into a limited context window — summarising older conversation history, filtering low-relevance retrieved chunks, or compressing documents before including them. Important for long-running agentic tasks.

When should I use AI?

Strategy

When you have a process that: (1) follows a repeatable logic, (2) is expensive in time or people, and (3) the cost of an occasional mistake is manageable. The best candidates are expert knowledge work — analysis, drafting, classification, synthesis — not physical or truly novel creative tasks.

Build vs buy

Strategy

Off-the-shelf AI tools (Copilot, ChatGPT Enterprise, etc.) work for generic tasks. Custom-built systems are necessary when: the workflow is domain-specific, requires proprietary data, must integrate with existing systems, or the competitive advantage lies in the AI capability itself.

What can't AI do well (in 2026)?

Strategy

Genuinely novel creative work that requires lived experience. Physical tasks. Real-time situational judgment in ambiguous, high-stakes environments without defined success criteria. Relationship-building where authenticity is what creates value. The list is shrinking, but these boundaries remain.

AI-first workflow design

Strategy

Redesigning a process around AI's strengths rather than automating the existing process step-by-step. Often the right approach is not 'make each step faster' but 'redesign the whole thing — what would this look like if AI did the analytical work from the start?'

The structural advantage window

Strategy

Organisations that deploy production AI systems in 2025–2026 accumulate 12–18 months of operational learning — prompts, evals, edge cases, data pipelines — that competitors who wait cannot acquire quickly. The value of moving first in AI is not speed to deploy; it's the compound learning effect.

Data moat

Strategy

Proprietary data that makes your AI system more accurate or useful than a competitor's. The best AI systems aren't just using better models — they're using better data. Historical transactions, proprietary benchmarks, clinical outcomes, and domain-specific knowledge are durable advantages.

AI ROI

Strategy

Measured in three ways: hours saved (time value of the workflow), quality improvement (fewer errors, faster decisions), and capability unlocked (things you can now do that weren't viable before). The third is often the most valuable and hardest to quantify in advance.

Pilot vs production

Strategy

A pilot answers 'can this work?' A production system answers 'does this run reliably at scale?' Most organisations get stuck at pilot. The gap is: real data integration, error handling, monitoring, user adoption, and iterative improvement based on actual usage.

How long does it take to build?

Process

For a well-scoped system: first working version in 2 weeks, production pilot in 6–8 weeks. The variables are data access (how hard it is to connect to the right sources), scope clarity (is the workflow well-defined?), and iteration speed (how fast feedback loops are).

What do I need to provide?

Process

Access to the workflow (ideally working alongside someone who does it today), access to the data sources it uses, and willingness to give feedback on early outputs. You don't need a technical team. You need domain experts who can evaluate whether the AI output is correct.

Who owns the code?

Process

You do. Every system I build is handed over with full source code, documentation, and deployment configuration. No lock-in, no ongoing licence, no dependency on me to keep it running. You should be able to hand it to any developer and have them understand it.

How do you ensure accuracy?

Process

Three layers: (1) grounding — AI outputs are anchored to retrieved source data, not model memory; (2) guardrails — a second validation pass checks key facts and numbers against sources; (3) evals — a test suite of known-good examples runs against every version before deployment.

Data privacy

Process

Proprietary data stays in your infrastructure. API calls to AI models send only the specific context needed for each query — not bulk uploads of all your data. For highly sensitive domains (healthcare, legal), on-premise or private cloud deployment with dedicated model endpoints is available.

How does this differ from a consulting engagement?

Process

A consulting engagement produces a recommendation. This produces a working system. The deliverable is code in production, not a slide deck. The engagement ends when the system is running and your team can operate it — not when a report is delivered.