AI Agents Playbook — AI Explorers

Chapter 01

Framing the Problem

Before you build anything, you need to be honest about what kind of work you are actually trying to automate. Most failed agent projects fail here — not in the code.

An agent is the right tool when the work has three properties: it is multi-step (not a single lookup or generation), it is repeatable (the same type of task recurs often enough to justify design), and it involves judgment (not just rule-following — there are edge cases, ambiguous inputs, decisions to make). If the work is a one-off, a simple formula, or entirely rule-based, you probably don't need an agent.

Chatbot vs Agent — Know the Difference

A chatbot answers a question and resets. An agent takes a job and works it through. The distinction matters because they require entirely different design thinking. The chatbot is optimised for the quality of a single response. The agent is optimised for the reliability of an outcome across many steps, over time, with real-world consequences.

Dimension	Chatbot	AI Agent
Memory	Resets each session	Persists across sessions and tasks
Trigger	User sends a message	Event, schedule, or condition
Actions	Generates text	Calls tools, updates systems, sends messages
Scope	One exchange	End-to-end workflow
Output	A response	A result with audit trail
Failure mode	Bad answer	Wrong action at scale

Is Your Use Case Ready?

Run your candidate use case through this checklist before committing to build:

1. Can you describe the task in a standard operating procedure?

If you can't write down the steps a human follows, the agent won't know them either. Clarity of process is a prerequisite for automation.

2. How often does it run?

Agents pay back design investment through repetition. A task that runs daily is worth careful design. A task that runs once a quarter probably isn't.

3. What happens when it goes wrong?

Agents make mistakes. If a mistake in this workflow is recoverable and low-stakes, proceed. If it's irreversible or high-consequence, build human checkpoints in before you automate fully.

4. What systems does it need to touch?

List every data source and destination. For each one, ask: does it have an API? An MCP connector? If neither, browser or computer use may bridge the gap — but that adds complexity.

5. Who owns it after it's built?

Agents drift. Systems change, data changes, model behaviour changes. Someone needs to be accountable for monitoring and updating the agent over time. If there's no owner, there's no agent.

The best agent use cases are not the most impressive ones. They are the ones where the task is clear, the repetition is high, and the cost of a mistake is manageable.
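The five questions above can be captured as a simple pre-build gate. The sketch below is one hypothetical way to encode them. The field names, the `min_runs_per_month` threshold, and the wording of each concern are assumptions made for illustration, not part of any platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UseCase:
    """Answers to the five readiness questions (field names are illustrative)."""
    has_written_sop: bool        # 1. can the steps be written down?
    runs_per_month: int          # 2. how often does it run?
    mistakes_recoverable: bool   # 3. are errors low-stakes and reversible?
    systems_without_api: int     # 4. systems with no API or MCP connector
    owner: Optional[str]         # 5. who is accountable after launch?


def assess(case: UseCase, min_runs_per_month: int = 4) -> list[str]:
    """Return a list of concerns; an empty list means no blockers were found."""
    concerns = []
    if not case.has_written_sop:
        concerns.append("Write the SOP first: the process is not yet clear.")
    if case.runs_per_month < min_runs_per_month:
        concerns.append("Low repetition: design effort may not pay back.")
    if not case.mistakes_recoverable:
        concerns.append("High-consequence errors: add human checkpoints.")
    if case.systems_without_api > 0:
        concerns.append("Browser or computer use needed: expect extra complexity.")
    if not case.owner:
        concerns.append("No named owner: do not deploy.")
    return concerns


invoice_triage = UseCase(True, 400, True, 0, "finance-ops lead")
quarterly_report = UseCase(False, 0, False, 1, None)

print(assess(invoice_triage))         # no concerns raised
print(len(assess(quarterly_report)))  # every question raises a concern
```

A list of concerns, rather than a single score, keeps the result readable by the person who has to decide whether to build.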

Chapter 02

The Architecture

Every agent, no matter how simple or complex, runs on the same six-layer stack. Understanding what each layer does — and who is responsible for it — is the foundation of good agent design.

Layer 01 · Foundation

The Reasoning Engine

The AI model — Claude, GPT-4o, Gemini. It receives a system prompt and runtime context, then decides what to do next. This layer is pre-built. You configure it; you do not build it.

System Prompt · Context Window · RAG

Layer 02 · Harness

The Orchestration Platform

Keeps the model running across tasks: it routes inputs, calls tools, manages errors, schedules runs, and spins up sub-agents. Examples include CoWork, Managed Agents, and LangGraph.

CoWork · Sub-Agents · Scheduling

Layer 03 · Skills

Skills & Memory

Versioned instruction bundles that encode how to handle specific task types. Memory layers persist knowledge between sessions. This is where your design work lives.

SKILL.md · Long-term Memory · Vector Store

Layer 04 · Connectors

Connectors & MCP

The wires that plug the agent into the external world. MCP is the most widely adopted standard for this. Skills tell the agent what to do; connectors give it somewhere to do it.

MCP · REST API · Webhooks

Layer 05 · Actions

What the Agent Can Do

Tool calls, code execution, file writes, messages — and critically, browser use (any website) and computer use (any desktop application, even without an API).

Browser Use · Computer Use · Human Handoff

Layer 06 · Governance

Oversight & Evals

Eval criteria, stop conditions, audit logs, drift detection. Usually built last. Always wished for sooner. This is the layer that makes delegation safe.

Evals · Audit Log · Drift Watch

Key Insight

Layers 1 and 2 are mostly pre-built when you use a managed platform like CoWork. Layers 3–6 are where your design work happens. The engineering is often less important than the thinking you put into skills, connectors, and governance.
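To make the layering concrete, here is a minimal sketch of how a harness (Layer 2) might loop a reasoning engine (Layer 1) over a set of tools (Layers 4 and 5) while keeping an audit trail (Layer 6). Everything in it is a stand-in: `fake_model`, the two tool names, and the `max_steps` cap are hypothetical, and a managed platform would supply its own versions of these pieces.

```python
from typing import Callable

# Layers 4-5: the tools the agent may call (stubs standing in for real connectors).
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
    "send_message": lambda text: f"sent: {text}",
}


def fake_model(goal: str, history: list[tuple[str, str, str]]) -> tuple[str, str]:
    """Stand-in for Layer 1. A real model would choose the next action from context."""
    if not history:
        return ("lookup_order", "A123")
    if len(history) == 1:
        return ("send_message", history[0][2])  # relay the lookup result
    return ("done", "")


def run_agent(goal: str, max_steps: int = 5) -> list[tuple[str, str, str]]:
    """Layer 2: route each model decision to a tool and record it (Layer 6)."""
    audit_log: list[tuple[str, str, str]] = []
    for _ in range(max_steps):  # hard cap on steps acts as a stop condition
        action, argument = fake_model(goal, audit_log)
        if action == "done":
            break
        if action not in TOOLS:
            audit_log.append((action, argument, "refused: unknown tool"))
            break
        audit_log.append((action, argument, TOOLS[action](argument)))
    return audit_log


log = run_agent("Tell the customer where order A123 is")
for entry in log:
    print(entry)
```

The loop itself is small. Most of the design effort goes into what the model is told (skills), what `TOOLS` contains (connectors), and what gets checked and logged (governance).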

Chapter 03

Designing Your Agent

There are six things a builder actually controls when deploying a managed Claude agent. Get these right and the agent works. Get them wrong and even a great model will produce unreliable results.

1. System Prompt & Context Design

This is where the agent's judgement lives — its identity, its rules, its scope, its tone. The system prompt is the most leveraged design decision you will make. Be explicit about what the agent should do, what it should never do, and how it should handle the edge cases you already know about. Vague prompts produce vague agents.

2. Tool Selection & Sequencing

Every tool you give an agent is a surface area for error. Give it only the connectors and actions it actually needs for the task. Define the sequence where order matters. An agent with ten tools and no sequencing guidance will use them in unexpected ways.
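One way to picture tool scoping and sequencing is as a small policy object that the harness consults before every call. The sketch below is an assumption about how such a check could look. `ToolPolicy` and the invoice tool names are hypothetical, and real platforms expose this through their own configuration.

```python
class ToolPolicy:
    """Allow only named tools, and require prerequisites where order matters."""

    def __init__(self, allowed: set[str], must_follow: dict[str, str]):
        self.allowed = allowed
        self.must_follow = must_follow  # tool -> tool that must have run first
        self.called: list[str] = []

    def check(self, tool: str) -> None:
        """Raise PermissionError if the call breaks the policy; otherwise record it."""
        if tool not in self.allowed:
            raise PermissionError(f"{tool} is not in this agent's toolset")
        prerequisite = self.must_follow.get(tool)
        if prerequisite and prerequisite not in self.called:
            raise PermissionError(f"{tool} requires {prerequisite} first")
        self.called.append(tool)


policy = ToolPolicy(
    allowed={"read_invoice", "validate_totals", "post_to_ledger"},
    must_follow={"post_to_ledger": "validate_totals"},
)

policy.check("read_invoice")
try:
    policy.check("post_to_ledger")   # blocked: validation has not run yet
except PermissionError as err:
    print(err)
policy.check("validate_totals")
policy.check("post_to_ledger")       # now permitted
print(policy.called)
```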

3. Runtime Context Design

What information is available to the agent at each step — and what is deliberately excluded? Context bloat is real: stuffing everything into the context window does not make the agent smarter, it makes it noisier. Be intentional about what the agent sees when.

4. Escalation & Stop Conditions

Define clearly when the agent should stop and ask a human rather than proceed. These conditions are not a sign of weakness in the design — they are what makes the agent trustworthy. Common triggers: low confidence, irreversible actions, data outside expected ranges, first-time encounters with a new entity.
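The four common triggers listed above translate naturally into a check that runs before each action. This is a sketch under stated assumptions: the `ProposedAction` fields, the 0.8 confidence floor, and the amount range are illustrative values, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    """One step the agent wants to take (all fields are illustrative)."""
    name: str
    confidence: float     # agent's confidence score, 0.0 to 1.0
    irreversible: bool    # e.g. sending a payment or deleting a record
    amount: float         # the value being acted on
    counterparty: str     # the entity the action concerns


def escalation_reasons(
    action: ProposedAction,
    known_counterparties: set[str],
    min_confidence: float = 0.8,
    amount_range: tuple[float, float] = (0.0, 10_000.0),
) -> list[str]:
    """Return every reason to stop and ask a human; an empty list means proceed."""
    reasons = []
    if action.confidence < min_confidence:
        reasons.append("low confidence")
    if action.irreversible:
        reasons.append("irreversible action")
    low, high = amount_range
    if not low <= action.amount <= high:
        reasons.append("amount outside expected range")
    if action.counterparty not in known_counterparties:
        reasons.append("first encounter with this entity")
    return reasons


known = {"Acme Ltd", "Northwind"}
routine = ProposedAction("draft_reminder", 0.95, False, 120.0, "Acme Ltd")
risky = ProposedAction("issue_refund", 0.65, True, 25_000.0, "Unknown GmbH")

print(escalation_reasons(routine, known))
print(escalation_reasons(risky, known))
```

Returning all reasons, not just the first, gives the human reviewer the full picture in one handoff.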

5. Memory & State Design

What does the agent need to remember, and for how long? Short-term memory covers the current task. Long-term memory covers preferences, prior decisions, and accumulated knowledge. Episode stores let the agent learn from past runs. Design this deliberately — unmanaged memory accumulates noise.
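As a rough illustration of deliberate memory design, the sketch below separates task-scoped notes from long-term entries that expire. The `AgentMemory` class, its method names, and the use of days as the time unit are all assumptions. Episode stores are left out to keep it short.

```python
class AgentMemory:
    """Short-term memory lasts one task; long-term entries expire after a TTL."""

    def __init__(self, long_term_ttl: float):
        self.long_term_ttl = long_term_ttl
        self.short_term: dict[str, str] = {}
        self.long_term: dict[str, tuple[str, float]] = {}  # key -> (value, stored_at)

    def note(self, key: str, value: str) -> None:
        """Remember something for the current task only."""
        self.short_term[key] = value

    def learn(self, key: str, value: str, now: float) -> None:
        """Remember something across tasks, stamped with when it was stored."""
        self.long_term[key] = (value, now)

    def recall(self, key: str, now: float):
        """Prefer short-term memory; ignore long-term entries older than the TTL."""
        if key in self.short_term:
            return self.short_term[key]
        value, stored_at = self.long_term.get(key, (None, now))
        return value if now - stored_at <= self.long_term_ttl else None

    def end_task(self, now: float) -> int:
        """Clear short-term memory and prune expired long-term entries."""
        self.short_term.clear()
        expired = [k for k, (_, t) in self.long_term.items() if now - t > self.long_term_ttl]
        for key in expired:
            del self.long_term[key]
        return len(expired)


memory = AgentMemory(long_term_ttl=30.0)  # time measured in days for this demo
memory.note("current_invoice", "INV-42")
memory.learn("acme_payment_terms", "net 45", now=0.0)

print(memory.recall("acme_payment_terms", now=10.0))
print(memory.end_task(now=10.0))   # nothing has expired yet
print(memory.end_task(now=45.0))   # the 45-day-old entry is pruned
```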

6. Eval Criteria

How do you know the agent is doing the right thing? Define success metrics before you deploy, not after something goes wrong. Good evals cover accuracy (did it produce the right output?), coverage (did it handle the full range of inputs?), and behaviour (did it stay within its defined boundaries?).
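The three eval dimensions can be measured with a very small harness. In this sketch `toy_agent` is a stand-in for a real agent call, and the cases, categories, and allowed actions are invented for illustration. Coverage is interpreted here as the share of required input categories that the test set exercises, which is one reasonable reading among several.

```python
def toy_agent(text: str) -> dict:
    """Stand-in agent that classifies a support ticket."""
    lowered = text.lower()
    if "refund" in lowered:
        return {"label": "billing", "actions": ["tag_ticket"]}
    if "password" in lowered:
        return {"label": "access", "actions": ["tag_ticket"]}
    return {"label": "unknown", "actions": ["tag_ticket", "escalate"]}


CASES = [
    {"input": "I want a refund", "expected": "billing", "category": "billing"},
    {"input": "Forgot my password", "expected": "access", "category": "access"},
    {"input": "Where is my parcel?", "expected": "shipping", "category": "shipping"},
]
ALLOWED_ACTIONS = {"tag_ticket", "escalate"}
REQUIRED_CATEGORIES = {"billing", "access", "shipping", "legal"}


def run_evals(agent, cases) -> dict:
    """Score accuracy, test-set coverage, and boundary behaviour."""
    correct = 0
    violations = 0
    for case in cases:
        output = agent(case["input"])
        correct += output["label"] == case["expected"]
        violations += any(a not in ALLOWED_ACTIONS for a in output["actions"])
    covered = {case["category"] for case in cases} & REQUIRED_CATEGORIES
    return {
        "accuracy": correct / len(cases),
        "coverage": len(covered) / len(REQUIRED_CATEGORIES),
        "boundary_violations": violations,
    }


report = run_evals(toy_agent, CASES)
print(report)
```

Here the report shows both a wrong answer (the shipping ticket) and a gap in the test set (no legal case), which are different problems with different fixes.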

The model's reasoning quality is excellent out of the box. What separates good agents from great ones is the quality of the system prompt, the skill design, and the governance layer. That is craft work, not engineering.

Chapter 04

Writing Skills

A skill is a reusable, versioned instruction bundle that tells your agent how to handle a specific type of task — precisely, consistently, in your context. Skills are what make an agent feel genuinely expert rather than generically capable.

Think of a skill like a job description combined with a standard operating procedure. It tells the agent not just what to do, but how to do it — the rules to follow, the judgement calls to make, the edge cases to watch for, and when to stop and ask. A well-written skill is the difference between an agent that does the job and one that does your job.

The Three Files of a Skill

File 01

SKILL.md

The main instruction file. Defines what the skill does, the step-by-step procedure, the decision logic for common cases, and the output format. Written in plain language — no code required.

File 02

CONFIG.md

Configuration parameters for the skill — thresholds, default values, system-specific settings. Separating config from procedure means you can update one without touching the other.

File 03

Resolution Priors

A record of how past edge cases were resolved. This is where organisational knowledge accumulates — the agent's institutional memory for the unusual situations that come up in real work.
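As a sketch of how a harness might treat a skill as a unit, the code below loads a skill folder and refuses to proceed if any of the three files is missing. The file name `PRIORS.md` for the resolution priors is an assumption (the text above does not name that file), and the file contents are invented placeholders.

```python
import tempfile
from pathlib import Path

# PRIORS.md is an assumed file name for the resolution priors.
REQUIRED_FILES = ("SKILL.md", "CONFIG.md", "PRIORS.md")


def load_skill(folder: Path) -> dict[str, str]:
    """Read all three skill files, failing loudly if any is absent."""
    missing = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
    if missing:
        raise FileNotFoundError(f"skill is missing: {', '.join(missing)}")
    return {name: (folder / name).read_text(encoding="utf-8") for name in REQUIRED_FILES}


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp)
    (skill_dir / "SKILL.md").write_text(
        "Invoice triage, version 1.2\nStep 1: read the invoice.\n", encoding="utf-8"
    )
    (skill_dir / "CONFIG.md").write_text("approval_threshold: 500\n", encoding="utf-8")
    (skill_dir / "PRIORS.md").write_text(
        "Duplicate invoice numbers: hold and escalate.\n", encoding="utf-8"
    )
    skill = load_skill(skill_dir)

print(sorted(skill))
```

Keeping procedure, configuration, and priors in separate files means each can be versioned and reviewed on its own.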

What Makes a Skill Work

✓ Be specific about inputs and outputs

Define exactly what the skill expects to receive and exactly what it should return. Ambiguity at the boundaries is where most skill failures start.

✓ Encode your edge cases

Every process has exceptions. The skill is the right place to document how those exceptions should be handled — not left to the model's general judgement.

✓ Version it

Skills drift when they are not versioned. When you change a skill, know what you changed and why. If behaviour degrades, you need to be able to roll back.

✓ Test it before you trust it

Run the skill against a representative set of real inputs before deploying it. Include inputs you expect it to handle well and inputs that are on the edge of its scope.

On Skill Drift

One of the most important risks in agent deployment is the gap between what a skill says and what the agent actually does in production. This is the agent equivalent of documented processes that drift from how work is actually done. Monitor it. Update the skill to reflect reality, or update the agent behaviour to match the skill — but don't let the gap grow.

Chapter 05

Connectors & MCP

Connectors are how the agent reaches into the real world — reading and writing the systems your organisation actually uses. Getting this layer right determines the practical reach of your agent.

The Model Context Protocol (MCP), introduced by Anthropic in November 2024 and since adopted by other major AI vendors, is an open standard for connecting AI agents to external tools and data. Before MCP, every integration was a bespoke engineering project. Now it is closer to infrastructure: running an MCP server is becoming as routine as running a web server. If a system has an MCP connector available, connecting your agent to it is a configuration task, not a development task.
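For readers curious what travels over the wire, MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below builds such a request and reads a text result. It is a simplified illustration based on the public specification, with no transport or session handshake, and the tool name `search_tickets` and the sample response are invented. In practice the harness or an MCP client library handles all of this for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call_request(tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 message asking an MCP server to run one tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


def text_from_result(raw_response: str) -> str:
    """Extract the text parts of a tool result, raising on either kind of error."""
    message = json.loads(raw_response)
    if "error" in message:  # protocol-level failure
        raise RuntimeError(message["error"]["message"])
    result = message["result"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("tool reported an error")
    return "\n".join(part["text"] for part in result["content"] if part["type"] == "text")


request = tool_call_request("search_tickets", {"query": "refund", "limit": 5})
print(request)

sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "3 tickets found"}], "isError": False},
})
print(text_from_result(sample_response))
```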

Types of Connection

Type 01

MCP Connectors

Pre-built, standardised connections to common tools and platforms. Gmail, Slack, Salesforce, Jira, Google Drive, Figma, and hundreds more. The fastest path to connecting any system with an MCP server available.

Plug-and-play · Standardised

Type 02

REST APIs & Webhooks

Direct integration with any system that exposes an API. More setup than MCP but available for almost any modern system. Webhooks let external systems push events to your agent rather than the agent polling.

Most systems · Full control

Type 03

Browser Use

The agent navigates any website and interacts with it exactly as a person would — filling forms, clicking buttons, reading content — without needing an API. The solution for systems with a UI but no integration layer.

No API needed · Any website

Type 04

Computer Use

The agent controls a full desktop environment, including applications that have no web interface and no API. The solution for legacy systems that cannot be reached any other way.

Legacy systems · Any desktop app

Connector Design Principles

1. Give the agent only the connectors it needs

Each connector is a surface area for unintended action. Scope the agent's access to the minimum required for the task. An agent with write access to ten systems can cause ten times the damage of an agent scoped to one.

2. Design for failure

Connectors fail — APIs go down, credentials expire, rate limits hit. Define what the agent should do when a connector is unavailable: retry, escalate to a human, or degrade gracefully.

3. Watch for schema drift

External systems change their APIs and data structures. When they do, connectors that worked yesterday break silently today. Build monitoring for connector health into your governance layer.
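The "design for failure" principle is easiest to see in code. Below is a minimal sketch of retrying a connector call with exponential backoff and then escalating to a human once retries run out. `ConnectorUnavailable`, `call_with_retry`, and the fake CRM connector are hypothetical names, and the delays are arbitrary. Graceful degradation would be a third branch, not shown here.

```python
import time


class ConnectorUnavailable(Exception):
    """Raised by a connector when its backing system cannot be reached."""


def call_with_retry(call, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Try a connector call with exponential backoff, then hand off to a human."""
    last_error = None
    for attempt in range(attempts):
        try:
            return {"status": "ok", "value": call()}
        except ConnectorUnavailable as err:
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # waits double each time
    return {"status": "escalate", "reason": str(last_error)}


def make_flaky(failures: int):
    """Build a fake connector that fails a set number of times, then succeeds."""
    state = {"left": failures}

    def connector():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectorUnavailable("CRM timed out")
        return "record saved"

    return connector


delays = []  # record the waits instead of sleeping, so the demo runs instantly
print(call_with_retry(make_flaky(2), sleep=delays.append))
print(call_with_retry(make_flaky(5), sleep=delays.append))
print(delays)
```

Only the connector's own exception is caught. Any other error propagates, so a bug in the agent is not mistaken for an outage.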

Chapter 06

The Agent Landscape

You do not need to build from scratch. A growing set of platforms and tools provides the harness, connectors, and deployment infrastructure. Here is an honest map of the main options.

Claude Agents

Anthropic · CoWork · Managed Agents · Claude Code

The broadest spectrum, from no-code to full developer tooling. CoWork for no-code composition via skills and connectors. Managed Agents for enterprise API-level control. Claude Code for developers. Anthropic's Constitutional AI training is intended to make Claude well suited to contexts requiring nuanced, trustworthy judgement.

No-Code (CoWork) · Managed API · MCP Ecosystem · Skills Architecture

OpenAI Codex & Agents

OpenAI · Codex · Operator · Workspace Agents

Codex is an autonomous software engineering agent. Workspace Agents extend into knowledge work. Strong developer ecosystem and broadest model availability. Common first choice for teams already on GPT-4o.

Code-first · Workspace Agents · Function Calling

Google Agentspace

Google · Gemini · Vertex AI · A2A Protocol

Agents native to Google Workspace — Gmail, Drive, Meet. Gemini 1.5 Pro excels at long-context understanding. A2A protocol handles agent-to-agent communication. Best fit for organisations deeply in Google Workspace.

Google Workspace · A2A Protocol · Long Context

Copilot Studio

Microsoft · Power Automate · Azure AI Foundry

The dominant enterprise choice for Microsoft 365 shops. No-code builder with Power Automate, SharePoint knowledge, and Teams deployment. Strong data governance. Best for large IT-led deployments.

Microsoft 365 · Power Automate · Enterprise Governance

OSS Frameworks

LangGraph · CrewAI · AutoGen · Haystack

Maximum control, model-agnostic, deployable anywhere. LangGraph for stateful workflows. CrewAI for multi-agent teams with defined roles. AutoGen for conversational multi-agent systems. Significant engineering investment required.

LangGraph · CrewAI · Model-agnostic · Full Control

GitHub Copilot

GitHub (Microsoft) · Copilot Workspace · Agent Mode

The most mature agentic coding tool. Takes a GitHub issue, produces a full plan, implementation, and pull request. Agent Mode handles multi-file refactors in VS Code. Purpose-built for engineering workflows.

Code Agents · PR Automation · VS Code

How to Choose

Start with your existing vendor relationships and existing tool stack. An organisation already in Microsoft 365 should evaluate Copilot Studio seriously before building custom. An organisation that wants maximum design control without developer dependency should look at Claude CoWork. For regulated industries or bespoke enterprise deployments, Managed Claude Agents or OSS frameworks on your own infrastructure provide the governance and data sovereignty controls you need.

Chapter 07

Governance & Drift

The governance layer is what separates a production agent from a prototype. It is usually built last and wished for first. Build it early.

Four distinct types of drift will affect any agent running in production over time. Each requires a different response.

Drift Type 01

Connector & Schema Drift

External systems change their APIs, data structures, and authentication. A connector that worked last month breaks silently today. Monitor connector health continuously, not just at deployment.
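A lightweight guard against silent breakage is to compare each payload with the shape the connector was built for. The sketch below assumes a hypothetical invoice payload and an `EXPECTED` field map. A production setup would more likely use a schema validation library, but the idea is the same.

```python
# The shape this connector was designed around (illustrative fields only).
EXPECTED = {"id": str, "amount": float, "currency": str}


def schema_drift(payload: dict, expected: dict) -> list[str]:
    """List every way a payload differs from the expected fields and types."""
    problems = []
    for field, expected_type in expected.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    for field in sorted(payload.keys() - expected.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


last_month = {"id": "INV-1", "amount": 120.0, "currency": "GBP"}
today = {"id": "INV-2", "amount": "120.00", "currency_code": "GBP"}

print(schema_drift(last_month, EXPECTED))
for problem in schema_drift(today, EXPECTED):
    print(problem)
```

Running a check like this on a sample of live traffic turns a silent break into an alert.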

Drift Type 02

Environmental Drift

The world the agent operates in changes — new document formats, new email patterns, new edge cases that weren't in the original design. The agent's actual inputs diverge from what its skills were written for.

Drift Type 03

Skill Drift

The gap between what the skill says and what the agent actually does in production. Analogous to the gap between documented processes and how work actually gets done. Monitor it. Close it.

Drift Type 04

Model Drift

Model updates can subtly change agent behaviour even when nothing in your design has changed. An agent that ran reliably on one model version may behave differently after a model update. Run evals after updates.
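"Run evals after updates" can be made mechanical by comparing a new eval report against a stored baseline. This sketch assumes both reports are simple metric dictionaries where higher is better. The metric names, scores, and the 0.02 tolerance are invented for illustration.

```python
def regressions(baseline: dict, candidate: dict, tolerance: float = 0.02) -> dict:
    """Return the change for each metric that dropped by more than the tolerance."""
    return {
        metric: round(candidate.get(metric, 0.0) - score, 4)
        for metric, score in baseline.items()
        if score - candidate.get(metric, 0.0) > tolerance
    }


# Scores from the same eval suite, before and after a model update (made-up numbers).
before_update = {"accuracy": 0.94, "boundary_compliance": 1.0, "coverage": 0.90}
after_update = {"accuracy": 0.93, "boundary_compliance": 0.95, "coverage": 0.91}

flagged = regressions(before_update, after_update)
print(flagged)
```

A metric missing from the new report is treated as zero, so a dropped measurement shows up as a regression and does not pass unnoticed.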

The Governance Minimum

Every production agent should have at minimum:

✓ Defined eval criteria

Written before deployment. Covers accuracy, coverage, and boundary behaviour. Run regularly, not just once.

✓ Explicit stop conditions

The specific circumstances under which the agent halts and escalates to a human. Documented in the skill, enforced by the harness.

✓ Audit logging

A complete record of every decision the agent made, every tool it called, and every output it produced. Essential for debugging, compliance, and accountability.

✓ A named owner

One person accountable for the agent's behaviour, its ongoing performance, and its response to drift. Without an owner, the agent degrades silently.
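Of the four items above, audit logging is the most straightforward to sketch. The example below appends one JSON line per agent action. The field names and the `record` helper are assumptions, and the in-memory buffer stands in for whatever append-only file or logging service you use.

```python
import io
import json
from datetime import datetime, timezone


def record(log, step: int, tool: str, arguments: dict, output: str) -> dict:
    """Append one JSON line describing a single agent action."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "tool": tool,
        "arguments": arguments,
        "output": output,
    }
    log.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


audit_log = io.StringIO()  # stands in for an append-only file or log service
record(audit_log, 1, "lookup_order", {"order_id": "A123"}, "shipped")
record(audit_log, 2, "send_message", {"to": "customer"}, "sent")

entries = [json.loads(line) for line in audit_log.getvalue().splitlines()]
print([(entry["step"], entry["tool"]) for entry in entries])
```

One line per action keeps the log easy to search and replay when something needs explaining later.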

An agent without governance is not an autonomous system. It is a liability waiting to mature.

Ready to build your first agent?
Join the bootcamp.

A 3.5-hour hands-on virtual session where you explore the magic, possibilities and limitations of Claude Agents. No coding experience needed.

Book your place — £39
