A practical guide to designing and deploying agents — for professionals who want to get this right, not just get started.
Before you build anything, you need to be honest about what kind of work you are actually trying to automate. Most failed agent projects fail here — not in the code.
An agent is the right tool when the work has three properties: it is multi-step (not a single lookup or generation), it is repeatable (the same type of task recurs often enough to justify design), and it involves judgment (not just rule-following — there are edge cases, ambiguous inputs, decisions to make). If the work is a one-off, a simple formula, or entirely rule-based, you probably don't need an agent.
A chatbot answers a question and resets. An agent takes a job and works it through. The distinction matters because they require entirely different design thinking. The chatbot is optimised for the quality of a single response. The agent is optimised for the reliability of an outcome across many steps, over time, with real-world consequences.
| Dimension | Chatbot | AI Agent |
|---|---|---|
| Memory | Resets each session | Persists across sessions and tasks |
| Trigger | User sends a message | Event, schedule, or condition |
| Actions | Generates text | Calls tools, updates systems, sends messages |
| Scope | One exchange | End-to-end workflow |
| Output | A response | A result with audit trail |
| Failure mode | Bad answer | Wrong action at scale |
Run your candidate use case through this checklist before committing to build:
If you can't write down the steps a human follows, the agent won't know them either. Clarity of process is a prerequisite for automation.
Agents pay back design investment through repetition. A task that runs daily is worth careful design. A task that runs once a quarter probably isn't.
Agents make mistakes. If a mistake in this workflow is recoverable and low-stakes, proceed. If it's irreversible or high-consequence, build human checkpoints in before you automate fully.
List every data source and destination. For each one, ask: does it have an API? An MCP connector? If neither, browser or computer use may bridge the gap — but that adds complexity.
Agents drift. Systems change, data changes, model behaviour changes. Someone needs to be accountable for monitoring and updating the agent over time. If there's no owner, there's no agent.
The best agent use cases are not the most impressive ones. They are the ones where the task is clear, the repetition is high, and the cost of a mistake is manageable.
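The checklist above can be turned into a small screening function. This is a minimal sketch, not a scoring standard: the `UseCase` fields, the `min_runs_per_year` threshold of 12, and the wording of each concern are illustrative assumptions, not part of any platform.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    steps_documented: bool      # can a human write down the steps?
    runs_per_year: int          # how often the task recurs
    mistakes_recoverable: bool  # is a wrong action reversible and low-stakes?
    systems_reachable: bool     # API, MCP connector, or browser/computer use
    has_owner: bool             # someone accountable for monitoring


def screen(use_case: UseCase, min_runs_per_year: int = 12) -> list[str]:
    """Return a list of concerns; an empty list means no blocker was found."""
    concerns = []
    if not use_case.steps_documented:
        concerns.append("Write down the human procedure first.")
    if use_case.runs_per_year < min_runs_per_year:  # threshold is an assumption
        concerns.append("Too infrequent to repay the design effort.")
    if not use_case.mistakes_recoverable:
        concerns.append("Add human checkpoints before automating fully.")
    if not use_case.systems_reachable:
        concerns.append("Some system has no access path.")
    if not use_case.has_owner:
        concerns.append("Assign an owner before building.")
    return concerns


# Hypothetical candidate: frequent and well documented, but mistakes are costly.
invoice_triage = UseCase(True, 250, False, True, True)
for concern in screen(invoice_triage):
    print(concern)
```

The function returns concerns rather than a single score so that each gap stays visible and can be assigned to someone.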
Every agent, no matter how simple or complex, runs on the same six-layer stack. Understanding what each layer does — and who is responsible for it — is the foundation of good agent design.
Layer 1: the AI model (Claude, GPT-4o, Gemini). It receives a system prompt and runtime context, then decides what to do next. This layer is pre-built. You configure it; you do not build it.
Tags: System Prompt, Context Window, RAG
Layer 2: the harness. It keeps the model running across tasks: it routes inputs, calls tools, manages errors, schedules runs, and spins up sub-agents. Examples: CoWork, Managed Agents, LangGraph.
Tags: CoWork, Sub-Agents, Scheduling
Layer 3: skills and memory. Skills are versioned instruction bundles that encode how to handle specific task types. Memory layers persist knowledge between sessions. This is where your design work lives.
Tags: SKILL.md, Long-term Memory, Vector Store
Layer 4: connectors. The wires that plug the agent into the external world. MCP is the universal standard. Skills tell the agent what to do; connectors give it somewhere to do it.
Tags: MCP, REST API, Webhooks
Layer 5: actions. Tool calls, code execution, file writes, messages and, critically, browser use (any website) and computer use (any desktop application, even without an API).
Tags: Browser Use, Computer Use, Human Handoff
Layer 6: governance. Eval criteria, stop conditions, audit logs, drift detection. Usually built last. Always wished for sooner. This is the layer that makes delegation safe.
Tags: Evals, Audit Log, Drift Watch
Layers 1 and 2 are mostly pre-built when you use a managed platform like CoWork. Layers 3–6 are where your design work happens. The engineering is often less important than the thinking you put into skills, connectors, and governance.
There are six things a builder actually controls when deploying a managed Claude agent. Get these right and the agent works. Get them wrong and even a great model will produce unreliable results.
The system prompt is where the agent's judgement lives: its identity, its rules, its scope, its tone. It is the most leveraged design decision you will make. Be explicit about what the agent should do, what it should never do, and how it should handle the edge cases you already know about. Vague prompts produce vague agents.
Every tool you give an agent is a surface area for error. Give it only the connectors and actions it actually needs for the task. Define the sequence where order matters. An agent with ten tools and no sequencing guidance will use them in unexpected ways.
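One way to enforce both ideas, a minimal tool set and an explicit order, is a small policy object that the harness consults before every tool call. This is a sketch under stated assumptions: `ToolPolicy`, the tool names, and the ordering rule are hypothetical and do not correspond to any platform's API.

```python
class ToolPolicy:
    """Allowlist of tools plus ordering rules (tool -> tools that must run first)."""

    def __init__(self, allowed, must_follow=None):
        self.allowed = set(allowed)
        self.must_follow = must_follow or {}
        self.called = []  # tools that have been permitted so far, in order

    def check(self, tool):
        """Record the call if it is permitted; raise PermissionError otherwise."""
        if tool not in self.allowed:
            raise PermissionError(f"{tool} is not on the allowlist")
        missing = [t for t in self.must_follow.get(tool, []) if t not in self.called]
        if missing:
            raise PermissionError(f"{tool} requires {missing} to run first")
        self.called.append(tool)


# Hypothetical accounts-payable agent with three tools and one ordering rule.
policy = ToolPolicy(
    allowed=["read_invoice", "match_purchase_order", "post_payment"],
    must_follow={"post_payment": ["read_invoice", "match_purchase_order"]},
)
policy.check("read_invoice")
try:
    policy.check("post_payment")  # match_purchase_order has not run yet
except PermissionError as err:
    print(err)
```

Putting the rule in the harness rather than only in the prompt means a sequencing mistake fails loudly instead of producing a wrong action.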
What information is available to the agent at each step — and what is deliberately excluded? Context bloat is real: stuffing everything into the context window does not make the agent smarter, it makes it noisier. Be intentional about what the agent sees when.
Define clearly when the agent should stop and ask a human rather than proceed. These conditions are not a sign of weakness in the design — they are what makes the agent trustworthy. Common triggers: low confidence, irreversible actions, data outside expected ranges, first-time encounters with a new entity.
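The four common triggers listed above can be written as an explicit check that returns the reasons to stop. This is a minimal sketch: the `ProposedAction` fields, the 0.8 confidence floor, and the expected range are assumptions chosen for illustration, and a real agent would need its own way of estimating confidence.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    confidence: float  # the agent's own estimate, 0.0 to 1.0
    reversible: bool
    amount: float
    entity_id: str


def escalation_reasons(action, known_entities, min_confidence=0.8,
                       expected_range=(0.0, 10_000.0)):
    """Return the reasons to stop and ask a human; an empty list means proceed."""
    reasons = []
    if action.confidence < min_confidence:
        reasons.append("low confidence")
    if not action.reversible:
        reasons.append("irreversible action")
    low, high = expected_range
    if not low <= action.amount <= high:
        reasons.append("amount outside expected range")
    if action.entity_id not in known_entities:
        reasons.append("first encounter with this entity")
    return reasons


# Hypothetical payment to a vendor the agent has never seen before.
action = ProposedAction(confidence=0.93, reversible=False, amount=42_000.0,
                        entity_id="vendor-117")
print(escalation_reasons(action, known_entities={"vendor-001"}))
```

Returning every reason, not just the first, gives the human reviewer the full picture in one handoff.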
What does the agent need to remember, and for how long? Short-term memory covers the current task. Long-term memory covers preferences, prior decisions, and accumulated knowledge. Episode stores let the agent learn from past runs. Design this deliberately — unmanaged memory accumulates noise.
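To make "for how long" concrete, here is a minimal sketch of a memory object that separates task-scoped notes from long-term entries with an optional expiry. The `Memory` class and its method names are hypothetical; managed platforms provide their own memory stores, and this only illustrates the design decision.

```python
import time


class Memory:
    def __init__(self):
        self.short_term = {}  # cleared at the end of each task
        self.long_term = {}   # key -> (value, expires_at or None)

    def remember(self, key, value, ttl_seconds=None, now=None):
        """Store a long-term entry, optionally expiring after ttl_seconds."""
        now = time.time() if now is None else now
        expires = None if ttl_seconds is None else now + ttl_seconds
        self.long_term[key] = (value, expires)

    def recall(self, key, now=None):
        """Look in short-term memory first, then in unexpired long-term memory."""
        now = time.time() if now is None else now
        if key in self.short_term:
            return self.short_term[key]
        value, expires = self.long_term.get(key, (None, None))
        if expires is not None and now >= expires:
            del self.long_term[key]  # prune stale entries so noise does not build up
            return None
        return value

    def end_task(self):
        self.short_term.clear()


memory = Memory()
memory.remember("preferred_tone", "formal", ttl_seconds=60, now=1000)
print(memory.recall("preferred_tone", now=1030))  # still valid
print(memory.recall("preferred_tone", now=1060))  # expired and pruned
```

The `now` parameter exists only so the expiry logic can be exercised without waiting.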
How do you know the agent is doing the right thing? Define success metrics before you deploy, not after something goes wrong. Good evals cover accuracy (did it produce the right output?), coverage (did it handle the full range of inputs?), and behaviour (did it stay within its defined boundaries?).
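The three eval dimensions can be measured with a very small harness. This is a sketch under assumptions: the agent is modelled as a function returning an output and the list of actions it took, "coverage" is interpreted here as the share of input categories with at least one correct case, and the stub agent and test cases are invented for illustration.

```python
def run_evals(agent, cases, allowed_actions):
    """cases: list of dicts with 'input', 'expected', and 'category' keys."""
    correct, categories_passed, violations = 0, set(), 0
    for case in cases:
        output, actions = agent(case["input"])
        if output == case["expected"]:
            correct += 1
            categories_passed.add(case["category"])
        if any(action not in allowed_actions for action in actions):
            violations += 1  # the agent stepped outside its defined boundaries
    all_categories = {case["category"] for case in cases}
    return {
        "accuracy": correct / len(cases),
        "coverage": len(categories_passed) / len(all_categories),
        "boundary_violations": violations,
    }


def stub_agent(text):
    """Stand-in for a real agent: returns (output, actions taken)."""
    if "refund" in text:
        return "route:billing", ["tag_ticket"]
    return "route:general", ["tag_ticket", "send_email"]


cases = [
    {"input": "refund please", "expected": "route:billing", "category": "billing"},
    {"input": "reset my password", "expected": "route:support", "category": "support"},
]
report = run_evals(stub_agent, cases, allowed_actions={"tag_ticket"})
print(report)
```

Writing the cases before deployment forces the question of what "right" means for each input category.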
The model's reasoning quality is excellent out of the box. What separates good agents from great ones is the quality of the system prompt, the skill design, and the governance layer. That is craft work, not engineering.
A skill is a reusable, versioned instruction bundle that tells your agent how to handle a specific type of task — precisely, consistently, in your context. Skills are what make an agent feel genuinely expert rather than generically capable.
Think of a skill like a job description combined with a standard operating procedure. It tells the agent not just what to do, but how to do it — the rules to follow, the judgement calls to make, the edge cases to watch for, and when to stop and ask. A well-written skill is the difference between an agent that does the job and one that does your job.
SKILL.md is the main instruction file. It defines what the skill does, the step-by-step procedure, the decision logic for common cases, and the output format. It is written in plain language; no code is required.
Configuration parameters for the skill — thresholds, default values, system-specific settings. Separating config from procedure means you can update one without touching the other.
A record of how past edge cases were resolved. This is where organisational knowledge accumulates — the agent's institutional memory for the unusual situations that come up in real work.
Define exactly what the skill expects to receive and exactly what it should return. Ambiguity at the boundaries is where most skill failures start.
Every process has exceptions. The skill is the right place to document how those exceptions should be handled — not left to the model's general judgement.
Skills drift when they are not versioned. When you change a skill, know what you changed and why. If behaviour degrades, you need to be able to roll back.
Run the skill against a representative set of real inputs before deploying it. Include inputs you expect it to handle well and inputs that are on the edge of its scope.
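One way to pin down a skill's input and output contract is to write it as a schema and check every payload against it, both in testing and in production. The sketch below is illustrative only: the field names and the `validate` helper are assumptions, not part of any skill format.

```python
# Hypothetical contract for an invoice-approval skill.
REQUIRED_INPUT = {"invoice_id": str, "amount": float, "currency": str}
REQUIRED_OUTPUT = {"decision": str, "reason": str}


def validate(payload, schema):
    """Return a list of contract violations for payload against schema."""
    errors = [f"missing field: {name}" for name in schema if name not in payload]
    errors += [
        f"{name} should be {expected.__name__}"
        for name, expected in schema.items()
        if name in payload and not isinstance(payload[name], expected)
    ]
    errors += [f"unexpected field: {name}" for name in payload if name not in schema]
    return errors


# The amount arrives as a string, which the contract does not allow.
print(validate({"invoice_id": "INV-1", "amount": "12.50", "currency": "GBP"},
               REQUIRED_INPUT))
# The skill returned a decision without explaining it.
print(validate({"decision": "approve"}, REQUIRED_OUTPUT))
```

Running the same check over a representative set of real inputs is a cheap first pass before any deeper evaluation.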
One of the most important risks in agent deployment is the gap between what a skill says and what the agent actually does in production. This is the agent equivalent of documented processes that drift from how work is actually done. Monitor it. Update the skill to reflect reality, or update the agent behaviour to match the skill — but don't let the gap grow.
Connectors are how the agent reaches into the real world — reading and writing the systems your organisation actually uses. Getting this layer right determines the practical reach of your agent.
The Model Context Protocol (MCP), introduced by Anthropic in late 2024 and since adopted by other major AI vendors, is an open standard for connecting AI agents to external tools and data. Before MCP, every integration was a bespoke engineering project. Now it is infrastructure: running an MCP server is becoming as routine as running a web server. If a system has an MCP connector available, connecting your agent to it is largely a configuration task, not a development task.
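Under the hood, MCP messages are JSON-RPC 2.0, and a tool invocation is a request with the method `tools/call`. The sketch below only builds and prints such a message to show its shape; the tool name `search_tickets` and its arguments are hypothetical, and in practice an MCP client library or a managed platform constructs and sends these messages for you.

```python
import json

# Shape of an MCP tool invocation (JSON-RPC 2.0 request).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_tickets",  # hypothetical tool exposed by a server
        "arguments": {"query": "refund", "limit": 5},  # hypothetical arguments
    },
}
print(json.dumps(request, indent=2))
```

Because every server speaks the same message format, swapping one connector for another changes the tool names and arguments, not the plumbing.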
MCP connectors. Pre-built, standardised connections to common tools and platforms. Gmail, Slack, Salesforce, Jira, Google Drive, Figma, and hundreds more. The fastest path to connecting any system with an MCP server available.
Tags: Plug-and-play, Standardised
REST APIs and webhooks. Direct integration with any system that exposes an API. More setup than MCP but available for almost any modern system. Webhooks let external systems push events to your agent rather than the agent polling.
Tags: Most systems, Full control
Browser use. The agent navigates any website and interacts with it exactly as a person would (filling forms, clicking buttons, reading content) without needing an API. The solution for systems with a UI but no integration layer.
Tags: No API needed, Any website
Computer use. The agent controls a full desktop environment, including applications that have no web interface and no API. The solution for legacy systems that cannot be reached any other way.
Tags: Legacy systems, Any desktop app
Each connector is a surface area for unintended action. Scope the agent's access to the minimum required for the task. An agent with write access to ten systems can cause ten times the damage of an agent scoped to one.
Connectors fail — APIs go down, credentials expire, rate limits hit. Define what the agent should do when a connector is unavailable: retry, escalate to a human, or degrade gracefully.
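The three responses (retry, escalate, degrade gracefully) can be combined in one wrapper around any connector call. This is a minimal sketch: `ConnectorUnavailable`, `call_with_fallback`, the retry count, and the delays are illustrative assumptions, and the flaky CRM lookup is a stand-in for a real connector.

```python
import time


class ConnectorUnavailable(Exception):
    """Raised by a connector call when the external system cannot be reached."""


def call_with_fallback(call, retries=3, base_delay=0.5, fallback=None,
                       sleep=time.sleep):
    """Retry with exponential backoff, then degrade to fallback or escalate."""
    for attempt in range(retries):
        try:
            return {"status": "ok", "value": call()}
        except ConnectorUnavailable:
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    if fallback is not None:
        return {"status": "degraded", "value": fallback()}
    return {"status": "escalate", "value": None}


attempts = []


def flaky_crm_lookup():
    """Stand-in connector that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectorUnavailable("CRM timed out")
    return {"customer": "ACME"}


print(call_with_fallback(flaky_crm_lookup, sleep=lambda seconds: None))
```

The explicit `status` field lets the harness decide what happens next, for example handing an `escalate` result to a human queue.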
External systems change their APIs and data structures. When they do, connectors that worked yesterday break silently today. Build monitoring for connector health into your governance layer.
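A simple form of connector health monitoring is to fetch one sample record on a schedule and compare it with the fields the skill depends on. The sketch below assumes a hypothetical ticketing connector; `EXPECTED_FIELDS` and `schema_drift` are illustrative names, not part of any platform.

```python
# Fields the skill relies on, with the types it expects (an assumption).
EXPECTED_FIELDS = {"id": str, "status": str, "updated_at": str}


def schema_drift(sample, expected=EXPECTED_FIELDS):
    """Compare one sample record from a connector with the expected fields."""
    missing = sorted(name for name in expected if name not in sample)
    wrong_type = sorted(
        name for name, kind in expected.items()
        if name in sample and not isinstance(sample[name], kind)
    )
    return {
        "healthy": not missing and not wrong_type,
        "missing": missing,
        "wrong_type": wrong_type,
    }


# The upstream system renamed "status" to "state" and made ids numeric.
print(schema_drift({"id": 42, "state": "open", "updated_at": "2025-01-01"}))
```

A check like this turns a silent break into an alert that the agent's owner can act on.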
You do not need to build from scratch. A growing set of platforms and tools provides the harness, connectors, and deployment infrastructure. Here is an honest map of the main options.
Anthropic Claude. The most complete spectrum for building agents without code. CoWork for no-code composition via skills and connectors. Managed Agents for enterprise API-level control. Claude Code for developers. Constitutional AI training makes Claude particularly well-suited for contexts requiring nuanced, trustworthy judgement.
Tags: No-Code (CoWork), Managed API, MCP Ecosystem, Skills Architecture
OpenAI. Codex is an autonomous software engineering agent. Workspace Agents extend into knowledge work. Strong developer ecosystem and broadest model availability. Common first choice for teams already on GPT-4o.
Tags: Code-first, Workspace Agents, Function Calling
Google Gemini. Agents native to Google Workspace: Gmail, Drive, Meet. Gemini 1.5 Pro excels at long-context understanding. A2A protocol handles agent-to-agent communication. Best fit for organisations deeply in Google Workspace.
Tags: Google Workspace, A2A Protocol, Long Context
Microsoft Copilot Studio. The dominant enterprise choice for Microsoft 365 shops. No-code builder with Power Automate, SharePoint knowledge, and Teams deployment. Strong data governance. Best for large IT-led deployments.
Tags: Microsoft 365, Power Automate, Enterprise Governance
Open-source frameworks. Maximum control, model-agnostic, deployable anywhere. LangGraph for stateful workflows. CrewAI for multi-agent teams with defined roles. AutoGen for conversational multi-agent systems. Significant engineering investment required.
Tags: LangGraph, CrewAI, Model-agnostic, Full Control
GitHub Copilot. The most mature agentic coding tool. Takes a GitHub issue, produces a full plan, implementation, and pull request. Agent Mode handles multi-file refactors in VS Code. Purpose-built for engineering workflows.
Tags: Code Agents, PR Automation, VS Code
Start with your existing vendor relationships and existing tool stack. An organisation already in Microsoft 365 should evaluate Copilot Studio seriously before building custom. An organisation that wants maximum design control without developer dependency should look at Claude CoWork. For regulated industries or bespoke enterprise deployments, Managed Claude Agents or OSS frameworks on your own infrastructure provide the governance and data sovereignty controls you need.
The governance layer is what separates a production agent from a prototype. It is usually built last and wished for first. Build it early.
Four distinct types of drift will affect any agent running in production over time. Each requires a different response.
External systems change their APIs, data structures, and authentication. A connector that worked last month breaks silently today. Monitor connector health continuously, not just at deployment.
The world the agent operates in changes — new document formats, new email patterns, new edge cases that weren't in the original design. The agent's actual inputs diverge from what its skills were written for.
The gap between what the skill says and what the agent actually does in production. Analogous to the gap between documented processes and how work actually gets done. Monitor it. Close it.
Model updates can subtly change agent behaviour even when nothing in your design has changed. An agent that ran reliably on one model version may behave differently after a model update. Run evals after updates.
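Running evals after an update is most useful when the results are compared with a stored baseline. The sketch below flags any metric that dropped by more than a tolerance; the metric names, the numbers, and the 0.02 tolerance are invented for illustration and are not measurements.

```python
def regressions(baseline, candidate, tolerance=0.02):
    """Return metrics where candidate is worse than baseline by more than tolerance."""
    return {
        name: (baseline[name], candidate[name])
        for name in baseline
        if name in candidate and baseline[name] - candidate[name] > tolerance
    }


# Illustrative eval results before and after a model update (made-up numbers).
baseline = {"accuracy": 0.94, "coverage": 0.90, "boundary_pass_rate": 1.00}
after_update = {"accuracy": 0.95, "coverage": 0.84, "boundary_pass_rate": 1.00}
print(regressions(baseline, after_update))
```

In this example accuracy improved slightly while coverage fell, which is the kind of subtle shift a single headline metric would hide.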
Every production agent should have at minimum:
Eval criteria: written before deployment, covering accuracy, coverage, and boundary behaviour. Run them regularly, not just once.
Stop conditions: the specific circumstances under which the agent halts and escalates to a human. Documented in the skill, enforced by the harness.
An audit log: a complete record of every decision the agent made, every tool it called, and every output it produced. Essential for debugging, compliance, and accountability.
A named owner: one person accountable for the agent's behaviour, its ongoing performance, and its response to drift. Without an owner, the agent degrades silently.
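Of these, the audit record is the cheapest to start early. A minimal sketch, assuming one JSON line per agent step appended to a local file: the field names, `audit_record`, and `append_audit` are illustrative, and a production deployment would more likely write to a managed, tamper-resistant log store.

```python
import json
from datetime import datetime, timezone


def audit_record(agent, step, tool, inputs, output, decision):
    """Build one audit entry as a single JSON line."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "step": step,
        "tool": tool,
        "inputs": inputs,
        "output": output,
        "decision": decision,
    }, sort_keys=True)


def append_audit(path, line):
    """Append one entry to the log file; existing entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(line + "\n")


# Hypothetical step from an invoice agent that escalated instead of paying.
line = audit_record("invoice-agent", 3, "post_payment",
                    {"invoice_id": "INV-1"}, "not executed",
                    "escalated: amount outside expected range")
print(line)
```

Recording the decision alongside the tool call is what makes the log useful for accountability, not just debugging.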
An agent without governance is not an autonomous system. It is a liability waiting to mature.