90-day pilot scope template
Fill this in before you turn anything on. The kill criterion goes first — if you cannot write the specific, numeric condition under which you would stop the pilot, you are not ready to start. A pilot with no pre-agreed metric is not a pilot; it is a budget line someone will defend on vibes.
1. The kill criterion (fill this first)
A good kill criterion is specific, numeric, and time-bound, so nobody can argue their way around it later. Define it on the hard cases (the exceptions, the disputed interactions), not on the easy volume metric. A tripwire on the easy metric never trips.
| Field | Your entry |
|---|---|
| If [metric] falls below [number] by [week/date], we stop and redesign. | |
| The metric above is measured on: (describe the hard cases — exceptions, disputed transactions, edge-case inputs, not overall volume) | |
| Who is authorized to call the stop? | |
| What "stop and redesign" means in practice (what happens next if the criterion is hit) | |
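One way to check that the first row really is specific, numeric, and time-bound is to see whether it can be written as a check a machine could run. The sketch below is illustrative only: the `KillCriterion` class, the metric name, the 0.85 threshold, and the date are all hypothetical placeholders, and it assumes the criterion is evaluated on the deadline date rather than continuously.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class KillCriterion:
    """Hypothetical container for the sentence in the first table row."""
    metric_name: str   # the [metric], measured on the hard cases only
    threshold: float   # the [number]
    deadline: date     # the [week/date]


def should_stop(criterion: KillCriterion, observed: float, today: date) -> bool:
    """True when the deadline has arrived and the metric is below the threshold."""
    return today >= criterion.deadline and observed < criterion.threshold


# Placeholder values; replace with the entries from your own table.
criterion = KillCriterion(
    metric_name="hard_case_accuracy",
    threshold=0.85,
    deadline=date(2025, 3, 14),
)

print(should_stop(criterion, observed=0.80, today=date(2025, 3, 14)))  # True
print(should_stop(criterion, observed=0.90, today=date(2025, 3, 14)))  # False
print(should_stop(criterion, observed=0.80, today=date(2025, 2, 1)))   # False
```

If you cannot fill in all three fields without an argument, the criterion is not finished yet.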
2. The workflow
One workflow only. The single best candidate from your Assess output. Scoped tight. Not a platform, not "AI across the organization."
| Field | Your entry |
|---|---|
| Workflow name and description (one sentence) | |
| Where this workflow starts (trigger / input) | |
| Where this workflow ends (output / handoff) | |
| Volume in scope (transactions / cases / interactions per week) | |
| Current owner of this workflow (name or role) | |
3. The pre-agreed metric and target
The metric must be agreed before the pilot starts, not after — because once it is running, everyone has a stake in calling it a success. The fight about what "good" means is cheap to have on a whiteboard and expensive to have in a boardroom. Volume metrics (percentage handled, throughput) lie; they are easy to hit and hide what matters. The metric that tells the truth is almost always about the hard cases.
| Field | Your entry |
|---|---|
| Primary success metric (what "working" means for this workflow) | |
| Target value (the number the pilot has to hit) | |
| Secondary metric, if any (often: what the primary metric is not allowed to break) | |
| Baseline today (current performance without AI, so you can measure change) | |
| How and by whom the metric will be measured during the pilot | |
| Who agreed to this metric (names of the people in the room) | |
4. Scope boundaries
Define what is inside and outside the pilot explicitly. Scope creep is how a 90-day pilot becomes an 18-month project with no exit.
| In scope | Out of scope (explicitly excluded) |
|---|---|
| | |
| | |
| | |
Exception cases (the hard cases this pilot must include on purpose)
List the ways this workflow goes wrong today when a human runs it. The pilot's measurement must cover these — because that is where the risk is, and a volume metric averages them into invisibility.
- ______
- ______
- ______
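The arithmetic behind "averages them into invisibility" is easy to see with made-up numbers. The sketch below assumes a week of 1,000 cases, 50 of which are exceptions; every figure in it is invented for illustration and is not a benchmark.

```python
# Illustrative numbers only: 1,000 cases in a week, 50 of them hard cases.
cases = (
    [{"hard": False, "correct": True}] * 940
    + [{"hard": False, "correct": False}] * 10
    + [{"hard": True, "correct": True}] * 20
    + [{"hard": True, "correct": False}] * 30
)


def accuracy(rows):
    """Share of rows handled correctly; 0.0 for an empty list."""
    return sum(r["correct"] for r in rows) / len(rows) if rows else 0.0


overall = accuracy(cases)
hard_only = accuracy([r for r in cases if r["hard"]])

print(f"overall: {overall:.0%}")        # overall: 96%
print(f"hard cases: {hard_only:.0%}")   # hard cases: 40%
```

A dashboard showing 96% would pass most reviews, while the cases listed above are failing more often than not. That gap is why the measurement has to report the hard cases on their own line.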
5. The human-in-the-loop (HITL) gate
Any output that goes to a client, affects a financial record, or triggers an irreversible action needs a named human review step. Define it before launch, not after the first mistake.
| Field | Your entry |
|---|---|
| Which outputs require human review before they are acted on? | |
| Who reviews those outputs? (name or role) | |
| What does the reviewer do when they disagree with the AI output? (the escalation path) | |
| How will errors caught by the reviewer be logged? (so the pattern of wrong answers is visible, not just individual mistakes) | |
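For the last row, the point of logging is the tally, not the individual entry. A minimal sketch, assuming the reviewer records one entry per disagreement with a short error category; the `ReviewLogEntry` fields, case IDs, and category names are hypothetical, and a shared spreadsheet with the same columns would serve equally well.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewLogEntry:
    """Hypothetical record of one reviewer disagreement."""
    case_id: str
    error_type: str   # short category chosen by the reviewer
    escalated: bool   # whether the escalation path was used


# Invented entries for illustration.
log = [
    ReviewLogEntry("C-101", "wrong amount", False),
    ReviewLogEntry("C-117", "wrong client", True),
    ReviewLogEntry("C-130", "wrong amount", False),
    ReviewLogEntry("C-142", "wrong amount", True),
]

# Count disagreements per category so repeated failure modes stand out.
by_type = Counter(entry.error_type for entry in log)
for error_type, count in by_type.most_common():
    print(f"{error_type}: {count}")
```

Four separate mistakes look like noise. Three of the same kind look like something to fix before the next review.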
6. Owner, sponsor, and visibility
Run the pilot where the people who do the work can see it — not in secret. Trust is a function of visible, correctable errors: watching the system make a mistake, watching it get fixed, and concluding it is safe to rely on. A pilot run in secret and presented as a triumph earns a polite nod and a parallel spreadsheet.
| Field | Your entry |
|---|---|
| Pilot owner (day-to-day, accountable for the workflow) | |
| Executive sponsor (accountable for the outcome; can authorize a stop) | |
| Team members who will see the pilot run (not just the results) | |
| How errors and corrections will be surfaced to the team during the pilot | |
7. Review cadence
| Review point | Date | Who attends | Decision to be made |
|---|---|---|---|
| Week 2 check-in (early signal: is the metric moving, are the hard cases behaving as expected?) | | | Continue / adjust scope / stop |
| Week 6 formal review (kill-criterion check; evaluate the hard-case metric) | | | Continue / stop and redesign |
| Day 90 readout (full results against pre-agreed metric; decision on next step) | | | Accelerate / extend pilot / stop |
Sign-off before launch
Do not start the pilot until each item below is checked.
- ☐ Kill criterion written, specific, numeric, time-bound
- ☐ Kill criterion is on the hard cases, not the easy volume metric
- ☐ Pre-agreed success metric documented and signed off by all stakeholders named above
- ☐ Baseline performance documented (so change can be measured)
- ☐ Exception cases listed and included in scope
- ☐ Scope boundaries defined — in and out
- ☐ HITL gate defined: which outputs, who reviews, what the escalation path is
- ☐ Pilot owner and executive sponsor named
- ☐ Team members identified who will watch the pilot run (not just receive the results)
- ☐ Review dates on the calendar