Chapter 8 · companion worksheet

90-day pilot scope template

Fill this in before you turn anything on. The kill criterion goes first — if you cannot write the specific, numeric condition under which you would stop the pilot, you are not ready to start. A pilot with no pre-agreed metric is not a pilot; it is a budget line someone will defend on vibes.

1. The kill criterion (fill this first)

A good kill criterion is specific, numeric, and time-bound, so nobody can argue their way around it later. Write it on the hard cases — the exceptions, the disputed interactions — not on the easy volume metric. A tripwire on the easy metric never trips.

Field	Your entry
If [metric] falls below [number] by [week/date], we stop and redesign.
The metric above is measured on: (describe the hard cases — exceptions, disputed transactions, edge-case inputs, not overall volume)
Who is authorized to call the stop?
What "stop and redesign" means in practice (what happens next if the criterion is hit)

2. The workflow

One workflow only. The single best candidate from your Assess output. Scoped tight. Not a platform, not "AI across the organization."

Field	Your entry
Workflow name and description (one sentence)
Where this workflow starts (trigger / input)
Where this workflow ends (output / handoff)
Volume in scope (transactions / cases / interactions per week)
Current owner of this workflow (name or role)

3. The pre-agreed metric and target

The metric must be agreed before the pilot starts, not after — because once it is running, everyone has a stake in calling it a success. The fight about what "good" means is cheap to have on a whiteboard and expensive to have in a boardroom. Volume metrics (percentage handled, throughput) lie; they are easy to hit and hide what matters. The metric that tells the truth is almost always about the hard cases.

Field	Your entry
Primary success metric (what "working" means for this workflow)
Target value (the number the pilot has to hit)
Secondary metric, if any (often: what the primary metric is not allowed to break)
Baseline today (current performance without AI, so you can measure change)
How and by whom the metric will be measured during the pilot
Who agreed to this metric (names of the people in the room)

4. Scope boundaries

Define what is inside and outside the pilot explicitly. Scope creep is how a 90-day pilot becomes an 18-month project with no exit.

In scope	Out of scope (explicitly excluded)

Exception cases (the hard cases this pilot must include on purpose)

List the ways this workflow goes wrong today when a human runs it. The pilot's measurement must cover these — because that is where the risk is, and a volume metric averages them into invisibility.

______
______
______

5. The human-in-the-loop (HITL) gate

Any output that goes to a client, affects a financial record, or triggers an irreversible action needs a named human review step. Define it before launch, not after the first mistake.

Field	Your entry
Which outputs require human review before they are acted on?
Who reviews those outputs? (name or role)
What does the reviewer do when they disagree with the AI output? (the escalation path)
How will errors caught by the reviewer be logged? (so the pattern of wrong answers is visible, not just individual mistakes)

6. Owner, sponsor, and visibility

Run the pilot where the people who do the work can see it — not in secret. Trust is a function of visible, correctable errors: watching the system make a mistake, watching it get fixed, and concluding it is safe to rely on. A pilot run in secret and presented as a triumph earns a polite nod and a parallel spreadsheet.

Field	Your entry
Pilot owner (day-to-day, accountable for the workflow)
Executive sponsor (accountable for the outcome; can authorize a stop)
Team members who will see the pilot run (not just the results)
How errors and corrections will be surfaced to the team during the pilot

7. Review cadence

Review point	Date	Who attends	Decision to be made
Week 2 check-in (early signal: is the metric moving, are the hard cases behaving as expected?)			Continue / adjust scope / stop
Week 6 formal review (kill-criterion check; evaluate the hard-case metric)			Continue / stop and redesign
Day 90 readout (full results against pre-agreed metric; decision on next step)			Accelerate / extend pilot / stop

Sign-off before launch

Do not start the pilot until each item below is checked.

☐ Kill criterion written, specific, numeric, time-bound
☐ Kill criterion is on the hard cases, not the easy volume metric
☐ Pre-agreed success metric documented and signed off by all stakeholders named above
☐ Baseline performance documented (so change can be measured)
☐ Exception cases listed and included in scope
☐ Scope boundaries defined — in and out
☐ HITL gate defined: which outputs, who reviews, what the escalation path is
☐ Pilot owner and executive sponsor named
☐ Team members identified who will watch the pilot run (not just receive the results)
☐ Review dates on the calendar

Want a second set of eyes on this in your firm? The no-sell promise applies — if it isn't a fit, I'll tell you in the first ten minutes.

Book a 30-Minute Call →