Chapter 34 · companion worksheet

Multi-modal use-case map

For years, your firm has been producing inputs that AI can now read. This worksheet helps you match your function's actual document and signal types to the modality best suited to reading them, so you can find the highest-value use case before you evaluate a single tool.

Step 1 — Identify what your team reads that it didn't create

List the documents, files, or signals your team spends the most time reading that originate outside your team (vendor invoices, client contracts, field photos, patient charts, transcripts, etc.).

______
______
______
______

Step 2 — Classify the reading: extraction vs. judgment

For each item above, estimate how the reading time splits between extraction and judgment. Extraction = pulling the same fields or flags every time ("is this in order?", "what does this say about X?"). Judgment = deciding what to do with what you found, which requires experience or authority.

Input type (from Step 1)	% extraction	% judgment	AI candidate? (☐ yes / ☐ no)

The extraction portion is the multimodal candidate. The judgment portion stays with a human. Most teams find 60–80% of their reading is extraction when they measure honestly.

Step 3 — Cross-reference function against modality

Mark the cells where your function handles this input type at volume today. Then use the examples in each marked cell to identify the entry point.

Function / department	Text documents (contracts, reports, PDFs)	Voice / audio (calls, dictation, meetings)	Images / photos (site, product, scans)	Computer-use (legacy app, no API)
Legal / compliance	Contract review, deposition transcripts, discovery batches, intake documents	Client intake calls, hearing recordings	Scanned exhibits, court filings	Court portals, government filing systems
Finance / AP	Invoices, contracts, spend reports, audit documents	Approval calls, vendor discussions	Receipt images, check scans	Legacy ERP entry, vendor portals
Healthcare / clinical	Patient charts, lab results, referral letters	Physician-patient encounters, intake calls	Medical images, ECG signals, wound photos	EHR systems without API access
AEC / field ops	Specs, inspection reports, safety manuals	Field hazard descriptions, safety walkthroughs	Jobsite photos, progress photos, defect images	Project management tools, compliance portals
HR	CVs, policy documents, offer letters, leave requests	Candidate screens, onboarding sessions	ID verification scans	HRIS data entry, benefits portals
Marketing	Briefs, competitor content, performance reports	Customer interviews, recorded calls	Ad creative, brand assets	CMS tools, ad platforms without API
Your function: ______

Step 4 — Select the highest-value use case

From Steps 2 and 3, identify the one cell where extraction volume is highest and the input type is one your team already handles at scale. This is your entry point.

Field	Your answer
Selected use case
Input type (text / voice / image / computer-use)
Volume: how many instances per week?
Current time per instance (human reading)
Output format the AI needs to produce
Where does the human pick back up for judgment?
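
One way to make the Step 4 comparison concrete is to estimate weekly extraction hours for each candidate. The sketch below is illustrative only: the Candidate fields, the formula (instances per week × minutes per instance × extraction share), and the sample figures are assumptions made for demonstration, not data from any firm.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate use case from Steps 1-3 (hypothetical worksheet entries)."""
    use_case: str
    modality: str              # text / voice / image / computer-use
    instances_per_week: int
    minutes_per_instance: float
    extraction_share: float    # 0.0-1.0, taken from the Step 2 estimate

    def extraction_hours_per_week(self) -> float:
        # Only the extraction portion counts; judgment time stays with a human.
        minutes = self.instances_per_week * self.minutes_per_instance
        return minutes * self.extraction_share / 60


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates by weekly extraction hours, largest first."""
    return sorted(candidates, key=lambda c: c.extraction_hours_per_week(), reverse=True)


# Made-up sample figures, for illustration only.
candidates = [
    Candidate("Vendor invoice checks", "text", 200, 6, 0.75),
    Candidate("Client intake calls", "voice", 30, 20, 0.50),
    Candidate("Receipt scans", "image", 120, 2, 0.90),
]

for c in rank(candidates):
    print(f"{c.use_case}: {c.extraction_hours_per_week():.1f} h/week")
```

The top-ranked row is a starting point rather than a verdict. A smaller use case may still be the better entry point if its output format is simpler or the hand-off to human judgment is cleaner.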

How to read the results

The technology is ready when you are: many frontier models accept text, images, and PDFs as direct inputs, and some accept audio as well, without a separate extraction pipeline. That leaves three design questions: which reading task to remove from the human's plate, what the output format needs to be, and where the human re-enters for the judgment call. Answer those three questions first. Then evaluate tools.
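
The output format and the human re-entry point can be pinned down before any tool is chosen by writing out the hand-off record explicitly. The following is a minimal sketch for a hypothetical invoice-review case. The field names, the needs_human rule, and the confidence threshold are all assumptions, and no particular model or vendor API is implied.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionResult:
    """What the AI step hands to the human reviewer (hypothetical schema)."""
    source_id: str
    fields: dict[str, str]                           # extracted values, e.g. invoice total
    flags: list[str] = field(default_factory=list)   # anything found out of order
    confidence: float = 1.0                          # assumed 0.0-1.0 score from the extraction step


def needs_human(result: ExtractionResult, min_confidence: float = 0.9) -> bool:
    """Route to a person when something is flagged or confidence is low.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return bool(result.flags) or result.confidence < min_confidence


# Made-up records, for illustration only.
clean = ExtractionResult("INV-001", {"total": "1,250.00", "po_number": "PO-77"}, confidence=0.97)
flagged = ExtractionResult("INV-002", {"total": "980.00"}, flags=["missing PO number"], confidence=0.95)

for r in (clean, flagged):
    print(r.source_id, "-> judgment queue" if needs_human(r) else "-> routine queue")
```

Writing the record down this way forces the answer to "Where does the human pick back up for judgment?" to be an explicit rule rather than an informal habit.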
