Chapter 34 · companion worksheet

Multi-modal use-case map

For years, your firm has been producing inputs that AI can now read. This worksheet helps you match your function's actual document and signal types to the modality best suited to reading them, so you can find the highest-value use case before you evaluate a single tool.

Step 1 — Identify what your team reads that it didn't create

List the documents, files, or signals your team spends the most time reading that originate outside your team (vendor invoices, client contracts, field photos, patient charts, transcripts, etc.).

  1. ______
  2. ______
  3. ______
  4. ______

Step 2 — Classify the reading: extraction vs. judgment

For each item above, estimate the split. Extraction = pulling the same fields or flags every time ("is this in order?", "what does this say about X?"). Judgment = deciding what to do with what you found, requiring experience or authority.

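| Input type (from Step 1) | % extraction | % judgment | AI candidate? (☐ yes / ☐ no) |
|---|---|---|---|
| 1. ______ | | | |
| 2. ______ | | | |
| 3. ______ | | | |
| 4. ______ | | | |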

The extraction portion is the multimodal candidate. The judgment portion stays with a human. Most teams find 60–80% of their reading is extraction when they measure honestly.
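
To make the Step 2 split concrete, here is a minimal sketch that records an estimated extraction percentage for each input type, derives the judgment share, and flags likely AI candidates. The input names, the percentages, and the 50% cutoff are illustrative assumptions; the worksheet itself does not define a threshold, so set your own.

```python
from dataclasses import dataclass

# Hypothetical cutoff for ticking "AI candidate? yes".
# The worksheet does not prescribe one; adjust to your own judgment.
CANDIDATE_THRESHOLD_PCT = 50


@dataclass
class ReadingTask:
    input_type: str
    extraction_pct: int  # your estimate of the share that is routine extraction

    @property
    def judgment_pct(self) -> int:
        # Whatever is not extraction is treated as judgment.
        return 100 - self.extraction_pct

    @property
    def ai_candidate(self) -> bool:
        return self.extraction_pct >= CANDIDATE_THRESHOLD_PCT


def classify(tasks):
    """Return one row per task: (input type, % extraction, % judgment, candidate?)."""
    return [
        (t.input_type, t.extraction_pct, t.judgment_pct, t.ai_candidate)
        for t in tasks
    ]


# Made-up estimates, for illustration only.
tasks = [
    ReadingTask("vendor invoices", 80),
    ReadingTask("client contracts", 35),
]
rows = classify(tasks)

for input_type, extraction, judgment, candidate in rows:
    mark = "yes" if candidate else "no"
    print(f"{input_type}: {extraction}% extraction, {judgment}% judgment, candidate: {mark}")
```

The point of writing the estimate down as a number is that it can be checked later against a timed sample of real reading, rather than remaining a gut feeling.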

Step 3 — Cross-reference function against modality

Mark the cells where your function produces this input type at volume today. Then use the examples in the marked cells to identify the entry point.

| Function / department | Text documents (contracts, reports, PDFs) | Voice / audio (calls, dictation, meetings) | Images / photos (site, product, scans) | Computer-use (legacy app, no API) |
|---|---|---|---|---|
| Legal / compliance | Contract review, deposition transcripts, discovery batches, intake documents | Client intake calls, hearing recordings | Scanned exhibits, court filings | Court portals, government filing systems |
| Finance / AP | Invoices, contracts, spend reports, audit documents | Approval calls, vendor discussions | Receipt images, check scans | Legacy ERP entry, vendor portals |
| Healthcare / clinical | Patient charts, lab results, referral letters | Physician-patient encounters, intake calls | Medical images, ECG signals, wound photos | EHR systems without API access |
| AEC / field ops | Specs, inspection reports, safety manuals | Field hazard descriptions, safety walkthroughs | Jobsite photos, progress photos, defect images | Project management tools, compliance portals |
| HR | CVs, policy documents, offer letters, leave requests | Candidate screens, onboarding sessions | ID verification scans | HRIS data entry, benefits portals |
| Marketing | Briefs, competitor content, performance reports | Customer interviews, recorded calls | Ad creative, brand assets | CMS tools, ad platforms without API |
| Your function: ______ | | | | |

Step 4 — Select the highest-value use case

From Steps 2 and 3, identify the one cell where extraction volume is highest and the input type is one your team already produces at scale. This is your entry point.

| Field | Your answer |
|---|---|
| Selected use case | |
| Input type (text / voice / image / computer-use) | |
| Volume: how many instances per week? | |
| Current time per instance (human reading) | |
| Output format the AI needs to produce | |
| Where does the human pick back up for judgment? | |
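
One possible way to compare candidate cells is to turn the volume and time-per-instance answers into weekly hours of extraction reading: instances per week × minutes per instance × extraction share ÷ 60. The sketch below does that arithmetic and picks the largest figure. The class name, field names, and all numbers are hypothetical, and ranking by hours alone is an assumption; the worksheet asks only for the cell with the highest extraction volume.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    modality: str            # text / voice / image / computer-use
    instances_per_week: int
    minutes_per_instance: float
    extraction_pct: int      # estimate carried over from Step 2

    def extraction_hours_per_week(self) -> float:
        # Total reading time, scaled down to the extraction share, in hours.
        total_minutes = self.instances_per_week * self.minutes_per_instance
        return total_minutes * self.extraction_pct / 100 / 60


def pick_entry_point(cases):
    """Return the use case with the most weekly extraction hours."""
    return max(cases, key=lambda c: c.extraction_hours_per_week())


# Made-up figures for a Finance / AP team, for illustration only.
candidates = [
    UseCase("Invoices", "text", 200, 6.0, 80),
    UseCase("Receipt images", "image", 500, 2.0, 70),
    UseCase("Approval calls", "voice", 40, 30.0, 50),
]

best = pick_entry_point(candidates)
for case in candidates:
    print(f"{case.name} ({case.modality}): {case.extraction_hours_per_week():.1f} h/week")
print(f"Entry point: {best.name}")
```

Hours are only a first filter. The last two rows of the table, output format and the human hand-off point, still decide whether the top-ranked use case is workable.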

How to read the results

The technology is ready when you are: many frontier models accept text, images, PDFs, and audio as direct inputs, without a separate extraction pipeline. The design questions are which reading task to remove from the human's plate, what the output format needs to be, and where the human re-enters for the judgment call. Answer those three questions first. Then evaluate tools.

Want a second set of eyes on this in your firm? The no-sell promise applies — if it isn't a fit, I'll tell you in the first ten minutes.
