Multi-modal use-case map
For years, your firm has been producing inputs that AI can now read. This worksheet helps you match your function's actual document and signal types to the modality best suited to reading them, so you can find the highest-value use case before you evaluate a single tool.
Step 1 — Identify what your team reads that it didn't create
List the documents, files, or signals your team spends the most time reading that originate outside your team (vendor invoices, client contracts, field photos, patient charts, transcripts, etc.).
- ______
- ______
- ______
- ______
Step 2 — Classify the reading: extraction vs. judgment
For each item above, estimate the split. Extraction = pulling the same fields or flags every time ("is this in order?", "what does this say about X?"). Judgment = deciding what to do with what you found, requiring experience or authority.
| Input type (from Step 1) | % extraction | % judgment | AI candidate? (☐ yes / ☐ no) |
|---|---|---|---|
The extraction portion is the multimodal candidate. The judgment portion stays with a human. Most teams find 60–80% of their reading is extraction when they measure honestly.
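The split in Step 2 can also be recorded in a few lines of Python if you prefer a script to a table. This is a minimal sketch, not a prescribed method: the `ReadingTask` class, the 50% cutoff in `is_ai_candidate`, and the example percentages are all illustrative assumptions, so substitute your own estimates.

```python
from dataclasses import dataclass


@dataclass
class ReadingTask:
    """One input type from Step 1, with its estimated extraction share."""
    name: str
    extraction_pct: float  # % of reading time spent pulling the same fields or flags

    @property
    def judgment_pct(self) -> float:
        # Whatever is not extraction is treated as judgment and stays with a human.
        return 100.0 - self.extraction_pct


def is_ai_candidate(task: ReadingTask, threshold_pct: float = 50.0) -> bool:
    """Mark a task as a candidate when extraction dominates (threshold is an assumption)."""
    return task.extraction_pct >= threshold_pct


# Hypothetical estimates for two input types.
tasks = [
    ReadingTask("vendor invoices", extraction_pct=80.0),
    ReadingTask("client contracts", extraction_pct=40.0),
]

for task in tasks:
    label = "yes" if is_ai_candidate(task) else "no"
    print(f"{task.name}: {task.extraction_pct:.0f}% extraction, "
          f"{task.judgment_pct:.0f}% judgment, AI candidate: {label}")
```

The cutoff is a convenience for sorting the list, not a rule. A task below it can still contain an extraction step worth isolating.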
Step 3 — Cross-reference function against modality
Mark the cells where your function produces this input type at volume today. Then use the examples in each cell as guidance to identify the entry point.
| Function / department | Text documents (contracts, reports, PDFs) | Voice / audio (calls, dictation, meetings) | Images / photos (site, product, scans) | Computer-use (legacy app, no API) |
|---|---|---|---|---|
| Legal / compliance | Contract review, deposition transcripts, discovery batches, intake documents | Client intake calls, hearing recordings | Scanned exhibits, court filings | Court portals, government filing systems |
| Finance / AP | Invoices, contracts, spend reports, audit documents | Approval calls, vendor discussions | Receipt images, check scans | Legacy ERP entry, vendor portals |
| Healthcare / clinical | Patient charts, lab results, referral letters | Physician-patient encounters, intake calls | Medical images, ECG signals, wound photos | EHR systems without API access |
| AEC / field ops | Specs, inspection reports, safety manuals | Field hazard descriptions, safety walkthroughs | Jobsite photos, progress photos, defect images | Project management tools, compliance portals |
| HR | CVs, policy documents, offer letters, leave requests | Candidate screens, onboarding sessions | ID verification scans | HRIS data entry, benefits portals |
| Marketing | Briefs, competitor content, performance reports | Customer interviews, recorded calls | Ad creative, brand assets | CMS tools, ad platforms without API |
| Your function: ______ | | | | |
Step 4 — Select the highest-value use case
From Steps 2 and 3, identify the one cell where extraction volume is highest and where your team is already producing the input type at scale. This is your entry point.
| Field | Your answer |
|---|---|
| Selected use case | |
| Input type (text / voice / image / computer-use) | |
| Volume: how many instances per week? | |
| Current time per instance (human reading) | |
| Output format the AI needs to produce | |
| Where does the human pick back up for judgment? | |
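If several cells look promising, the volume and time fields above give you a rough way to compare them: weekly extraction hours = instances per week × minutes per instance × extraction share ÷ 60. The sketch below ranks candidates on that figure. The `UseCase` class, the `pick_entry_point` helper, and every number in it are hypothetical placeholders, and the formula is one reasonable proxy for "highest value", not the only one.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """A candidate cell from Step 3 with the Step 4 measurements filled in."""
    name: str
    modality: str              # "text", "voice", "image", or "computer-use"
    instances_per_week: int
    minutes_per_instance: float
    extraction_share: float    # Step 2 extraction estimate as a fraction, 0.0 to 1.0

    def weekly_extraction_hours(self) -> float:
        # Human reading time per week that is extraction rather than judgment.
        return (self.instances_per_week
                * self.minutes_per_instance
                * self.extraction_share) / 60


def pick_entry_point(cases: list[UseCase]) -> UseCase:
    """Return the candidate with the most weekly extraction hours."""
    return max(cases, key=lambda c: c.weekly_extraction_hours())


# Hypothetical candidates; replace with your own worksheet answers.
candidates = [
    UseCase("vendor invoices", "text", 200, 6.0, 0.8),
    UseCase("jobsite photos", "image", 50, 3.0, 0.5),
]

best = pick_entry_point(candidates)
print(f"Entry point: {best.name} ({best.modality}), "
      f"about {best.weekly_extraction_hours():.1f} extraction hours per week")
```

Hours alone do not settle the choice. A lower-volume task can still win if its output format is simpler or the handoff to human judgment is cleaner.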
How to read the results
The technology is ready when you are: frontier models accept text, images, PDFs, and audio as direct inputs, without a separate extraction pipeline. The design questions are which reading task to remove from the human's plate, what the output format needs to be, and where the human re-enters for the judgment call. Answer those three questions first. Then evaluate tools.
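To make the second and third questions concrete, it can help to write down the output record and the handoff rule before looking at any tool. The following is a sketch under assumptions: `ExtractionResult`, `needs_human_judgment`, and the field names are invented for illustration and do not correspond to any particular product's API.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionResult:
    """The structured output the AI step is expected to produce for one input."""
    source_id: str
    fields: dict[str, str]                           # extracted values, keyed by field name
    flags: list[str] = field(default_factory=list)   # anything the reader marked as unusual


def needs_human_judgment(result: ExtractionResult, required_fields: list[str]) -> bool:
    """Hand off to a person when a required field is missing or empty, or any flag is raised."""
    missing = [name for name in required_fields if not result.fields.get(name)]
    return bool(missing) or bool(result.flags)


# Hypothetical invoice records.
required = ["vendor", "total"]
complete = ExtractionResult("inv-001", {"vendor": "Acme", "total": "120.00"})
incomplete = ExtractionResult("inv-002", {"total": "75.00"})

for record in (complete, incomplete):
    route = "human review" if needs_human_judgment(record, required) else "auto-file"
    print(f"{record.source_id}: {route}")
```

Writing the record down first makes tool evaluation easier: a tool either produces these fields and flags reliably or it does not.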
Want a second set of eyes on this in your firm? The no-sell promise applies — if it isn't a fit, I'll tell you in the first ten minutes.
Book a 30-Minute Call →