Three viable models at the frontier. Two weeks ago, that would have been a problem. Now it's an opportunity—if you know how to choose.
This is the question I'm getting from every client right now: should we build on Claude 3, stick with GPT-4, or bet on Google's Gemini? The answer isn't "pick the best one." The answer is "pick the right one for your actual work." And that requires knowing what you're actually optimizing for.
The Framework
Stop thinking about "best model." Start thinking about task-model fit. Four dimensions matter:
- Quality on your specific task: Not benchmark scores. Can it do the work your people actually do?
- Speed: How fast do you need the answer? Real-time client interaction, or batch processing overnight?
- Cost: Per-token pricing matters when you're processing thousands of documents a week.
- Operational risk: API availability, rate limits, how much it costs if something goes wrong.
Every model wins on some of these. None of them wins on all of them.
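The four dimensions above can be made concrete with a weighted fit score. This is a minimal sketch, not a prescribed methodology: the `ModelProfile` fields, the weight dictionaries, and all the numbers are illustrative placeholders — replace them with measurements from your own pilots.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    """Hypothetical 0-to-1 scores for one model. Quality should come from
    your own task evaluations, not public benchmarks."""
    name: str
    quality: float  # accuracy on your specific task
    speed: float    # higher = faster responses
    cost: float     # higher = cheaper per token
    risk: float     # higher = lower operational risk

def fit_score(model: ModelProfile, weights: dict[str, float]) -> float:
    """Weighted task-model fit: the weights encode what this task optimizes for."""
    return (weights["quality"] * model.quality
            + weights["speed"] * model.speed
            + weights["cost"] * model.cost
            + weights["risk"] * model.risk)

# Example weightings: overnight batch work prioritizes quality and cost;
# real-time client interaction prioritizes speed.
batch_weights = {"quality": 0.5, "speed": 0.1, "cost": 0.3, "risk": 0.1}
realtime_weights = {"quality": 0.2, "speed": 0.5, "cost": 0.2, "risk": 0.1}
```

The point of the exercise isn't the arithmetic; it's forcing you to write down what each task actually optimizes for before you commit to a model.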
Claude 3 Opus: When You Need the Absolute Best
Use Opus for: Complex reasoning, legal document analysis, multi-page research synthesis, anything where accuracy matters more than speed.
The math: Opus costs more per token than GPT-4, and it's slower. You don't use it for everything. You use it for the 20% of tasks where getting the right answer the first time is worth the extra cost.
Where it wins: Contract review, regulatory compliance analysis, investigative work. If a human would have to check the work anyway, Opus reduces the checking. That saves time and catches errors early.
The risk: Anthropic is smaller than OpenAI or Google. If you're building a production system, you need to know that. The API is stable, but capacity constraints are real when demand spikes. Plan for fallback to another model.
Claude 3 Sonnet: Your Daily Driver
Use Sonnet for: 80% of your work. Document extraction, routine analysis, content generation, classification tasks.
The math: Sonnet is fast, cheap, and genuinely good. It's the model I recommend to most firms for core workflow automation because the cost-to-quality ratio is unbeatable.
Where it wins: Proposal generation, client intake processing, research summaries, email drafting. Anything repeatable and high-volume.
The gap to Opus: Opus handles edge cases better. For your core workflows, Sonnet is sufficient. For exceptions, escalate to Opus or a human.
GPT-4: Still the Safest Bet
Use GPT-4 for: Anything where you already have GPT integration, where your team is comfortable with OpenAI, or where you need broader ecosystem support.
The math: GPT-4 is slightly more expensive than Sonnet but cheaper than Opus. It's faster than Opus, slower than Sonnet. It sits in the middle on all dimensions.
Where it wins: Image analysis (if that matters to you), broader plugin ecosystem, ChatGPT integration if you're already in that world, organizational momentum if your team already knows OpenAI's tools.
The gap: On complex reasoning tasks, Claude 3 is ahead. On speed and cost, Sonnet wins. GPT-4 no longer has a clear best-in-class strength, but it's rarely a bad choice.
Gemini: The Underdog
Use Gemini for: Integration with Google Workspace (if you live in Gmail and Docs), image/video analysis, or as a backup/fallback option.
The math: Gemini's pricing and performance are competitive, but the API is less mature than OpenAI's or Anthropic's. Google's API documentation is catching up, but if you're building production systems, you'll hit rough edges.
The gap: Gemini is newer in its current form (the most recent versions are from late 2023/early 2024). The community around it is smaller. Integration support is thinner. You can build on it, but you'll have fewer examples and libraries to lean on.
Where it wins: Google Workspace integration, multimodal analysis that includes video, potentially lower cost if Google's pricing strategy shifts. If you're all-in on Google infrastructure, it makes sense.
Your Decision Tree
If you're starting from zero: Pick Claude 3 Sonnet for your core workflows. It's genuinely excellent, cost-effective, and performant. Build with API abstraction so you can swap models if needed. For 20% of tasks where Sonnet struggles (complex legal reasoning, edge cases), fall back to Opus or have a human check.
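"Build with API abstraction" can be as simple as depending on an interface rather than a vendor SDK. A minimal sketch, assuming Python: the `LLMProvider` protocol and `SonnetProvider` stub are hypothetical names, and the stub stands in for a real Anthropic client call.

```python
from typing import Protocol

class LLMProvider(Protocol):
    """The one interface your business logic is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...

class SonnetProvider:
    """Stand-in for a real Anthropic API client; wire in the SDK here."""
    def complete(self, prompt: str) -> str:
        return f"[sonnet] {prompt}"  # placeholder response, no network call

def summarize(doc: str, provider: LLMProvider) -> str:
    # Workflow code sees only the protocol, so swapping Sonnet for Opus,
    # GPT-4, or Gemini is a one-line change at the call site.
    return provider.complete(f"Summarize this document: {doc}")
```

The design choice that matters: prompts and workflow logic never import a vendor SDK directly, so a migration later is a new provider class, not a rewrite.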
If you're already on GPT-4: Run a pilot comparing GPT-4 to Claude 3 Sonnet on your highest-volume workflows. Time and cost it. If Sonnet is materially better (and it usually is), migrate. If GPT-4 works fine, don't optimize prematurely. But know that you're potentially leaving money on the table.
If you're building a multi-model system: Good. That's the right architecture. Use Sonnet for the high-volume, latency-sensitive path. Use Opus for complex analysis. Use GPT-4 as fallback if Claude is capacity-constrained. Add Gemini for anything that benefits from Google integrations.
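That routing logic can live in one small function. A sketch under the assumptions above — the task-type strings and model names are illustrative, and a production version would detect Claude capacity constraints from API errors rather than a boolean flag.

```python
def route(task_type: str, claude_available: bool = True) -> str:
    """Pick a model per the multi-model architecture described above."""
    if not claude_available:
        return "gpt-4"  # fallback when Claude is capacity-constrained
    if task_type == "complex_analysis":
        return "claude-3-opus"  # the 20% where accuracy beats cost
    if task_type == "workspace_integration":
        return "gemini"  # tasks that benefit from Google integrations
    return "claude-3-sonnet"  # high-volume, latency-sensitive default
```

Keeping the routing table in one place also gives you a single spot to adjust when pricing or model quality shifts.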
If you need to build on Gemini: It works. The quality is there. Just budget extra time for documentation hunting and expect the ecosystem to be thinner. Make sure your architecture can swap providers if Google's API support doesn't keep up.
The Honest Bit
All three of these models are good enough for professional services. None of them is going to make a bad workflow magically excellent. The difference between them is optimization. You get better results faster and cheaper by matching the model to the task. But you don't fail because you picked the wrong model.
You fail because you picked a model and then didn't actually integrate it into your work, or because you expected it to solve problems that are really about process and people.
Model choice matters. But it matters less than you think.
Want to discuss AI strategy for your firm?
Book a free 30-minute assessment — no pitch, just practical insights.
Book a Call