Claude Opus 4: The Most Capable AI Model I've Tested

Last month, Anthropic released Claude Opus 4, and I've spent the last four weeks testing it extensively. After 30 years in technology, I've learned to be skeptical of "most capable" claims. But this one is real.

This is my honest assessment: what makes Opus 4 special, where the trade-offs are, and whether it changes how your firm should approach AI.

What Changed from Sonnet to Opus 4

For context: Claude Sonnet 4 (released May 2025) is the fast, practical model. Opus 4 is the deep-thinking model. The difference shows up in three areas:

Reasoning Depth

Opus 4 handles multi-step analysis that would otherwise require multiple Claude Sonnet 4 calls. I also tested it against GPT-4, and Opus 4 consistently produced more logically coherent outputs on complex scenarios:

Merger and acquisition scenarios (company strategy, alignment analysis, integration planning)
Regulatory compliance analysis (parsing complex regulations, identifying gaps, suggesting controls)
Multi-stakeholder negotiation strategy (understanding incentives, identifying leverage points)

The difference is structural. Opus 4 reasons through problems differently than Sonnet. It doesn't just retrieve and organize; it works through the logic step by step.

Context Window and Consistency

Opus 4 has a 200,000-token context window (about 150,000 words). More importantly, it stays consistent throughout. I loaded entire contracts, financial statements, and regulation sets, and Opus 4 maintained reasoning quality across all of it.

Sonnet starts to lose coherence on very long documents. Opus 4 doesn't.
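
As a rough pre-flight check, the ratio quoted above (200,000 tokens for about 150,000 words, or roughly 1.33 tokens per word) can be used to estimate whether a document fits in a single call. This is a minimal sketch, not a real tokenizer: the function names and the reserved output budget are illustrative assumptions, and actual token counts will vary with the text, especially for tables and dense legal language.

```python
# Rough pre-flight check: will a document fit in a 200,000-token window?
# Assumption: about 1.33 tokens per word, derived from the figures quoted
# in this post (200,000 tokens for about 150,000 words). Real counts
# depend on the tokenizer and the text.

CONTEXT_WINDOW_TOKENS = 200_000
TOKENS_PER_WORD = 200_000 / 150_000


def estimate_tokens(text: str) -> int:
    """Estimate the token count from a whitespace word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def fits_in_context(text: str, reserved_for_output: int = 8_000) -> bool:
    """Return True if the estimated input leaves room for the reply.

    reserved_for_output is a hypothetical budget for the model's answer.
    """
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    contract = "word " * 120_000  # stand-in for a 120,000-word contract
    print(estimate_tokens(contract), fits_in_context(contract))
```

If the estimate lands close to the limit, it is safer to count tokens with the provider's own tooling before sending the document.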

Code and Technical Reasoning

If your firm uses AI for technical work (data analysis, automation, system design), Opus 4 is noticeably better. It understands edge cases and suggests better architecture.

The Trade-Off: Speed and Cost

Opus 4 is slower and more expensive. That matters.

Speed: Sonnet responds in 1–3 seconds for most tasks. Opus 4 takes 5–15 seconds. For real-time client work, this is significant.
Cost: Opus 4 API pricing is roughly 5x Sonnet for both input and output ($15 vs. $3 per million input tokens and $75 vs. $15 per million output tokens at launch). For a firm doing high-volume work, this adds up.

The economics matter. At scale, you're not replacing Sonnet with Opus 4. You're using each for what it's good at.

When to Use Opus 4

Based on four weeks of testing, I recommend Opus 4 for:

High-Stakes Strategic Work

Analysis that directly impacts client decisions. Mergers, reorganizations, major strategy shifts. The extra reasoning depth reduces errors and catches edge cases. Sonnet can do this, but Opus 4 is safer.

Cost: A single long-context Opus 4 call might cost $2–$5. If it prevents one bad recommendation, it has paid for itself a hundred times over.
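
To see where a figure like $2–$5 comes from, here is a back-of-the-envelope calculation. It assumes Opus 4 launch list prices of $15 per million input tokens and $75 per million output tokens; treat those constants as assumptions and check current pricing before relying on them. The function name and the example token counts are illustrative.

```python
# Back-of-the-envelope cost of one long-context call.
# Assumed list prices in USD per million tokens; verify against current pricing.
OPUS_INPUT_PER_MTOK = 15.00
OPUS_OUTPUT_PER_MTOK = 75.00


def call_cost(input_tokens: int, output_tokens: int,
              input_price: float = OPUS_INPUT_PER_MTOK,
              output_price: float = OPUS_OUTPUT_PER_MTOK) -> float:
    """Return the USD cost of a single API call."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical call: a 150,000-token contract in, a 5,000-token analysis out.
    cost = call_cost(150_000, 5_000)
    print(f"${cost:.2f}")
```

Under those assumptions the example call comes to a little over $2.60, and the input document accounts for most of it. Shorter prompts cost proportionally less.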

Complex Compliance and Regulatory Analysis

Parsing regulations, identifying gaps in controls, building compliance frameworks. Opus 4 is exceptional at this because it actually reasons through logical constraints.

Code and System Design

If you're using AI for technical work, Opus 4 produces better code and suggests better architecture. Fewer bugs, cleaner design.

Long-Document Analysis

When you need to analyze full contracts, regulatory sets, or financial statements end-to-end, Opus 4's context window and consistency matter.

When NOT to Use Opus 4

Don't use it for:

Brainstorming and ideation. Sonnet is faster and good enough. Save Opus 4 for refining ideas, not generating them.
Routine research and summarization. Sonnet excels here. No reason to pay the Opus premium.
Client-facing interactive work. The 5–15 second latency breaks conversation flow. Use Sonnet for real-time client engagement.
High-volume commodity analysis. If you're doing hundreds of similar analyses, use Sonnet and save the cost.

What This Means for Your AI Stack

If you're building an AI capability in 2025, this is my recommendation:

Default to Sonnet for most work. It's fast, capable, and cost-effective.
Route high-stakes and complex work to Opus 4. Build a decision framework: if the analysis impacts a client decision worth >$100K, use Opus 4.
Use GPT-4 as a complement for specific tasks where it excels (content generation, certain types of code).
Don't custom-build. Anthropic, OpenAI, and Google are moving so fast that custom fine-tuning rarely pays off. Use the foundation models as they are.
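
The routing rule above can be sketched as a small decision function. This is one possible framework, not a prescribed one: the Task fields, the function name, and the choice to let latency and volume override the dollar threshold are all illustrative assumptions. The only number taken from this post is the $100K threshold.

```python
from dataclasses import dataclass

# Threshold from the rule of thumb above; everything else is illustrative.
HIGH_STAKES_THRESHOLD_USD = 100_000


@dataclass
class Task:
    decision_value_usd: float   # estimated value of the client decision affected
    interactive: bool = False   # real-time, client-facing conversation
    high_volume: bool = False   # one of hundreds of similar analyses


def choose_model(task: Task) -> str:
    """Return 'opus' or 'sonnet' for a task (labels, not API model IDs)."""
    # Latency-sensitive and commodity work stays on the faster, cheaper model.
    if task.interactive or task.high_volume:
        return "sonnet"
    # High-stakes analysis goes to the deeper-reasoning model.
    if task.decision_value_usd > HIGH_STAKES_THRESHOLD_USD:
        return "opus"
    # Default to Sonnet for everything else.
    return "sonnet"


if __name__ == "__main__":
    print(choose_model(Task(decision_value_usd=2_000_000)))
    print(choose_model(Task(decision_value_usd=2_000_000, interactive=True)))
    print(choose_model(Task(decision_value_usd=5_000)))
```

Putting the interactive and high-volume checks first is a deliberate choice in this sketch. A firm that would rather accept the latency on high-stakes live work could simply reorder the two conditions.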

The Enterprise Implication

As of June 2025, we've reached the point where enterprise AI comes down to judgment, not capability. The models are good enough for almost any professional services use case. The question is: where do you apply them and how?

Opus 4 represents that shift. It's not revolutionary. It's evolutionary. But evolution matters when you're building something at scale.

My Recommendation

If your firm is serious about AI, get access to Opus 4. Try it on your hardest problems. You'll find 2–3 workflows where it makes a real difference. Those become your leverage points.

In 2025, that's how you compete: not by having AI, but by using the right AI for the right problem.

