Anthropic released Claude 2 this week, and I've spent the last few days running it against the same prompts I've been testing on other models. The improvements are real, and more importantly, the design philosophy differs in ways that matter for professional services firms.
What Changed
Claude 2 doubles down on two capabilities that professional services firms need: significantly longer context windows and stronger safety guardrails.
The context window—how much text the model can read in a single prompt—has expanded to 100,000 tokens, roughly 75,000 words. You can feed Claude 2 an entire legal contract, a 200-page audit workpaper, or a complex client relationship summary in one pass. The model can analyze the full document and respond with nuance rather than hallucination.
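To make that concrete, here's a minimal sketch of what a long-document call looks like through the Anthropic Python SDK's completions endpoint as it shipped alongside Claude 2. The file name and prompt wording are illustrative; check Anthropic's current docs for model names and parameters.

```python
import anthropic

# The client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

# Load the full document -- a ~100-page contract fits comfortably in 100K tokens.
with open("tax_opinion.txt") as f:  # illustrative file name
    document = f.read()

completion = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=1000,
    prompt=(
        f"{anthropic.HUMAN_PROMPT} Below is a tax opinion document.\n\n"
        f"{document}\n\n"
        "List any internal contradictions, citing the relevant sections."
        f"{anthropic.AI_PROMPT}"
    ),
)
print(completion.completion)
```

The point isn't the code; it's that there is no chunking step. The whole document goes in one prompt, which is what makes cross-referencing page 5 against page 90 possible.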
I tested this with a 98-page tax opinion document. Claude 2 correctly identified three embedded contradictions that would have created compliance issues. GPT-4 struggled with the same document—it lost the thread around page 60.
But here's what impressed me more: Anthropic's explicit focus on constitutional AI and safety. They've trained Claude 2 to refuse certain requests, acknowledge uncertainty, and flag potential issues with its own responses. This is the opposite of the "tell me what I want to hear" approach that sells subscriptions.
Why This Matters for Professional Services
Professional services are built on liability. You can't afford a model that confidently generates plausible-sounding but incorrect legal analysis. You can't deploy something that might hallucinate a regulation that doesn't exist.
Anthropic's approach—training Claude to say "I don't know" and "this might be incorrect" rather than generating confident wrong answers—is fundamentally more useful for our industry.
I've been in consulting long enough to know: a client would rather hear "I need to verify that" than have a junior consultant confidently give them the wrong answer. Liability flows from confidence without accuracy.
The Practical Questions
That said, Claude 2 has some limitations worth knowing. API access is still rolling out. Per-token pricing sits well above GPT-3.5-class models, so high-volume work gets expensive. And the safety guardrails, while thoughtful, sometimes get in the way of legitimate work.
I had Claude 2 refuse to analyze a contract because it "might be used to avoid labor law compliance." The context was a law firm reviewing employment agreements for a potential acquisition. That's a legitimate use case. The refusal was overly cautious.
But that's arguably a feature, not a bug. Anthropic is clearly optimizing for no false negatives on safety (better to refuse a borderline legitimate request than to let a harmful one through) rather than for maximum utility. In a world where AI is generating headlines about job displacement and misinformation, that's a refreshing stance.
What I'm Recommending Now
If you've been waiting to move from ChatGPT to a more capable model for document-heavy work, Claude 2 is worth a pilot. The 100K context window is genuinely transformative for firms that work with long, complex documents.
The pricing is higher, so this isn't a replacement for ChatGPT for every use case. But for your highest-value work—due diligence, complex contract analysis, regulatory research—the improvement in capability justifies the cost.
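If you're budgeting a pilot, a back-of-the-envelope calculation helps. The sketch below uses placeholder per-token rates, not quotes; substitute the current numbers from each vendor's price sheet before drawing conclusions.

```python
def per_document_cost(input_tokens: int, output_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Rough API cost in USD for a single document pass."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

# Placeholder rates -- check the vendors' current price sheets.
# A ~100-page contract runs on the order of 70,000 tokens;
# a detailed written analysis might come back as ~2,000 tokens.
cost = per_document_cost(70_000, 2_000, usd_per_m_input=11.0, usd_per_m_output=33.0)
print(f"~${cost:.2f} per document pass")
```

Even at a dollar or two per pass, that's trivial next to an hour of a professional's time spent on the same document, which is the real comparison for high-value work.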
And if you're risk-averse, Anthropic's approach to safety is a competitive advantage. You can deploy Claude 2 knowing that its refusals are thoughtful, not arbitrary.
The age of one-size-fits-all AI models is ending. Different models have different strengths. Claude 2 is the right choice for professional services work that requires accuracy, nuance, and accountability.
Want to discuss AI strategy for your firm?
Book a free 30-minute assessment — no pitch, just practical insights.