Better Models: Worse Tools

What happened

Simon Willison reports on a puzzling regression in newer Anthropic models: Claude Opus 4.8 and Sonnet 5 are more likely than older models to add spurious fields to tool call arguments when used with Pi's custom edit harness. Armin, the Pi developer, observed that the intended edit is usually correct, but the extra keys cause Pi's schema validator to reject the tool call. He theorizes that reinforcement learning tuned the models to perform well with Claude's own built-in search-and-replace edit tool, and that this training inadvertently leads them to apply that tool's argument shape to similar but non-matching schemas. The situation echoes OpenAI's approach with Codex and its apply_patch tool.

For developers building AI-powered coding workflows, this highlights a growing tension: as frontier models are increasingly specialized for their native tools, third-party harnesses must either adapt their tool definitions to match the model's training distribution or accept degraded reliability. The practical takeaway is that model upgrades can introduce subtle compatibility regressions, so teams should test tool-call faithfulness systematically whenever they change models.
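The rejection described above can be illustrated with a minimal sketch of a strict argument validator. This is not Pi's actual validator or schema: the field names (`path`, `old_text`, `new_text`), the invented key `replace_all`, and the function `validate_tool_args` are all assumptions made for illustration.

```python
# Hypothetical schema for an edit tool. These field names are assumptions,
# not Pi's actual tool definition.
EDIT_TOOL_FIELDS = {"path": str, "old_text": str, "new_text": str}


def validate_tool_args(args, fields=EDIT_TOOL_FIELDS):
    """Return a list of problems; an empty list means the call is accepted."""
    problems = []
    for name, expected_type in fields.items():
        if name not in args:
            problems.append(f"missing field: {name}")
        elif not isinstance(args[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    # Strict mode: any key the schema does not declare rejects the call,
    # even when the declared fields describe a correct edit.
    for name in sorted(set(args) - set(fields)):
        problems.append(f"unexpected field: {name}")
    return problems


good_call = {"path": "app.py", "old_text": "foo", "new_text": "bar"}
# The same edit plus one invented key (hypothetical example of a spurious field).
bad_call = {**good_call, "replace_all": False}

print(validate_tool_args(good_call))  # []
print(validate_tool_args(bad_call))   # ['unexpected field: replace_all']
```

In this sketch the second call carries a perfectly usable edit, yet it is rejected solely because of the undeclared key, which is the failure mode the report describes.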

Key takeaways

Newer Claude models (Opus 4.8, Sonnet 5) sometimes add invented fields to tool call arguments when using Pi's custom edit tool, leading to rejections.

Older Claude models (Haiku, older Sonnet) did not exhibit this behavior, which suggests a regression tied to recent training.

Armin hypothesizes that Anthropic's reinforcement learning for Claude's own edit tool causes the model to incorrectly carry that tool's schema over to other tools.

This issue underscores a broader challenge: model optimization for native tools can harm third-party tool compatibility.

Builders should test model updates against their tool schemas to catch such regressions early.
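One way to act on the last takeaway is to replay recorded tool calls for each model and compare how often they contain undeclared keys. The sketch below assumes a hypothetical log of `(model_name, arguments)` pairs; the model labels, the allowed key set, the sample data, and the 10% threshold are all made up for illustration and do not come from the report.

```python
from collections import defaultdict

# Hypothetical set of argument names declared by a tool schema.
ALLOWED_KEYS = {"path", "old_text", "new_text"}


def extra_key_rate(recorded_calls, allowed_keys=ALLOWED_KEYS):
    """Map each model name to the fraction of its calls with undeclared keys."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for model, args in recorded_calls:
        totals[model] += 1
        if set(args) - allowed_keys:
            flagged[model] += 1
    return {model: flagged[model] / totals[model] for model in totals}


# Made-up sample log; real data would come from the harness's own traces.
recorded = [
    ("old-model", {"path": "a.py", "old_text": "x", "new_text": "y"}),
    ("old-model", {"path": "b.py", "old_text": "x", "new_text": "y"}),
    ("new-model", {"path": "a.py", "old_text": "x", "new_text": "y"}),
    ("new-model", {"path": "b.py", "old_text": "x", "new_text": "y",
                   "replace_all": True}),
]

rates = extra_key_rate(recorded)
print(rates)  # {'old-model': 0.0, 'new-model': 0.5}

# Flag any model whose rate exceeds an arbitrary, assumed threshold.
regressed = [model for model, rate in rates.items() if rate > 0.1]
print(regressed)  # ['new-model']
```

Running a check like this before switching models gives a concrete number to compare, so a rise in rejected calls shows up as a measured change instead of scattered user reports.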
