Is the New Claude 3.5 Sonnet the Best Model for Coding?

Posted on October 26, 2024, by Zhu Liang

On October 22nd, Anthropic released an upgraded version of Claude 3.5 Sonnet (claude-3-5-sonnet-20241022). Already one of the most powerful LLMs available, the model now brings significant improvements to code generation and complex, multi-step tasks.

The new Sonnet has created excitement among software developers, particularly for its enhanced abilities in handling real-world software engineering tasks.

Performance Benchmarks Show Promise

The new model has set a new top score on Aider's code editing benchmark. Compared to previous models, the upgraded Claude is better at following complex instructions and generating functional code.

Aider's code editing benchmark

These results show how quickly AI models for software development are improving. Several development tools have already made the new Sonnet their default model.

Rapid Industry Adoption

The quick adoption of the new Claude 3.5 Sonnet by prominent AI coding tools shows its potential. Aider, a popular command-line coding assistant, has made the new Sonnet its default model in its latest release.

Aider v0.60.0 release notes

Similarly, Cline (formerly known as Claude Dev) has switched to the new Sonnet as its default model.

Cline commit switching to new Claude 3.5 Sonnet

This widespread adoption by established development tools shows strong industry confidence in the model's enhanced abilities.

It remains to be seen how well the new Sonnet will perform in real-world coding tasks after this rapid adoption, and whether it will live up to the high expectations set by these early benchmarks.

Feedback and Code Quality

Real-world developer experiences with the new Sonnet model have varied. One developer on Reddit reported great results, processing 44.8M tokens in a single day while maintaining reliable code quality using Cline.

After a full day of coding today, with 44.8 MILLION tokens sent ($28), I have only had to warn it 3-4 times that is might be overwriting important code and it fixed it on the next generation.
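To put those figures in perspective, here is a minimal sketch of the arithmetic, using only the two numbers in the quote above. The function name is illustrative, and the result is a blended average across all tokens sent that day, not a published price.

```python
def cost_per_million_tokens(total_cost_usd: float, total_tokens: float) -> float:
    """Return the blended cost in USD per one million tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost_usd / total_tokens * 1_000_000


# Figures taken from the Reddit quote: 44.8 million tokens sent for $28.
blended = cost_per_million_tokens(28.0, 44.8e6)
print(f"${blended:.3f} per million tokens")  # prints "$0.625 per million tokens"
```

In other words, the developer paid roughly 63 cents per million tokens on average over the day.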

However, some users have noted issues with long-form outputs, needing more back-and-forth to get desired results.

The new model however always stops in the middle of the output with something dumb and irrelevant like: Continuing without breaking, following the scenario's progression…; Continuing without stopping; Would you like me to proceed with writing the full story?; etc. and etc.

Several developers have found that while the model shows more creativity, it often needs clearer instructions to deliver the best results.

The new Sonnet model shows better instruction following and reasoning compared to older versions. It's particularly good at understanding complex code and suggesting improvements.

However, some developers have reported that the model can be too myopic: it focuses on the immediate change without flagging its knock-on effects, so it sometimes takes multiple tries to get a correct result. The model's performance also varies with how well the instructions are written.

If it suggests a change to a function that makes other functions obsolete, it will not tell me that fact. If it suggests a change to function that requires 5 other functions to also change, it will not tell me that. This behavior is new.

Notable Improvements and Current Limitations

The upgraded Claude brings several new features and broad improvements to the development process. Its improved handling of complex problems across the software development lifecycle is a promising advance in AI-assisted development.

One notable new capability is computer use, which makes the new Sonnet the first frontier AI model to offer this feature in public beta. While this opens new possibilities for software developers, the model still has limitations when handling complex tasks.

Some developers find they need to give more detailed instructions for complex code changes, and its performance can vary depending on the task.

Getting Better Results with 16x Prompt

Tools like 16x Prompt help developers get the most out of Claude 3.5 Sonnet. The app's code context system uses a tree structure to manage source code, making it easier to work on AI-assisted coding tasks. It works with both the Claude.ai web interface and Claude via the Anthropic API.

16x Prompt screenshot

16x Prompt also lets developers compare responses from different AI models side by side. This helps them find the best approach for their specific coding needs. Here's an example comparison of responses from the old Claude 3.5 Sonnet and the new Claude 3.5 Sonnet:

Comparison of responses from different AI models

Verdict

While the new Sonnet demonstrates significant improvements in code generation and in following complex instructions, how well it works in practice depends heavily on clear instructions and well-defined use cases.

Nonetheless, the model's enhanced capabilities in handling real-world software engineering tasks position it as one of the most powerful LLMs available for coding.

We recommend the new Claude 3.5 Sonnet for developers as their primary choice for AI-assisted coding tasks in 2024.

