Claude 3.7 vs 3.5 Sonnet for Coding - Which One Should You Use?

Posted on March 20, 2025, updated on March 25, 2025, by Zhu Liang

With the recent release of Claude 3.7 Sonnet on February 25, 2025, developers have been testing its performance against the proven Claude 3.5 Sonnet for coding tasks. Let's look at the key differences and experiences from the developer community.

Initial Excitement with Claude 3.7 Sonnet

On the day of the release (February 25), developers were excited about Claude 3.7 Sonnet's capabilities. According to Reddit user Ehsan1238, Claude 3.7 showed impressive abilities with complex code, completing in one go what typically took days of work.

Reddit post 1

The model shows particular strength in handling complex UI and backend code at the same time, suggesting notable improvements in understanding and generating complex system designs.

Over-Engineering in Claude 3.7 Sonnet

However, as more developers started using Claude 3.7, some noticed issues with the model.

@thekitze on X illustrated the difference between versions 3.5 and 3.7, showing how 3.7 tends to be overenthusiastic and go beyond the original request.

Thekitze X post

While 3.5 simply completes the task, 3.7 adds extra features and even suggests unrelated improvements.

Comparison of Claude 3.5 and 3.7

For everyday coding tasks, some developers still prefer Claude 3.5 Sonnet. In a post on X on March 20, developer @mayfer said Claude 3.5 Sonnet remains better than Claude 3.7 Sonnet for coding tasks.

mayfer X post

He also noted that other new models like o1 are "not good as a daily tool".

mayfer X post 2

Similarly, @SeifBassam on X shared:

Is anyone else finding that Claude Sonnet 3.5 is better than 3.7 for coding?

I reverted and am getting better results

Seif Bassam X post

In short, both developers found Claude 3.5 Sonnet more effective for their day-to-day coding work.

Human-Evaluated Coding Benchmark

A recently released human-evaluated coding benchmark, KCORES LLM Arena, compared top LLMs on a set of coding tasks, with human evaluators scoring the results against a defined set of criteria.

KCORES LLM Arena

The results are as follows:

  • 1st place: Claude 3.7 Sonnet Thinking - 334.8
  • 2nd place: Claude 3.5 Sonnet - 330.8
  • 3rd place: DeepSeek-V3-0324 (New version) - 328.3
  • 4th place: Claude 3.7 Sonnet - 322.3

On this benchmark, Claude 3.5 Sonnet outscores the non-thinking version of Claude 3.7 Sonnet but trails Claude 3.7 Sonnet Thinking.
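
The "Thinking" variant in the benchmark refers to Claude 3.7 Sonnet with extended thinking enabled, which is exposed as a request parameter in the Anthropic API. If you want to try it yourself, here is a minimal sketch using the `anthropic` Python SDK; the model ID and token budgets are illustrative values, so check Anthropic's current documentation before relying on them:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Request a coding task from Claude 3.7 Sonnet with extended thinking
# enabled. The model ID and token budgets below are illustrative;
# verify them against Anthropic's current documentation.
message = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that validates an IPv4 address.",
        }
    ],
)

# With thinking enabled, the response contains thinking blocks followed
# by the final answer in a text block.
for block in message.content:
    if block.type == "text":
        print(block.text)
```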

Claude 3.7's Unstoppable Chain of Actions

Some developers have reported issues with Claude 3.7's tendency to continue modifying code beyond the original request. According to Reddit user stxthrowaway123, who posted the observation in the Cursor subreddit, Claude 3.7 can be "basically unusable" due to its inability to stop its chain of actions:

It's like it has no ability to stop its chain of actions. It will attempt to solve my original prompt, and then it will come across irrelevant code and start changing that code, claiming that it has found an error. At the end of its actions, it has created a mess.

This behavior has led some developers to switch back to Claude 3.5, which they find more focused and controlled in its responses.

Taming Claude 3.7 Sonnet

To get the best results from Claude 3.7, Reddit user Old_Round_4514 suggests starting slow and being very clear with instructions.

Reddit post 3

It's best to go with the flow and work with the model. Check the output after letting the model do its work:

Enjoy that ride rather than fight it and you will get the best out of it, not always, but when its good its very very good.

According to a post by Reddit user vanderpyyy, the model tends to make things more complex than necessary.

Reddit post 2

A helpful fix is adding "Use as few lines of code as possible" to your custom instructions. This simple change has reportedly led to much simpler and better code outputs.
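
If you use the model through the API rather than a chat interface, the closest equivalent to a custom instruction is the system prompt. Here is a minimal sketch with the `anthropic` Python SDK; the model ID and the exact wording of the instruction are illustrative:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The system prompt acts as the "custom instruction": it nudges Claude 3.7
# Sonnet toward minimal changes instead of sweeping rewrites. The model ID
# is illustrative; verify it against Anthropic's documentation.
message = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    system=(
        "Use as few lines of code as possible. "
        "Only change what the user explicitly asks for; "
        "do not modify unrelated code."
    ),
    messages=[
        {"role": "user", "content": "Add input validation to this function: ..."}
    ],
)

print(message.content[0].text)
```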

Recommendations for Developers

For most coding tasks, Claude 3.5 Sonnet seems to be the more reliable choice. It gives more consistent results and needs less prompt engineering to work well.

Use Claude 3.7 Sonnet for complex design tasks or when you need new ideas for tough problems. But be ready to spend more time writing clear prompts and managing the model's tendency to make things complex.

Tools for Optimizing Claude

Tools like 16x Prompt can help you get better results from Claude 3.5 or 3.7 Sonnet. The app offers features for managing code context, along with built-in custom instructions tailored to Claude 3.7 Sonnet.

16x Prompt Interface

You can also use 16x Prompt to compare the responses of Claude 3.5 and 3.7 Sonnet side by side yourself. Here's a screenshot showcasing the comparison:

Screenshot comparison of Claude 3.5 and 3.7 Sonnet

16x Prompt's code context system makes it easier to give relevant context to the model. You can copy the final prompt to the Claude website, or send it directly through the Anthropic API.
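
If you prefer to run this kind of side-by-side comparison with a plain script, you can send the same prompt to both models through the Anthropic API yourself. A minimal sketch follows; the model IDs match Anthropic's published identifiers at the time of writing, but verify them before use:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Model IDs as published at the time of writing; verify against
# Anthropic's documentation before use.
MODELS = [
    "claude-3-5-sonnet-20241022",
    "claude-3-7-sonnet-20250219",
]

prompt = "Refactor this function to remove the duplicated branches: ..."

# Send the identical prompt to each model and print the answers one after
# another for a manual side-by-side comparison.
for model in MODELS:
    message = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"=== {model} ===")
    print(message.content[0].text)
```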

