Humanity released the Claude Sonnet 4.5 on Monday, calling it “the best coding model in the world” and released a new suite of developer tools along with the model. The company says the model can focus on complex, multi-step coding tasks for more than 30 hours, demonstrating the benefits of inference and mathematical capabilities.
Introducing Claude Sonnet 4.5 – The best coding model in the world.
It is the ultimate model for building complex agents. This is the perfect model for computer use. And it shows considerable profits in reasoning and mathematics tests. pic.twitter.com/7lwv9wpnav
– Claude (@claudei) September 29, 2025
According to human announcements, the model won 77.2% in SWE bench verification. This is a benchmark that measures actual software coding capabilities. Using parallel test time calculations increases its score to 82%. This will result in the new model being the best products from Openai and Google, as well as Anthropic's Claude 4.1 Opus (according to the company's naming scheme, Haiku is a small model, Sonnet is a medium size, and Opus is the heaviest and most powerful model in the family).

Image: Humanity
Claude Sonnet 4.5 leads Osworld, a benchmark test that tests AI models in real computer tasks, earning 61.4%. Four months ago, Claude Sonnet 4 held the lead at 42.2%. This model demonstrates improved features across inference and mathematics benchmarks, as well as experts in specific business fields such as finance, law, and medicine.
I tried the model, but the first quick test allowed me to generate a regular “AI vs. journalist” game, repeating, tweaking, or retrying the zero-shot prompt. This model generated feature code faster than the Claude 4.1 Opus, while maintaining the highest quality output. The application this created showed a visual polish comparable to Openai's output. This is usually a change from the previous Claude version that generates sophisticated interfaces.

Humanity has released several new features in the model. Claude code now includes checkpoints. This saves progress and allows users to roll back to their previous state. The company updated its terminal interface and shipped native vs. code extensions. The Claude API has gained context editing capabilities and memory tools that allow agents to run longer and handle more complexity. The Claude app now includes code execution and file creation for spreadsheets, slides, and documents directly in conversation.
Prices from Claude Sonnet 4 remain unchanged at $3 per input token and $15 per million output token. All Claude code updates are available to all users, while all developers are available to updates to the Claude developer platform, including the Agent SDK.
Also known as Claude Sonnet 4.5, “Our Most Aligned Frontier Model,” he said it has significantly improved its reductions in behavior, such as psychofancy, deception, power seeking, and encouragement of delusional thinking. The company also said it made progress by defending against a rapid injected attack. This was identified as one of the most serious risks for users of agent and computer use capabilities.
Of course, it took Pliny, the world's most famous AI prompt engineer, to jailbreak it and produce a drug recipe like the most ordinary in the world.
gg pic.twitter.com/vfnveoiyb
– Pliny the Rebator🐉و٠��وټكلووك(@elder_plinius) September 29, 2025
This release occurs as competition intensifies among AI companies due to coding capabilities. Openai released the GPT-5 last month, but Google's models compete in a variety of benchmarks. This can be shocking for some forecast markets. The forecast market was almost entirely certain that Gemini would be the best model of the month until a few hours ago.
It could be a competition against time. Currently, the model is not shown in the rankings, but LM Arena has already announced that it can be used in the rankings. Depending on the number of interactions, tomorrow's results can be very surprising. Given that Claude 4.1Opus is second, and the Claude 4.5 sonnet is far better.
Humanity is also releasing a temporary survey preview called “Imagine With Claude,” which is available to the largest subscribers for five days. In experiments, Claude generates software on the fly without pre-determined features or pre-written code, responding to requests when users interact and adapting.
“What you see is Claude creates it in real time,” the company said. Humanity described the model as a demonstration of what is possible when combining it with the right infrastructure.