BuzzinDaily
Tech

Is Anthropic 'nerfing' Claude? Users increasingly report performance degradation as leaders push back

By Buzzin Daily | April 14, 2026 | 14 min read



A growing number of developers and AI power users are taking to social media to accuse Anthropic of degrading the performance of Claude Opus 4.6 and Claude Code, whether deliberately or as a result of compute limits, arguing that the company's flagship coding model feels less capable, less reliable and more wasteful with tokens than it did just weeks ago.

The complaints have spread quickly on GitHub, X and Reddit over the past several weeks, with several high-reach posts alleging that Claude has become worse at sustained reasoning, more likely to abandon tasks midway through, and more prone to hallucinations or contradictions.

Some users have framed the issue as "AI shrinkflation": the idea that customers are paying the same price for a weaker product.

Others have gone further, suggesting Anthropic may be throttling or otherwise tuning Claude downward during periods of heavy demand.

These claims remain unproven, and Anthropic employees have publicly denied that the company degrades models to manage capacity. At the same time, Anthropic has acknowledged real changes to usage limits and reasoning defaults in recent weeks, which has made the broader debate more combustible.

VentureBeat has reached out to Anthropic for further clarification on the recent accusations, including whether any recent changes to reasoning defaults, context handling, throttling behavior, inference parameters or benchmark methodology might help explain the spike in complaints.

We have also asked how Anthropic explains the recent benchmark-related claims and whether it plans to publish additional data that could reassure customers. An Anthropic spokesperson did not address the questions individually, instead referring us to X posts by Claude Code creator Boris Cherny and Claude Code team member Thariq Shihipar regarding Opus 4.6 performance and usage limits, respectively. Both X posts are also referenced and linked below.

Viral user complaints, including from an AMD Senior Director, argue Claude has become less capable

The most detailed public complaints originated as a GitHub issue filed by Stella Laurenzo on April 2, 2026, whose LinkedIn profile identifies her as a Senior Director in AMD's AI group.

In that post, Laurenzo wrote that Claude Code had regressed to the point that it could not be trusted for complex engineering work, then backed that claim with a sprawling analysis spanning 6,852 Claude Code sessions, 17,871 thinking blocks and 234,760 tool calls.

The complaint argued that, starting in February, Claude's estimated reasoning depth fell sharply while signals of poorer performance rose alongside it, including more premature stopping, more "simplest fix" behavior, more reasoning loops, and a measurable shift from research-first behavior to edit-first behavior.

The post's broader point was that for advanced engineering workflows, extended reasoning is not a luxury but part of what makes the model usable in the first place.

That GitHub thread then escaped into the broader social media conversation, with X users including @Hesamation, who posted screenshots of Laurenzo's GitHub post to X on April 11, turning it into an even more viral talking point.

That amplification mattered because it gave the broader "Claude is getting worse" narrative something more concrete than anecdotal frustration: a long, data-heavy post from a senior AI leader at a major chip company arguing that the regression was visible in logs, tool-use patterns and user corrections, not just gut feeling.

Anthropic's public response focused on separating perceived changes from actual model degradation. In a pinned follow-up on the same GitHub issue posted a week ago, Claude Code lead Boris Cherny thanked Laurenzo for the care and depth of the analysis but disputed its principal conclusion.

Cherny said the "redact-thinking-2026-02-12" header cited in the complaint is a UI-only change that hides thinking from the interface and reduces latency, but "doesn't impact thinking itself," "thinking budgets," or how extended reasoning works under the hood.

He also said two other product changes likely affected what users were seeing: Opus 4.6's move to adaptive thinking by default on Feb. 9, and a March 3 shift to medium effort, or effort level 85, as the default for Opus 4.6, which he said Anthropic viewed as the best balance across intelligence, latency and cost for most users.

Cherny added that users who want more extended reasoning can manually switch effort higher by typing /effort high in Claude Code terminal sessions.

That exchange gets at the core of the debate. Critics like Laurenzo argue that Claude's behavior in demanding coding workflows has plainly worsened, and they point to logs and usage patterns as evidence.

Anthropic, by contrast, is not saying nothing changed. It is saying the biggest recent changes were product and interface decisions that affect what users see and how much effort the system expends by default, not a secret downgrade of the underlying model. That distinction may be technically important, but for power users who feel the product is delivering worse results, it is not necessarily a satisfying one.

External coverage from TechRadar and PC Gamer further amplified Laurenzo's post and a broader wave of agreement from some power users.

Another viral post on X from developer Om Patel on April 7 made the same argument in even more direct terms, claiming that someone had "actually measured" how much "dumber" Claude had gotten and summarizing the result as a 67% drop.

That post helped popularize the "AI shrinkflation" label and pushed the debate beyond hard-core Claude Code users into the broader AI discourse on X.

These claims have resonated because they map closely onto what many frustrated users say they are seeing in practice: more unfinished tasks, more backtracking, more token burn and a stronger sense that Claude is less willing to reason deeply through complicated coding jobs than it was earlier this year.

Benchmark posts turned anecdotal frustration into a public controversy

The loudest benchmark-based claim came from BridgeMind, which runs the BridgeBench hallucination benchmark. On April 12, the account posted that Claude Opus 4.6 had fallen from 83.3% accuracy and a No. 2 ranking in an earlier result to 68.3% accuracy and No. 10 in a new retest, calling that evidence that "Claude Opus 4.6 is nerfed."

That post spread widely and became one of the main anchors for the broader public case that Anthropic had degraded the model.

Other users also circulated benchmark-related or test-based posts suggesting that Opus 4.6 was underperforming versus Opus 4.5 on practical coding tasks.

Still other posts pointed to TerminalBench-related results as supposed evidence that the model's behavior had changed in certain harnesses or product contexts.

The effect was cumulative: benchmark screenshots, side-by-side tests and anecdotal frustration all began reinforcing one another in public.

That matters because benchmark claims tend to travel farther than more subjective complaints. A developer saying a model "feels worse" is one thing. A screenshot showing a ranking drop from No. 2 to No. 10, or a dramatic percentage swing in accuracy, gives the appearance of hard evidence, even when the underlying comparison may be more complicated.

Critics of the benchmark claims say the evidence is weaker than it appears

The most important rebuttal to the BridgeBench claim did not come from Anthropic. It came from Paul Calcraft, an outside software and AI researcher on X, who argued that the viral comparison was misleading because the earlier Opus 4.6 result was based on only six tasks while the later one was based on 30.

In his words, it was a "DIFFERENT BENCHMARK." He also said that on the six tasks the two runs shared in common, Claude's score moved only modestly, from 87.6% previously to 85.4% in the later run, and that the bigger swing appeared to come largely from a single fabrication result without repeats. He characterized that as something that could easily fall within ordinary statistical noise.
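Calcraft's sample-size point is easy to check with standard statistics. A minimal sketch in Python; the 5-of-6 and 20-of-30 counts below are illustrative stand-ins chosen to match the headline percentages, not BridgeMind's actual task-level results:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - margin, center + margin

# An early run over 6 tasks vs. a retest over 30 tasks.
lo6, hi6 = wilson_interval(5, 6)      # 5/6  ≈ 83.3% accuracy
lo30, hi30 = wilson_interval(20, 30)  # 20/30 ≈ 66.7% accuracy

print(f"6-task run:  {5/6:.1%}, 95% CI [{lo6:.1%}, {hi6:.1%}]")
print(f"30-task run: {20/30:.1%}, 95% CI [{lo30:.1%}, {hi30:.1%}]")
```

With only six tasks, a single flipped result moves headline accuracy by nearly 17 points, and the two runs' confidence intervals overlap heavily, so the headline drop by itself is weak evidence of a regression.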

That external rebuttal matters because it undercuts one of the cleanest and most viral claims in circulation. It does not prove users are wrong to think something has changed. But it does suggest that at least some of the benchmark evidence now driving the story may be overstated, poorly normalized or not directly comparable.

Even the BridgeBench post itself drew a community note to similar effect. The note said the two benchmark runs covered different scopes, six tasks in one case and 30 in the other, and that the common-task subset showed only a minor change. That does not make the later result meaningless, but it weakens the strongest version of the "BridgeBench proved it" argument.

This is now a key feature of the controversy: the claims are not all equally strong. Some are grounded in first-hand user experience. Some point to real product changes. Some rely on benchmark comparisons that may not be apples-to-apples. And some depend on inferences about hidden system behavior that users outside Anthropic cannot directly verify.

Earlier capacity limits gave users a reason to suspect more changes under the hood

The current backlash also lands in the shadow of a real, confirmed Anthropic policy change from late March. On March 26, Anthropic technical staffer Thariq Shihipar posted that, "To manage growing demand for Claude," the company was adjusting how 5-hour session limits work for Free, Pro and Max subscribers during peak hours, while keeping weekly limits unchanged.

He added that on weekdays from 5 a.m. to 11 a.m. Pacific time, users would move through their 5-hour session limits faster than before. In follow-up posts, he said Anthropic had landed efficiency wins to offset some of the impact, but that roughly 7% of users would hit session limits they would not have hit before, particularly on Pro tiers.

In an email on March 27, 2026, Anthropic told VentureBeat that Team and Enterprise customers were not affected by these changes, and that the shift was not dynamically optimized per user but instead applied to the peak-hour window the company had publicly described. Anthropic also said it was continuing to invest in scaling capacity.

Those comments were about session limits, not model downgrades. But they are important context, because they establish two things that users now keep connecting in public: first, Anthropic has been dealing with surging demand; second, it has already changed how usage is rationed during busy periods. That does not prove Anthropic reduced model quality. It does help explain why so many users are primed to believe something else may also have changed.

Prompt caching and TTL

A separate, newer GitHub issue broadens the dispute beyond model quality and into pricing and quota behavior. In issue #46829, user seanGSISG argued that Claude Code's prompt-cache time-to-live, or TTL, appeared to shift from a one-hour setting back to a five-minute setting in early March, based on analysis of nearly 120,000 API calls drawn from Claude Code session logs across two machines.

The complaint argues that this change drove meaningful increases in cache-creation costs and quota burn, especially for long-running coding sessions where cached context expires quickly and must be rebuilt. The author claims this helps explain why some subscription users began hitting usage limits they had not previously encountered.
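The inference behind that kind of log analysis is simple: if two calls far apart in time still produce a cache hit, the effective TTL must be at least that gap. A minimal sketch under an assumed log schema; the `ts`, `cache_read_tokens` and `cache_creation_tokens` field names are hypothetical, not the real Claude Code session-log format:

```python
from datetime import datetime, timedelta

def estimate_cache_ttl(calls: list[dict]) -> timedelta:
    """Lower-bound the effective prompt-cache TTL from a session's API calls.

    A call reporting cache_read_tokens > 0 hit the cache; one reporting
    cache_creation_tokens > 0 had to rebuild it.  The longest gap that
    still produced a cache hit lower-bounds the TTL in effect.
    """
    longest_hit_gap = timedelta(0)
    prev_ts = None
    for call in sorted(calls, key=lambda c: c["ts"]):
        ts = datetime.fromisoformat(call["ts"])
        if prev_ts is not None and call["cache_read_tokens"] > 0:
            longest_hit_gap = max(longest_hit_gap, ts - prev_ts)
        prev_ts = ts
    return longest_hit_gap

# Two calls 40 minutes apart that still hit the cache imply a TTL of at
# least 40 minutes, i.e. the one-hour setting rather than the five-minute one.
session = [
    {"ts": "2026-03-01T09:00:00", "cache_read_tokens": 0, "cache_creation_tokens": 9000},
    {"ts": "2026-03-01T09:40:00", "cache_read_tokens": 9000, "cache_creation_tokens": 0},
]
print(estimate_cache_ttl(session))  # 0:40:00
```

Conversely, a session where every gap over five minutes triggers a fresh cache write is consistent with the shorter TTL the issue describes.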

What makes this issue notable is that Anthropic did not flatly deny that something changed. In a reply on the thread, Jarred Sumner said the March 6 change was real and intentional, but rejected the framing that it was a regression. He said Claude Code uses different cache durations for different request types, and that one-hour cache is not always cheaper, because one-hour writes cost more up front and only save money when the same cached context is reused enough times to justify it.

In his telling, the change was part of ongoing cache optimization work, not a silent downgrade, and the pre-March 6 behavior described in the issue "wasn't the intended normal state."
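Sumner's break-even point can be made concrete with Anthropic's published prompt-caching multipliers: roughly 1.25x the base input price for five-minute cache writes, 2x for one-hour writes, and 0.1x for cache reads. Treat the exact ratios as an assumption here; they are the documented API prices, and Claude Code's internal accounting may differ:

```python
BASE = 1.0               # base input-token price (arbitrary units per token)
WRITE_5M = 1.25 * BASE   # 5-minute cache write premium
WRITE_1H = 2.0 * BASE    # 1-hour cache write premium
READ = 0.1 * BASE        # cache-read price

def session_cost(reads: int, rebuilds: int, write_price: float,
                 tokens: int = 100_000) -> float:
    """Cost of one session: the initial cache write, any expiry rebuilds,
    and the cached reads in between."""
    return tokens * ((1 + rebuilds) * write_price + reads * READ)

# A session with long idle gaps: under a 5-minute TTL the cache expires and
# is rewritten, say, 10 times; under a 1-hour TTL it is written once.
short_ttl = session_cost(reads=20, rebuilds=10, write_price=WRITE_5M)
long_ttl = session_cost(reads=20, rebuilds=0, write_price=WRITE_1H)
print(short_ttl, long_ttl)  # one-hour caching wins: 1575000.0 vs 400000.0

# With no idle gaps, nothing expires and the cheaper 5-minute write wins,
# which is Sumner's point about reuse having to justify the premium.
print(session_cost(reads=20, rebuilds=0, write_price=WRITE_5M))  # 325000.0
```

Whether the one-hour premium pays off depends entirely on how often the cached prefix expires and is rebuilt, which is why neither TTL is strictly cheaper for every workload.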

The thread later drew a more detailed response from Anthropic's Cherny, who described one-hour caching as "nuanced" and said the company has been testing heuristics to improve cache hit rates, token usage and latency for subscribers. Cherny said Anthropic keeps the five-minute cache for many queries, including subagents that are rarely resumed, and said that turning off telemetry also disables experiment gates, which can cause Claude Code to fall back to a five-minute default in some cases.

He added that Anthropic plans to expose environment variables that let users force one-hour or five-minute cache behavior directly. Together, these replies do not validate the issue author's claim that Anthropic silently made Claude Code more expensive overall, but they do confirm that Anthropic has been actively experimenting with cache behavior behind the scenes during the same period users began complaining more loudly about quota burn and changing product behavior.

Anthropic says user-facing changes, not secret degradation, explain much of the uproar

Anthropic-affiliated employees have publicly pushed back on the broadest accusations. In one widely circulated reply on X, Cherny responded to claims that Anthropic had secretly nerfed Claude Code by writing, "This is false."

He said Claude Code had been defaulted to medium effort in response to user feedback that Claude was consuming too many tokens, and that the change had been disclosed both in the changelog and in a dialog shown to users when they opened Claude Code.

That response is notable because it concedes a significant product change while rejecting the more conspiratorial interpretation of it. Anthropic is not saying nothing changed. It is saying that what changed was disclosed, and was aimed at balancing token use rather than secretly reducing model quality.

Public documentation also confirms that effort defaults have been in motion. Claude Code's changelog says that on April 7, Anthropic changed the default effort level from medium to high for API-key users as well as Bedrock, Vertex, Foundry, Team and Enterprise users.

That suggests Anthropic has been actively tuning these settings across different segments, which could plausibly affect user perceptions even if the core model weights are unchanged.

Shihipar has also directly denied the broader demand-management accusation. In a reply on X posted April 11, he said Anthropic does not "degrade" its models to better serve demand. He also said that changes to thinking summaries affected how some users were measuring Claude's "thinking," and that the company had not found evidence backing the strongest qualitative claims now spreading online.

The real issue may be trust as much as model quality

What is clear is that a trust gap has opened between Anthropic and some of its most demanding users.

For developers who rely on Claude Code all day, subtle shifts in visible thinking output, effort defaults, token burn, latency tradeoffs or usage caps can feel indistinguishable from a weaker model.

That is true whether the root cause is a product setting, a UI change, an inference-policy tweak, capacity pressure or a genuine quality regression.

It also means both sides of the fight may be talking past each other. Users are describing what they experience: more friction, more failures and less confidence. Anthropic is responding in product terms: effort defaults, hidden thinking summaries, changelog disclosures, and denials that demand pressure is causing secret model degradation.

Those are not necessarily incompatible descriptions. A model can feel worse to users even if the company believes it has not "nerfed" the underlying model in the way critics allege. But coming at a time when Anthropic's chief rival OpenAI has recently pivoted and put more resources behind its competing, enterprise- and vibe-coding-focused product Codex, even offering a new, more mid-range ChatGPT subscription to boost usage of the tool, it is certainly not the kind of publicity that stands to benefit Anthropic or its customer retention.

At the same time, the public evidence remains mixed. Some of the most viral claims have come from developers with detailed logs and strong opinions based on repeated use. Some of the benchmark evidence has been challenged by external observers on methodological grounds. And Anthropic's own recent changes to limits and settings ensure that this debate is happening against a backdrop of real adjustments, not pure rumor.
