Anthropic pointed its most advanced AI model, Claude Opus 4.6, at production open-source codebases and found a plethora of security holes: more than 500 high-severity vulnerabilities that had survived decades of expert review and millions of hours of fuzzing, with each candidate vetted by internal and external security review before disclosure.
Fifteen days later, the company productized the capability and launched Claude Code Security.
Security directors responsible for seven-figure vulnerability management stacks should expect a common question from their boards in the next review cycle. VentureBeat anticipates the emails and conversations will start with, "How do we add reasoning-based scanning before attackers get there first?", because as Anthropic's research found, simply pointing an AI model at exposed code can be enough to identify (and, in the hands of malicious actors, exploit) security lapses in production code.
The answer matters more than the number, and it is primarily structural: how your tooling and processes allocate work between pattern-based scanners and reasoning-based analysis. CodeQL and the tools built on it match code against known patterns.
Claude Code Security, which Anthropic launched February 20 as a limited research preview, reasons about code the way a human security researcher would. It follows how data moves through an application and catches flaws in business logic and access control that no rule set covers.
The board conversation security leaders need to have this week
Five hundred newly discovered zero-days is less a scare statistic than a standing budget justification for rethinking how you fund code security.
The reasoning capability Claude Code Security represents, and its inevitable competitors, should drive the procurement conversation. Static application security testing (SAST) catches known vulnerability classes. Reasoning-based scanners find what pattern matching was never designed to detect. Both have a role.
Anthropic published the zero-day research on February 5. Fifteen days later, it shipped the product. While it is the same model and the same capabilities, it is now available to Enterprise and Team customers.
What Claude does that CodeQL couldn't
GitHub has offered CodeQL-based scanning through Advanced Security for years, and added Copilot Autofix in August 2024 to generate LLM-suggested fixes for alerts. Security teams rely on it. But the detection boundary is the CodeQL rule set, and everything outside that boundary remains invisible.
Claude Code Security extends that boundary by generating and testing its own hypotheses about how data and control flow through an application, including cases that no existing rule set describes. CodeQL solves the problem it was built to solve: data-flow analysis within predefined queries. It tells you whether tainted input reaches a dangerous function.
CodeQL is not designed to autonomously read a project's commit history, infer an incomplete patch, trace that logic into another file, and then construct a working proof-of-concept exploit end to end. Claude did exactly that on GhostScript, OpenSC, and CGIF, each time using a different reasoning strategy.
"The real shift is from pattern matching to hypothesis generation," said Merritt Baer, CSO at Enkrypt AI, advisor to Andesite and AppOmni, and former CISO at Reco, in an exclusive interview with VentureBeat. "That's a step-function increase in discovery power, and it demands equally strong human and technical controls."
Three proof points from Anthropic's published methodology show where pattern matching ends and hypothesis generation begins.
Commit history analysis across files. GhostScript is a widely deployed utility for processing PostScript and PDF files. Fuzzing turned up nothing, and neither did manual analysis. Then Claude pulled the Git commit history, found a patch that added stack bounds checking for font handling in gstype1.c, and reversed the logic: if the fix was needed there, every other call to that function without the fix was still vulnerable. In gdevpsfx.c, an entirely different file, the call to the same function lacked the bounds checking patched elsewhere. Claude built a working proof-of-concept crash. No CodeQL rule describes that bug today. The maintainers have since patched it.
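The reasoning step described above is a form of variant analysis: given a patch that adds a guard at one call site, flag every other call to the same function that lacks it. A minimal sketch, with entirely hypothetical guard and function names (GhostScript's actual identifiers differ):

```python
import re

# Hypothetical names for illustration only -- not GhostScript's real code.
GUARD = re.compile(r"CHECK_STACK_BOUNDS")        # the check the patch added
CALL = re.compile(r"\bdecode_charstring\s*\(")   # the shared vulnerable call

def unguarded_calls(source: str, window: int = 3) -> list[int]:
    """Return 1-based line numbers of calls to the function that are not
    preceded by the guard within `window` lines -- candidates for the same
    bug the patch fixed elsewhere."""
    lines = source.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if CALL.search(line):
            context = "\n".join(lines[max(0, i - window):i])
            if not GUARD.search(context):
                hits.append(i + 1)
    return hits

patched = "CHECK_STACK_BOUNDS(sp, 2);\ndecode_charstring(sp);\n"  # gstype1.c-style
unpatched = "/* no check */\ndecode_charstring(sp);\n"            # gdevpsfx.c-style

print(unguarded_calls(patched))    # []
print(unguarded_calls(unpatched))  # [2]
```

A regex over a few lines of context is of course far cruder than what the model did; the point is only the direction of inference, from one fixed call site to every unfixed sibling.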
Reasoning about preconditions that fuzzers can't reach. OpenSC processes smart card data. Standard approaches failed here, too, so Claude searched the repository for function calls that are frequently vulnerable and found a location where multiple strcat operations ran in succession without length checking on the output buffer. Fuzzers rarely reached that code path because too many preconditions stood in the way. Claude reasoned about which code fragments looked interesting, built a buffer overflow, and proved the vulnerability.
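The unsafe pattern is simple arithmetic once stated: each appended field may fit individually, but nothing checks the cumulative length against the destination buffer. A minimal sketch, with an assumed buffer size and field contents (OpenSC's actual sizes and data differ):

```python
# Illustrative model of the unchecked strcat-chain pattern. The buffer
# size and field lengths are assumptions, not OpenSC's real values.
BUF_SIZE = 32

def total_after_appends(fields: list[bytes]) -> int:
    """Bytes written by successive strcat-style appends, plus the
    terminating NUL -- the quantity the vulnerable code never checked."""
    return sum(len(f) for f in fields) + 1

# Each field fits in the buffer on its own, but a fuzzer rarely reaches
# the path where all three arrive together, so the sum went untested.
fields = [b"A" * 12, b"B" * 12, b"C" * 12]

written = total_after_appends(fields)
print(written, written > BUF_SIZE)   # 37 True -- the final append overflows
```

The fix pattern is equally simple: compute that sum first and reject (or use strncat/snprintf-style bounded writes) before the first append, which is why the bug is easy to patch but hard for coverage-driven fuzzing to surface.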
Algorithm-level edge cases that no coverage metric catches. CGIF is a library for processing GIF files. This vulnerability required understanding how LZW compression builds a dictionary of tokens. CGIF assumed compressed output would always be smaller than uncompressed input, which is almost always true. Claude recognized that if the LZW dictionary filled up and triggered resets, the compressed output could exceed the uncompressed size, overflowing the buffer. Even 100% branch coverage wouldn't catch this. The flaw demands a particular sequence of operations that exercises an edge case in the compression algorithm itself. Random input generation almost never produces it. Claude did.
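The expansion effect is easy to reproduce with a toy encoder. A minimal LZW sketch with 12-bit codes and GIF-style dictionary resets (the code width, reset code, and input are assumptions for illustration, not CGIF's internals) shows why "output is smaller than input" is not a safe buffer-sizing assumption:

```python
import random

MAX_CODE = 4096   # 12-bit code space, as in GIF's LZW variant
CLEAR = 256       # reserved reset code (illustrative)

def lzw_encode(data: bytes) -> list[int]:
    """Return emitted LZW codes, resetting the dictionary whenever the
    12-bit code space fills up."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 257                    # 256 is reserved for CLEAR
    out, w = [], b""
    for b in data:
        wc = w + bytes([b])
        if wc in dictionary:
            w = wc
            continue
        out.append(dictionary[w])
        if next_code >= MAX_CODE:      # dictionary full: emit a reset
            out.append(CLEAR)
            dictionary = {bytes([i]): i for i in range(256)}
            next_code = 257
        else:
            dictionary[wc] = next_code
            next_code += 1
        w = bytes([b])
    if w:
        out.append(dictionary[w])
    return out

random.seed(0)
raw = bytes(random.randrange(256) for _ in range(20000))  # incompressible input
codes = lzw_encode(raw)
encoded_bytes = len(codes) * 12 // 8   # each code occupies 12 bits
print(encoded_bytes > len(raw))        # True: the "compressed" data grew
```

Incompressible input keeps the dictionary from ever paying for itself: nearly every byte emits a 12-bit code, so the output runs roughly 1.5x the input, with reset codes added on top, which is exactly the case a size assumption like CGIF's misses.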
Baer sees something broader in that progression. "The challenge with reasoning isn't accuracy, it's agency," she told VentureBeat. "Once a system can form hypotheses and pursue them, you've shifted from a lookup tool to something that can explore your environment in ways that are harder to predict and constrain."
How Anthropic validated 500+ findings
Anthropic placed Claude inside a sandboxed virtual machine with standard utilities and vulnerability analysis tools. The red team didn't provide any specialized instructions, custom harnesses, or task-specific prompting. Just the model and the code.
The red team focused on memory corruption vulnerabilities because they're the easiest to confirm objectively. Crash monitoring and address sanitizers don't leave room for debate. Claude filtered its own output, deduplicating and reprioritizing before human researchers touched anything. When the confirmed count kept climbing, Anthropic brought in external security professionals to validate findings and write patches.
Every target was an open-source project underpinning enterprise systems and critical infrastructure. Small teams maintain many of them, staffed by volunteers, not security professionals. When a vulnerability sits in one of these projects for a decade, every product that pulls from it inherits the risk.
Anthropic didn't start with the product launch. The defensive research spans more than a year. The company entered Claude in competitive Capture-the-Flag events, where it ranked in the top 3% on PicoCTF globally, solved 19 of 20 challenges in the HackTheBox AI vs Human CTF, and placed sixth out of nine teams defending live networks against human red team attacks at Western Regional CCDC.
Anthropic also partnered with Pacific Northwest National Laboratory to test Claude against a simulated water treatment plant. PNNL's researchers estimated that the model completed adversary emulation in three hours. The traditional process takes several weeks.
The dual-use question security leaders can't avoid
The same reasoning that finds a vulnerability can also help an attacker exploit one. Frontier Red Team lead Logan Graham acknowledged this directly to Fortune's Sharon Goldman. He told Fortune the models can now explore codebases autonomously and follow investigative leads faster than a junior security researcher.
Gabby Curtis, Anthropic's communications lead, told VentureBeat in an exclusive interview that the company built Claude Code Security to make defensive capabilities more broadly available, "tipping the scales toward defenders." She was equally direct about the tension: "The same reasoning that helps Claude find and fix a vulnerability could help an attacker exploit it, so we're being deliberate about how we release this."
In interviews with more than 40 CISOs across industries, VentureBeat found that formal governance frameworks for reasoning-based scanning tools are the exception, not the norm. The most common response: the field was considered so nascent that many CISOs didn't expect this capability to arrive so early in 2026.
The question every security director has to answer before deploying this: if I give my team a tool that finds zero-days by reasoning, have I unintentionally expanded my internal threat surface?
"You didn't weaponize your internal surface, you revealed it," Baer told VentureBeat. "These tools can be helpful, but they also may surface latent risk faster and more scalably. The same tool that finds zero-days for defense can expose gaps in your threat model. Keep in mind that most intrusions don't come from zero-days, they come from misconfigurations."
"In addition to the access and attack path risk, there is IP risk," she said. "Not just exfiltration, but transformation. Reasoning models can internalize and re-express proprietary insights in ways that blur the line between use and leakage."
The release is deliberately constrained. Enterprise and Team customers only, through a limited research preview. Open-source maintainers can apply for free expedited access. Findings go through multi-stage self-verification before reaching an analyst, with severity ratings and confidence scores attached. Every patch requires human approval.
Anthropic also built detection into the model itself. In a blog post detailing the safeguards, the company described deploying probes that measure activations within the model as it generates responses, with new cyber-specific probes designed to track potential misuse. On the enforcement side, Anthropic is expanding its response capabilities to include real-time intervention, including blocking traffic it detects as malicious.
Graham was direct with Axios: the models are extremely good at finding vulnerabilities, and he expects them to get much better still. VentureBeat asked Anthropic for the false-positive rate before and after self-verification, the number of disclosed vulnerabilities with patches landed versus still in triage, and the specific safeguards that distinguish attacker use from defender use. The lead researcher on the 500-vulnerability project was unavailable, and the company declined to share specific attacker-detection mechanisms to avoid tipping off threat actors.
"Offense and defense are converging in capability," Baer said. "The differentiator is oversight. If you can't audit and bound how the tool is used, you've created another risk."
That speed advantage doesn't favor defenders by default. It favors whoever adopts it first. Security directors who move early set the terms.
Anthropic isn't alone. The pattern is repeating.
Security researcher Sean Heelan used OpenAI's o3 model, with no custom tooling and no agentic framework, to discover CVE-2025-37899, a previously unknown use-after-free vulnerability in the Linux kernel's SMB implementation. The model analyzed over 12,000 lines of code and identified a race condition that traditional static analysis tools consistently missed, because detecting it requires understanding concurrent thread interactions across connections.
Separately, AI security startup AISLE discovered all 12 zero-day vulnerabilities announced in OpenSSL's January 2026 security patch, including a rare high-severity finding (CVE-2025-15467, a stack buffer overflow in CMS message parsing that is potentially remotely exploitable without valid key material). AISLE co-founder and chief scientist Stanislav Fort reported that his team's AI system accounted for 13 of the 14 total OpenSSL CVEs assigned in 2025. OpenSSL is among the most scrutinized cryptographic libraries in the world. Fuzzers have run against it for years. The AI found what they weren't designed to find.
The window is already open
These 500 vulnerabilities live in open-source projects that enterprise applications depend on. Anthropic is disclosing and patching, but the window between discovery and adoption of those patches is where attackers operate today.
The same model improvements behind Claude Code Security are available to anyone with API access.
If your team is evaluating these capabilities, the limited research preview is the right place to start, with clearly defined data handling rules, audit logging, and success criteria agreed up front.

