Anthropic and OpenAI just exposed SAST's structural blind spot with free tools

OpenAI launched Codex Security on March 6, entering the application security market that Anthropic had disrupted 14 days earlier with Claude Code Security. Both scanners use LLM reasoning instead of pattern matching. Both proved that traditional static application security testing (SAST) tools are structurally blind to entire vulnerability classes. The enterprise security stack is caught in the middle.

Anthropic and OpenAI independently released reasoning-based vulnerability scanners, and both found bug classes that pattern-matching SAST was never designed to detect. The competitive pressure between two labs with a combined private-market valuation exceeding $1.1 trillion means detection quality will improve faster than any single vendor can deliver alone.

Neither Claude Code Security nor Codex Security replaces your existing stack. Both tools change procurement math permanently. Right now, both are free to enterprise customers. The head-to-head comparison and seven actions below are what you need before the board of directors asks which scanner you’re piloting and why.

How Anthropic and OpenAI reached the same conclusion from different architectures

Anthropic published its zero-day research on February 5 alongside the release of Claude Opus 4.6. Anthropic said Claude Opus 4.6 found more than 500 previously unknown high-severity vulnerabilities in production open-source codebases that had survived decades of expert review and millions of hours of fuzzing.

In the CGIF library, Claude discovered a heap buffer overflow by reasoning about the LZW compression algorithm, a flaw that coverage-guided fuzzing couldn’t catch even with 100% code coverage. Anthropic shipped Claude Code Security as a limited research preview on February 20, available to Enterprise and Team customers, with free expedited access for open-source maintainers. Gabby Curtis, Anthropic’s communications lead, told VentureBeat in an exclusive interview that Anthropic built Claude Code Security to make defensive capabilities more widely available.

OpenAI’s numbers come from a different architecture and a wider scanning surface. Codex Security evolved from Aardvark, an internal tool powered by GPT-5 that entered private beta in 2025. During the Codex Security beta period, OpenAI’s agent scanned more than 1.2 million commits across external repositories, surfacing what OpenAI said were 792 critical findings and 10,561 high-severity findings. OpenAI reported vulnerabilities in OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium, resulting in 14 assigned CVEs. Codex Security’s false positive rates fell more than 50% across all repositories during beta, according to OpenAI. Over-reported severity dropped more than 90%.

Checkmarx Zero researchers demonstrated that moderately complicated vulnerabilities often escaped Claude Code Security’s detection. Developers could trick the agent into ignoring vulnerable code. In a full production-grade codebase scan, Checkmarx Zero found that Claude identified eight vulnerabilities, but only two were true positives. If moderately complex obfuscation defeats the scanner, the detection ceiling is lower than the headline numbers suggest. Neither Anthropic nor OpenAI has submitted detection claims to an independent third-party audit. Security leaders should treat the reported numbers as indicative, not audited.

Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat that the competitive scanner race compresses the window for everyone. Baer advised security teams to prioritize patches based on exploitability in their runtime context rather than CVSS scores alone, shorten the window between discovery, triage, and patch, and maintain software bill of materials visibility so they know instantly where a vulnerable component runs.
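
Baer's advice can be sketched in a few lines of Python. This is a minimal illustration, not a method Baer, Anthropic, or OpenAI has published: the `Finding` fields, the multipliers in `priority`, and the shape of the `sbom` mapping are all assumptions chosen for the example. It shows how a ranking that weighs runtime context can reorder a queue that CVSS alone would sort differently, and how an SBOM lookup answers where a vulnerable component runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    identifier: str        # hypothetical finding ID
    component: str         # package name as it appears in the SBOM
    cvss: float            # CVSS base score, 0.0 to 10.0
    reachable: bool        # assumed input: is the vulnerable path reachable at runtime?
    internet_facing: bool  # assumed input: does the affected service take external traffic?


def priority(finding: Finding) -> float:
    """Scale CVSS by runtime context. The multipliers are illustrative assumptions."""
    score = finding.cvss
    score *= 1.5 if finding.reachable else 0.5
    score *= 1.5 if finding.internet_facing else 1.0
    return score


def where_deployed(component: str, sbom: dict[str, set[str]]) -> list[str]:
    """Return the services whose SBOM lists the component (sbom: service -> components)."""
    return sorted(service for service, parts in sbom.items() if component in parts)


def triage(findings: list[Finding], sbom: dict[str, set[str]]):
    """Rank findings by contextual priority and attach the services they affect."""
    ranked = sorted(findings, key=priority, reverse=True)
    return [(f.identifier, priority(f), where_deployed(f.component, sbom)) for f in ranked]


if __name__ == "__main__":
    sbom = {
        "billing-api": {"libbar", "libbaz"},
        "batch-worker": {"libfoo"},
        "gateway": {"libbar"},
    }
    findings = [
        Finding("F-1", "libfoo", 9.8, reachable=False, internet_facing=False),
        Finding("F-2", "libbar", 7.5, reachable=True, internet_facing=True),
    ]
    for identifier, score, services in triage(findings, sbom):
        print(identifier, score, services)
```

In this toy data the 7.5-rated finding outranks the 9.8-rated one because it is reachable and exposed, which is the reordering Baer describes. Real weights would need to come from your own threat model.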

Different methods, almost no overlap in the codebases they scanned, yet the same conclusion. Pattern-matching SAST has a ceiling, and LLM reasoning extends detection past it. When two competing labs distribute that capability at the same time, the dual-use math gets uncomfortable. Any financial institution or fintech running a commercial codebase should assume that if Claude Code Security and Codex Security can find these bugs, adversaries with API access can find them, too.

Baer put it bluntly: open-source vulnerabilities surfaced by reasoning models should be treated closer to zero-day class discoveries, not backlog items. The window between discovery and exploitation just compressed, and most vulnerability management programs are still triaging on CVSS alone.

What the vendor responses prove

Snyk, the developer security platform used by engineering teams to find and fix vulnerabilities in code and open-source dependencies, acknowledged the technical breakthrough but argued that finding vulnerabilities has never been the hard part. Fixing them at scale, across hundreds of repositories, without breaking anything. That’s the bottleneck. Snyk pointed to research showing AI-generated code is 2.74 times more likely to introduce security vulnerabilities compared to human-written code, according to Veracode’s 2025 GenAI Code Security Report. The same models finding hundreds of zero-days also introduce new vulnerability classes when they write code.

Cycode CTO Ronen Slavin wrote that Claude Code Security represents a genuine technical advancement in static analysis, but that AI models are probabilistic by nature. Slavin argued that security teams need consistent, reproducible, audit-grade results, and that a scanning capability embedded in an IDE is useful but doesn’t constitute infrastructure. Slavin’s position: SAST is one discipline within a much broader scope, and free scanning doesn’t displace platforms that handle governance, pipeline integrity, and runtime behavior at enterprise scale.

“If code reasoning scanners from major AI labs are effectively free to enterprise customers, then static code scanning commoditizes overnight,” Baer told VentureBeat. Over the next 12 months, Baer expects the budget to move toward three areas.

Runtime and exploitability layers, including runtime protection and attack path analysis.
AI governance and model security, including guardrails, prompt injection defenses, and agent oversight.
Remediation automation.

“The net effect is that AppSec spending probably doesn’t shrink, but the center of gravity shifts away from traditional SAST licenses and toward tooling that shortens remediation cycles,” Baer said.

Seven things to do before your next board meeting

Run both scanners against a representative codebase subset. Compare Claude Code Security and Codex Security findings against your existing SAST output. Start with a single representative repository, not your entire codebase. Both tools are in research preview with access constraints that make full-estate scanning premature. The delta is your blind spot inventory.
Build the governance framework before the pilot, not after. Baer told VentureBeat to treat either tool like a new data processor for the crown jewels, which is your source code. Baer’s governance model includes a formal data-processing agreement with clear statements on training exclusion, data retention, and subprocessor use, a segmented submission pipeline so only the repos you intend to scan are transmitted, and an internal classification policy that distinguishes code that can leave your boundary from code that cannot. In interviews with more than 40 CISOs, VentureBeat found that formal governance frameworks for reasoning-based scanning tools barely exist yet. Baer flagged derived IP as the blind spot most teams haven’t addressed. Can model providers retain embeddings or reasoning traces, and are those artifacts considered your intellectual property? The other gap is data residency for code, which historically was not regulated like customer data but increasingly falls under export control and national security review.
Map what neither tool covers. Software composition analysis. Container scanning. Infrastructure-as-code. DAST. Runtime detection and response. Claude Code Security and Codex Security operate at the code-reasoning layer. Your existing stack handles everything else. That stack’s pricing power is what shifted.
Quantify the dual-use exposure. Every zero-day Anthropic and OpenAI surfaced lives in an open-source project that enterprise applications depend on. Both labs are disclosing and patching responsibly, but the window between their discovery and your adoption of those patches is exactly where attackers operate. AI security startup AISLE independently discovered all 12 zero-day vulnerabilities in OpenSSL’s January 2026 security patch, including a stack buffer overflow (CVE-2025-15467) that’s potentially remotely exploitable without valid key material. Fuzzers ran against OpenSSL for years and missed every one. Assume adversaries are running the same models against the same codebases.
Prepare the board comparison before they ask. Claude Code Security reasons about code contextually, traces data flows, and uses multi-stage self-verification. Codex Security builds a project-specific threat model before scanning and validates findings in sandboxed environments. Each tool is in research preview and requires human approval before any patch is applied. The board needs side-by-side analysis, not a single-vendor pitch. When the conversation turns to why your existing suite missed what Anthropic found, Baer offered framing that works at the board level. Pattern-matching SAST solved a different generation of problems, Baer told VentureBeat. It was designed to detect known anti-patterns. That capability still matters and still reduces risk. But reasoning models can evaluate multi-file logic, state transitions, and developer intent, which is where many modern bugs live. Baer’s board-ready summary: “We bought the right tools for the threats of the last decade; the technology just advanced.”
Track the competitive cycle. Both companies are heading toward IPOs, and enterprise security wins drive the growth narrative. When one scanner misses a blind spot, it lands on the other lab’s feature roadmap within weeks. Both labs ship model updates on monthly cycles. That cadence will outrun any single vendor’s release calendar. Baer said that running both is the right move: “Different models reason differently, and the delta between them can reveal bugs neither tool alone would consistently catch. In the short term, using both isn’t redundancy. It’s defense through diversity of reasoning systems.”
Set a 30-day pilot window. Before February 20, this test didn’t exist. Run Claude Code Security and Codex Security against the same codebase and let the delta drive the procurement conversation with empirical data instead of vendor marketing. Thirty days gives you that data.
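
The "delta" that the first and last actions rely on is a set difference. The Python sketch below shows one way to compute it, under assumptions worth stating plainly: neither vendor's export format is described in this article, so the finding dictionaries with `file` and `cwe` keys are hypothetical, and matching on file path plus CWE identifier is a deliberately coarse choice because tools rarely agree on line numbers.

```python
def normalize(finding: dict) -> tuple[str, str]:
    """Reduce a finding to a comparable key: (file path, CWE id).

    The "file" and "cwe" keys are a hypothetical export format.
    """
    return (finding["file"].strip(), finding["cwe"].strip().upper())


def blind_spot_inventory(sast: list[dict], scanner_a: list[dict], scanner_b: list[dict]) -> dict:
    """Compare two reasoning-scanner result sets against existing SAST output."""
    sast_keys = {normalize(f) for f in sast}
    a_keys = {normalize(f) for f in scanner_a}
    b_keys = {normalize(f) for f in scanner_b}
    return {
        # Everything either scanner reported that the existing SAST tool did not.
        "missed_by_sast": sorted((a_keys | b_keys) - sast_keys),
        # Agreement between the two scanners is a reasonable place to start triage.
        "found_by_both_scanners": sorted((a_keys & b_keys) - sast_keys),
        "only_scanner_a": sorted(a_keys - b_keys - sast_keys),
        "only_scanner_b": sorted(b_keys - a_keys - sast_keys),
    }


if __name__ == "__main__":
    sast = [{"file": "src/auth.py", "cwe": "CWE-89"}]
    scanner_a = [
        {"file": "src/auth.py", "cwe": "cwe-89"},
        {"file": "src/gif.c", "cwe": "CWE-122"},
        {"file": "src/session.py", "cwe": "CWE-384"},
    ]
    scanner_b = [
        {"file": "src/gif.c", "cwe": "CWE-122"},
        {"file": "src/upload.py", "cwe": "CWE-434"},
    ]
    for bucket, keys in blind_spot_inventory(sast, scanner_a, scanner_b).items():
        print(bucket, keys)
```

Every entry in the inventory is a candidate, not a confirmed bug. Given the Checkmarx Zero result of two true positives in eight findings, each one still needs human validation before it feeds a procurement decision.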

Fourteen days separated Anthropic and OpenAI. The gap between the next releases will be shorter. Attackers are watching the same calendar.
