Mythos has been MDASH’d.
A new AI-powered system from Microsoft has surpassed a headline-grabbing rival from Anthropic on a leading cybersecurity benchmark, using more than 100 specialized AI agents working together across multiple AI models to find real-world software vulnerabilities.
Microsoft's system, codenamed MDASH, was announced this week alongside the disclosure of 16 new vulnerabilities it found in various versions of Windows, including four "critical" remote code execution flaws fixed in this month's Patch Tuesday release.
The company, which has faced persistent criticism over security lapses, is betting that multiple models can uncover vulnerabilities at a pace that individual models can't match.
MDASH, derived from the term "multi-model agentic scanning harness," works by running specialized AI agents through a staged pipeline. Different agents scan code for potential vulnerabilities, then a separate set of agents debates whether each finding is real and exploitable, and a final stage constructs proof-of-concept attacks to confirm the bugs exist.
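The staged pipeline described above can be sketched in outline. This is a minimal illustration of the scan/debate/confirm pattern with stubbed-out agents, not Microsoft's actual implementation; every name and agent here (`Finding`, `strcpy_scanner`, the reviewer votes) is hypothetical.

```python
# Hypothetical sketch of a staged multi-agent scanning pipeline.
# All agents are stubs; a real system would call LLMs at each stage.
from dataclasses import dataclass, field

@dataclass
class Finding:
    location: str           # where the suspect code lives
    description: str        # what the scanner thinks is wrong
    votes: list = field(default_factory=list)   # reviewer verdicts
    confirmed: bool = False  # did a proof-of-concept succeed?

def scan_stage(codebase, scanners):
    """Stage 1: each scanner agent flags candidate vulnerabilities."""
    findings = []
    for scanner in scanners:
        findings.extend(scanner(codebase))
    return findings

def debate_stage(findings, reviewers):
    """Stage 2: reviewer agents vote on whether each finding is real
    and exploitable; keep only majority-approved findings."""
    kept = []
    for f in findings:
        f.votes = [reviewer(f) for reviewer in reviewers]
        if sum(f.votes) > len(reviewers) / 2:
            kept.append(f)
    return kept

def confirm_stage(findings, build_poc):
    """Stage 3: attempt a proof-of-concept attack for each finding;
    only findings with a working PoC are reported."""
    for f in findings:
        f.confirmed = build_poc(f)
    return [f for f in findings if f.confirmed]

# Toy demonstration: one "codebase", one scanner, three reviewers.
codebase = {"parser.c:42": "strcpy(dst, user_input)"}

def strcpy_scanner(code):
    return [Finding(loc, "unbounded strcpy")
            for loc, src in code.items() if "strcpy" in src]

reviewers = [lambda f: True, lambda f: True, lambda f: False]  # 2-of-3 approve
build_poc = lambda f: True  # pretend a crashing input was constructed

confirmed = confirm_stage(
    debate_stage(scan_stage(codebase, [strcpy_scanner]), reviewers),
    build_poc)
print([f.location for f in confirmed])  # ['parser.c:42']
```

The key design idea the article attributes to MDASH is that each stage filters the previous one's output, so cheap scanners can over-report and the debate and PoC stages cut the false positives.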
By comparison, Anthropic's Mythos, which raised concerns over its ability to find and exploit software vulnerabilities when it was previewed earlier this year, is a single AI model operating inside an agent framework. Anthropic limited its release to a handful of companies through a consortium called Project Glasswing, which includes Microsoft.
OpenAI's GPT-5.5 and the others on the leaderboard are also single-model systems.
MDASH scored 88.45% on the CyberGym benchmark, a test developed by UC Berkeley researchers that measures how well AI systems can reproduce real-world vulnerabilities across 1,507 tasks drawn from 188 open-source software projects.
Mythos Preview was second at 83.1%, followed by GPT-5.5 at 81.8%.
The benchmark gives each system a description of a known vulnerability and an unpatched codebase, then measures whether it can produce a working attack that triggers the bug.
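In outline, scoring a benchmark of this shape reduces to running the system on each task and checking its output against a per-task oracle. The sketch below is an assumption-laden illustration of that loop, not CyberGym's actual harness or API; `Task`, `crashes`, and the toy oracle are all invented for the example.

```python
# Hedged sketch of how a CyberGym-style task might be scored: the system
# gets a vulnerability description plus a codebase reference, and succeeds
# only if the input it produces actually triggers the bug.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    description: str                   # write-up of the known vulnerability
    codebase: str                      # identifier of the unpatched code
    crashes: Callable[[bytes], bool]   # oracle: does this input trigger the bug?

def score(tasks, system):
    """Percentage of tasks where the system's attack input works."""
    solved = sum(1 for t in tasks if t.crashes(system(t)))
    return 100 * solved / len(tasks)

# Toy task: the "bug" fires on any input longer than 8 bytes.
task = Task("buffer overflow in parse()", "libtoy-1.0",
            crashes=lambda poc: len(poc) > 8)

naive_system = lambda t: b"A" * 16   # always tries a long input
print(score([task], naive_system))   # 100.0
```

The point of the oracle check is that a system can't score by merely describing the bug; it has to produce an input that demonstrably triggers it.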
The scores on the CyberGym leaderboard are self-reported by the companies, including Anthropic's Mythos result. The benchmark code is public, but no independent party has verified any of the scores, and benchmark results don't necessarily reflect real-world performance.
The results also highlight growing concerns about AI's use as an offensive hacking tool: the same capabilities that let AI find vulnerabilities in friendly hands can be used to discover them for exploitation by attackers. Microsoft said MDASH is being used internally by its security engineering teams and will enter a limited private preview with customers.
Microsoft is telling customers to expect larger Patch Tuesdays going forward as AI accelerates the discovery of vulnerabilities.

