Mythos has been MDASH’d.
A new AI-powered system from Microsoft has surpassed a headline-grabbing rival from Anthropic on a leading cybersecurity benchmark, using more than 100 specialized AI agents working together across multiple AI models to find real-world software vulnerabilities.
Microsoft's system, codenamed MDASH, was announced this week alongside the disclosure of 16 new vulnerabilities it found in various versions of Windows, including four "critical" remote code execution flaws fixed in this month's Patch Tuesday release.
The company, which has faced persistent criticism over security lapses, is betting that multiple models can uncover vulnerabilities at a pace that individual models can't match.
MDASH, derived from the term "multi-model agentic scanning harness," works by running specialized AI agents through a staged pipeline. Different agents scan code for potential vulnerabilities, then a separate set of agents debates whether each finding is real and exploitable, and a final stage constructs proof-of-concept attacks to confirm the bugs exist.
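The staged pipeline described above can be sketched in outline. This is a minimal illustration of the scan/debate/confirm pattern with stubbed-out agents, not Microsoft's actual implementation; every name and agent here (`Finding`, `strcpy_scanner`, the reviewer votes) is hypothetical.

```python
# Hypothetical sketch of a staged multi-agent scanning pipeline.
# All agents are stubs; a real system would call LLMs at each stage.
from dataclasses import dataclass, field

@dataclass
class Finding:
    location: str           # where the suspect code lives
    description: str        # what the scanner thinks is wrong
    votes: list = field(default_factory=list)   # reviewer verdicts
    confirmed: bool = False  # did a proof-of-concept succeed?

def scan_stage(codebase, scanners):
    """Stage 1: each scanner agent flags candidate vulnerabilities."""
    findings = []
    for scanner in scanners:
        findings.extend(scanner(codebase))
    return findings

def debate_stage(findings, reviewers):
    """Stage 2: reviewer agents vote on whether each finding is real
    and exploitable; keep only majority-approved findings."""
    kept = []
    for f in findings:
        f.votes = [reviewer(f) for reviewer in reviewers]
        if sum(f.votes) > len(reviewers) / 2:
            kept.append(f)
    return kept

def confirm_stage(findings, build_poc):
    """Stage 3: attempt a proof-of-concept attack for each finding;
    only findings with a working PoC are reported."""
    for f in findings:
        f.confirmed = build_poc(f)
    return [f for f in findings if f.confirmed]

# Toy demonstration: one "codebase", one scanner, three reviewers.
codebase = {"parser.c:42": "strcpy(dst, user_input)"}

def strcpy_scanner(code):
    return [Finding(loc, "unbounded strcpy")
            for loc, src in code.items() if "strcpy" in src]

reviewers = [lambda f: True, lambda f: True, lambda f: False]  # 2-of-3 approve
build_poc = lambda f: True  # pretend a crashing input was constructed

confirmed = confirm_stage(
    debate_stage(scan_stage(codebase, [strcpy_scanner]), reviewers),
    build_poc)
print([f.location for f in confirmed])  # ['parser.c:42']
```

The key design idea the article attributes to MDASH is that each stage filters the previous one's output, so cheap scanners can over-report and the debate and PoC stages cut the false positives.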
By comparison, Anthropic's Mythos, which raised concerns over its ability to find and exploit software vulnerabilities when it was previewed earlier this year, is a single AI model operating inside an agent framework. Anthropic limited its release to a handful of companies through a consortium called Project Glasswing, which includes Microsoft.
OpenAI's GPT-5.5 and the others on the leaderboard are also single-model systems.
MDASH scored 88.45% on the CyberGym benchmark, a test developed by UC Berkeley researchers that measures how well AI systems can reproduce real-world vulnerabilities across 1,507 tasks drawn from 188 open-source software projects.
Mythos Preview was second at 83.1%, followed by GPT-5.5 at 81.8%.
The benchmark gives each system a description of a known vulnerability and an unpatched codebase, then measures whether it can produce a working attack that triggers the bug.
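In outline, scoring a benchmark of this shape reduces to running the system on each task and checking its output against a per-task oracle. The sketch below is an assumption-laden illustration of that loop, not CyberGym's actual harness or API; `Task`, `crashes`, and the toy oracle are all invented for the example.

```python
# Hedged sketch of how a CyberGym-style task might be scored: the system
# gets a vulnerability description plus a codebase reference, and succeeds
# only if the input it produces actually triggers the bug.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    description: str                   # write-up of the known vulnerability
    codebase: str                      # identifier of the unpatched code
    crashes: Callable[[bytes], bool]   # oracle: does this input trigger the bug?

def score(tasks, system):
    """Percentage of tasks where the system's attack input works."""
    solved = sum(1 for t in tasks if t.crashes(system(t)))
    return 100 * solved / len(tasks)

# Toy task: the "bug" fires on any input longer than 8 bytes.
task = Task("buffer overflow in parse()", "libtoy-1.0",
            crashes=lambda poc: len(poc) > 8)

naive_system = lambda t: b"A" * 16   # always tries a long input
print(score([task], naive_system))   # 100.0
```

The point of the oracle check is that a system can't score by merely describing the bug; it has to produce an input that demonstrably triggers it.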
The scores on the CyberGym leaderboard are self-reported by the companies, including Anthropic's Mythos result. The benchmark code is public, but no independent party has verified any of the scores, and benchmark results don't necessarily reflect real-world performance.
The results also highlight growing concerns about AI's use as an offensive hacking tool: the same capabilities that let AI find vulnerabilities in friendly hands can be used to discover them for exploitation by attackers. Microsoft said MDASH is being used internally by its security engineering teams and will enter a limited private preview with customers.
Microsoft is telling customers to expect larger Patch Tuesdays going forward as AI accelerates the discovery of vulnerabilities.

