The AI business has absolutely entered the "agent period," a paradigm the place AI fashions do excess of generate textual content — they now actively plan, execute, and course-correct complicated duties over days moderately than seconds.
Thus, it's maybe unsurprising to see Chinese language e-commerce large Alibaba's famed Qwen Staff of AI researchers launch a mannequin able to performing autonomous agentic AI work over a number of days: that mannequin has arrived within the type of Qwen3.7-Max which the firm experiences in a weblog publish achieved "~35 hours of steady autonomous execution" — albeit, in a proprietary, not open supply format, as prior Qwen Staff releases had been.
That is additionally to be anticipated — it's what many analysts and business specialists feared within the wake of the departure of a number of key Qwen Staff leaders earlier this 12 months. However it is sensible for Alibaba financially, not less than within the quick time period: coaching AI fashions, particularly ones as highly effective as Qwen3.7-Max, is dear, and giving them away primarily without cost, as open supply fashions are, doesn’t instantly assist recoup any prices.
In that sense, Alibaba is solely aligning its efforts with American AI giants like OpenAI and Google by providing the most recent and best fashions solely by way of paid APIs and subscription or paid internet plan bundles, and barely much less performant ones by way of open supply.
Nonetheless, the arrival of Qwen3.7-Max presents additional optionality to enterprises and particular person customers, and extra competitors for American AI labs — hardly ever a nasty factor for shoppers in any respect price range ranges. But, the truth that the mannequin is simply accessible from Chinese language-based endpoints means it could be restricted in its enchantment to American and European enterprises in search of to maximise compliance and safety posturing when fulfilling authorities contracts, and even simply trying to adjust to all related state, native, and nationwide information sovereignty rules.
The marathon AI period
To know why Qwen3.7-Max is a departure from earlier fashions, one should have a look at the way it was educated and the way it operates in follow.
Language fashions sometimes degrade when pressured to take care of a single practice of thought over hundreds of conversational turns; they neglect directions, hallucinate variables, or just get caught in logical loops. Qwen3.7-Max was particularly designed as a "versatile agent basis" able to "long-horizon reasoning" to beat this actual bottleneck.
The starkest demonstration of this functionality is an autonomous engineering activity detailed by the Qwen group. The mannequin was given entry to an remoted server geared up with a T-Head ZW-M890 PPU—a {hardware} structure the mannequin had by no means encountered throughout its coaching. Its activity was to optimize an consideration kernel.
Over the course of 35 straight hours, Qwen3.7-Max operated fully autonomously. It executed 1,158 distinct instrument calls, carried out 432 kernel evaluations, recognized compilation failures, and iteratively improved the code to realize a ten.0x geometric imply speedup.
By comparability, Chinese language competitor fashions like z.ai's GLM-5.1 and Moonshot's Kimi K2.6 capped out at 7.3x and 5.0x speedups respectively, typically voluntarily terminating their periods once they didn’t make progress. Nonetheless, each can be found open supply.
This endurance is achieved by way of what Alibaba calls "surroundings scaling". Simply as early LLMs grew smarter by ingesting extra numerous textual content, Qwen3.7-Max was educated throughout an unlimited, scaled array of dynamic agentic environments.
It’s able to simulating a one-year lifecycle of a startup within the "YC-Bench" analysis, navigating tons of of decision-making rounds encompassing personnel administration and contract screening. On this simulation, the mannequin managed to generate $2.08 million in digital income, practically doubling the efficiency of the prior era, Qwen3.6-Plus.
Moreover, the mannequin has built-in reward-hacking self-monitoring, autonomously detecting when it makes an attempt to cheat a coaching surroundings and including heuristic guidelines to right its personal habits.
A mind for any scaffold
From a product perspective, Qwen3.7-Max is designed to be the cognitive engine for contemporary software program improvement and enterprise automation.
The mannequin presents a large 1-million-token context window and a 64K most output restrict, offering immense overhead for processing sprawling codebases or prolonged technical paperwork.
Considered one of its most compelling options is "cross-harness generalization". Slightly than being hardcoded to work greatest inside a particular proprietary interface, Qwen3.7-Max is constructed to behave as a drop-in intelligence layer for numerous agent frameworks. It helps the Anthropic API protocol natively, permitting builders to plug it straight into present instruments like Claude Code or OpenClaw.
The benchmark information offered by Alibaba signifies that this generalized method has paid huge dividends.
On the Apex Math Reasoning benchmark, Qwen3.7-Max scored 44.5, eclipsing Claude Opus-4.6 Max's rating of 34.5 and DeepSeek V4-Professional Max's 38.3. It additionally posted dominant scores on Humanity's Final Examination (41.4) and the real looking coding agent benchmark MCP-Atlas (76.4).
This interprets into tangible utility for end-users. By open supply Mannequin Context Protocol (MCP) integrations, the mannequin can function as an autonomous workplace assistant, able to studying college formatting specs and robotically reformatting a messy Phrase doc through command-line instruments with out human intervention.
Operating this stage of intelligence comes at a definite value. Builders accessing the API through Alibaba Cloud Mannequin Studio pays $2.50 per 1 million enter tokens and $7.50 per 1 million output tokens. The platform additionally options express cache creation and skim pricing, in addition to a $10 payment per 1,000 requires built-in internet searches, although code interpreter instruments stay free for a restricted time.
Qwen3.7-Max occupies a strategic center floor within the present API financial system. Whereas it calls for a notable premium over aggressively priced home rivals—costing practically double DeepSeek V4 Professional ($5.22) and Z.ai's GLM-5.1 ($5.80)—it drastically undercuts the Western frontier giants it routinely matches on benchmarks.
For context, working heavy agentic workflows by way of OpenAI's GPT-5.4 or Anthropic's Claude Opus 4.7 will run builders $17.50 and $30.00 per million tokens, respectively. See VentureBeat's pricing chart beneath:
VentureBeat Frontier AI Mannequin API Pricing Snapshot
Mannequin | Enter | Output | Complete Value | Supply |
MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | |
MiniMax M2.7 | $0.30 | $1.20 | $1.50 | |
Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 | |
MiMo-V2.5 | $0.40 | $2.00 | $2.40 | |
Kimi-K2.6 | $0.95 | $4.00 | $4.95 | |
GLM-5 | $1.00 | $3.20 | $4.20 | |
Grok 4.3 (low context) | $1.25 | $2.50 | $3.75 | |
DeepSeek V4 Professional | $1.74 | $3.48 | $5.22 | |
GLM-5.1 | $1.40 | $4.40 | $5.80 | |
Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | |
Grok 4.3 (excessive context) | $2.50 | $5.00 | $7.50 | |
Qwen3.7-Max | $2.50 | $7.50 | $10.00 | |
Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 | |
Gemini 3.1 Professional Preview (≤200K) | $2.00 | $12.00 | $14.00 | |
GPT-5.4 | $2.50 | $15.00 | $17.50 | |
Gemini 3.1 Professional Preview (>200K) | $4.00 | $18.00 | $22.00 | |
Claude Opus 4.7 | $5.00 | $25.00 | $30.00 | |
GPT-5.5 | $5.00 | $30.00 | $35.00 |
By positioning Qwen3.7-Max just under Google's Gemini 3.5 Flash ($10.50) however effectively above budget-tier fashions, Alibaba is signaling that this isn't a commodity launch; it’s a flagship reasoning engine priced to lure enterprise workloads away from Silicon Valley's costliest choices.
Licensing stays proprietary for now
For all its technical brilliance, probably the most controversial side of Qwen3.7-Max is how it’s distributed. Qwen is billing the discharge as a "proprietary mannequin". It’s strictly API-only.
Traditionally, Alibaba’s Qwen has been a hero to the open-source and native LLM communities. Earlier iterations, like Qwen 2.5 and Qwen 3.6, launched their weights publicly. Open weights enable builders, researchers, and enterprises to obtain the mannequin, run it on their very own {hardware}, and fine-tune it for extremely particular or data-sensitive use instances with out sending proprietary info to a third-party server.
By locking Qwen3.7-Max behind an API, Alibaba is pivoting to the usual business playbook utilized by OpenAI (with GPT-4) and Anthropic (with Claude). For enterprise customers, this implies using Qwen3.7-Max requires trusting Alibaba Cloud with their information streams and relying fully on web connectivity to run their agentic workflows. For the open-source group, it means dropping entry to what’s presently one of the crucial succesful fashions on the planet.
Neighborhood reactions break up between awe and disappointment
The response from the developer group has been swift, characterised by a mixture of profound respect for the engineering achievement and frustration over the licensing mannequin.
Outstanding AI commentator Sudo su (@sudoingX) captured the prevailing sentiment on X (previously Twitter). "qwen is unreal," they wrote. "they simply dropped 3.7 max and it’s beating opus 4.6 max on many of the benchmarks they ran".
The technical metrics, significantly the mannequin's endurance, have left many within the discipline surprised. "the apex math quantity, 44.5 towards opus 34.5, that’s not a small hole," Sudo su famous. "the 35 hours straight on a kernel optimization activity with 1000+ instrument calls is the half i maintain rereading. that’s the agent period factor really taking place, not a slide".
The pace of Alibaba's iteration can also be drawing discover. With Qwen 3.6 launched simply final month, the leap to three.7-Max highlights a relentless improvement cadence. As Sudo su noticed, "no person else is shifting like this".
But, the reward is closely caveated by the shift to a closed ecosystem. The lack of the mannequin weights is seen as a blow to the localized AI motion, which depends on state-of-the-art open fashions to push the boundaries of what might be accomplished on client {hardware} or non-public enterprise clusters.
"one factor although, please open supply this one too," Sudo su pleaded of their publish. "3.6 dense made the complete native llm ecosystem higher. the max tier going api solely would shut a door we have now been preserving open. give us the weights ultimately".
Qwen3.7-Max proves that the autonomous agent period is now not a theoretical projection; it’s a current actuality able to executing complicated engineering feats whereas people sleep. The one query now’s whether or not this new frontier of AI will likely be a democratized useful resource you may obtain to your laptop computer, or an intelligence utility rented strictly from the cloud. For now, with Qwen3.7-Max, it’s undeniably the latter.

