The AI updates aren't slowing down. Just two days after OpenAI launched a new underlying AI model for ChatGPT called GPT-5.3 Instant, the company has unveiled another, even bigger upgrade: GPT-5.4.
In fact, GPT-5.4 comes in two varieties: GPT-5.4 Thinking and GPT-5.4 Pro, the latter designed for the most complex tasks.
Both will be available in OpenAI's paid application programming interface (API) and Codex software development application, while GPT-5.4 Thinking will be available to all paid subscribers of ChatGPT (Plus, the $20-per-month plan, and up) and Pro will be reserved for ChatGPT Pro ($200 monthly) and Enterprise plan users.
ChatGPT Free users will also get a taste of GPT-5.4, but only when their queries are auto-routed to the model, according to an OpenAI spokesperson.
The big headlines in this release are efficiency, with OpenAI reporting that GPT-5.4 uses far fewer tokens (47% fewer on some tasks) than its predecessors, and, arguably even more impressively, a new "native" Computer Use mode available through the API and Codex that lets GPT-5.4 navigate a user's computer like a human and work across applications.
The company is also releasing a new suite of ChatGPT integrations allowing GPT-5.4 to be plugged directly into users' Microsoft Excel and Google Sheets spreadsheets and cells, enabling granular analysis and automated task completion that should speed up work across the enterprise, but may make fears of white-collar layoffs even more pronounced on the heels of similar offerings from Anthropic's Claude and its new Cowork application.
OpenAI says GPT-5.4 supports up to 1 million tokens of context in the API and Codex, enabling agents to plan, execute, and verify tasks across long horizons. However, it charges double the cost per 1 million tokens once the input exceeds 272,000 tokens.
Native computer use: a step toward autonomous workflows
The most consequential capability OpenAI highlights is that GPT-5.4 is its first general-purpose model released with native, state-of-the-art computer-use capabilities in Codex and the API, enabling agents to operate computers and carry out multi-step workflows across applications.
OpenAI says the model can both write code to operate computers via libraries like Playwright and issue mouse and keyboard commands in response to screenshots. OpenAI also claims a jump in agentic web browsing.
Benchmark results are presented as evidence that this isn’t merely a UI wrapper.
On BrowseComp, which measures how well AI agents can persistently browse the web to find hard-to-locate information, OpenAI reports GPT-5.4 improving by 17 percentage points over GPT-5.2, and GPT-5.4 Pro reaching 89.3%, described as a new state of the art.
On OSWorld-Verified, which measures desktop navigation using screenshots plus keyboard and mouse actions, OpenAI reports GPT-5.4 at 75.0% success, compared to 47.3% for GPT-5.2, and notes reported human performance at 72.4%.
On WebArena-Verified, GPT-5.4 reaches 67.3% success using both DOM- and screenshot-driven interaction, compared to 65.4% for GPT-5.2. On Online-Mind2Web, OpenAI reports 92.8% success using screenshot-based observations alone.
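Benchmark deltas like these can be read two ways: as an absolute gain in percentage points, or as a relative gain over the older score. As a quick worked example using only scores quoted above, the following Python sketch computes both. The helper name `gains` is ours for illustration and does not come from OpenAI.

```python
def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = (new - old) / old * 100
    return round(absolute, 1), round(relative, 1)


# Scores quoted in the article, in percent: (GPT-5.2, GPT-5.4).
reported = {
    "OSWorld-Verified": (47.3, 75.0),
    "WebArena-Verified": (65.4, 67.3),
}

for name, (old, new) in reported.items():
    points, percent = gains(old, new)
    print(f"{name}: +{points} points absolute, +{percent}% relative")
```

The OSWorld-Verified jump works out to 27.7 points, or roughly a 58.6% relative improvement, while the WebArena-Verified change is 1.9 points, or about 2.9% relative.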
OpenAI also links computer use to improvements in vision and document handling. On MMMU-Pro, GPT-5.4 reaches 81.2% success without tool use, compared with 79.5% for GPT-5.2, and OpenAI says it achieves that result using a fraction of the “thinking tokens.”
On OmniDocBench, GPT-5.4’s average error is reported at 0.109, improved from 0.140 for GPT-5.2. The post also describes expanded support for high-fidelity image inputs, including an “original” detail level up to 10.24M pixels.
OpenAI positions GPT-5.4 as built for longer, multi-step workflows: work that increasingly looks like an agent keeping state across many actions rather than a chatbot responding once.
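To make the screenshot-driven mode described in this section concrete, here is a minimal sketch of an observe-then-act loop. Everything in it is a stand-in: `FakeDesktop`, `Action`, and `choose_action` are hypothetical names, the "screenshot" is a text description rather than pixels, and the hard-coded policy sits where a model call would go. It is not OpenAI's API, only an illustration of the control flow.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str         # "click", "type", or "done"
    target: str = ""  # widget name (for click) or text to enter (for type)


class FakeDesktop:
    """Stand-in for a real screen: a form with one text field and one button."""

    def __init__(self) -> None:
        self.username = ""
        self.submitted = False

    def screenshot(self) -> str:
        # A real agent would receive pixels; this sketch uses a text summary.
        return f"username={self.username!r} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.username += action.target
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def choose_action(observation: str) -> Action:
    """Hypothetical policy; in practice a model call would go here."""
    if "submitted=True" in observation:
        return Action("done")
    if "username=''" in observation:
        return Action("type", "alice")
    return Action("click", "submit")


def run(desktop: FakeDesktop, max_steps: int = 10) -> list[str]:
    """Observe, decide, act, and repeat until the policy reports it is done."""
    trace = []
    for _ in range(max_steps):
        action = choose_action(desktop.screenshot())
        trace.append(action.kind)
        if action.kind == "done":
            break
        desktop.apply(action)
    return trace


print(run(FakeDesktop()))
```

The step cap (`max_steps`) is a design choice worth keeping in any real loop of this shape, since an agent that never reports completion would otherwise run indefinitely.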
Tool search and improved tool orchestration
As tool ecosystems get larger, OpenAI argues that the naive approach of dumping every tool definition into the prompt creates a tax paid on every request: cost, latency, and context pollution.
GPT-5.4 introduces tool search in the API as a structural fix. Instead of receiving all tool definitions upfront, the model receives a lightweight list of tools plus a search capability, and it retrieves full tool definitions only when they’re actually needed.
OpenAI describes the efficiency win with a concrete comparison: on 250 tasks from Scale’s MCP Atlas benchmark, running with 36 MCP servers enabled, the tool-search configuration reduced total token usage by 47% while achieving the same accuracy as a configuration that exposed all MCP functions directly in context.
That 47% figure is specifically about the tool-search setup in that evaluation, not a blanket claim that GPT-5.4 uses 47% fewer tokens for every kind of task.
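The idea behind tool search can be sketched in a few lines of Python. This is a conceptual illustration only, not OpenAI's implementation or API: the `TOOLS` registry, `tool_index`, and `search_tools` are hypothetical, the matching is naive keyword search, and character counts stand in as a rough proxy for tokens.

```python
import json

# Hypothetical registry: each tool has a short summary and a larger full definition.
TOOLS = {
    "get_weather": {
        "summary": "Current weather for a city",
        "definition": {
            "name": "get_weather",
            "description": "Return temperature, wind, and precipitation for a named city.",
            "parameters": {
                "city": {"type": "string", "description": "City name, e.g. Lisbon"},
                "units": {"type": "string", "description": "metric or imperial"},
            },
        },
    },
    "create_invoice": {
        "summary": "Create a customer invoice",
        "definition": {
            "name": "create_invoice",
            "description": "Create a draft invoice for a customer and return its identifier.",
            "parameters": {
                "customer_id": {"type": "string", "description": "Internal customer identifier"},
                "amount_cents": {"type": "integer", "description": "Invoice total in cents"},
            },
        },
    },
    "search_tickets": {
        "summary": "Search support tickets",
        "definition": {
            "name": "search_tickets",
            "description": "Full-text search over support tickets, newest first.",
            "parameters": {
                "query": {"type": "string", "description": "Search terms"},
                "limit": {"type": "integer", "description": "Maximum number of results"},
            },
        },
    },
}


def tool_index() -> list[dict]:
    """Lightweight listing sent with every request: names and summaries only."""
    return [{"name": name, "summary": tool["summary"]} for name, tool in TOOLS.items()]


def search_tools(query: str) -> list[dict]:
    """Return full definitions whose name or summary contains any query word."""
    words = query.lower().split()
    hits = []
    for name, tool in TOOLS.items():
        haystack = f"{name} {tool['summary']}".lower()
        if any(word in haystack for word in words):
            hits.append(tool["definition"])
    return hits


# Compare prompt size (in characters) for the two strategies.
upfront = len(json.dumps([tool["definition"] for tool in TOOLS.values()]))
lazy = len(json.dumps(tool_index())) + len(json.dumps(search_tools("weather")))
print(f"all definitions upfront: {upfront} chars; index plus one lookup: {lazy} chars")
```

With only three tools the saving is modest, but the index grows by one short line per tool while the upfront approach grows by a full definition, which is why the gap widens as the number of tools increases.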
Improvements for developers and coding workflows
OpenAI’s coding pitch is that GPT-5.4 combines the coding strengths of GPT-5.3-Codex with stronger tool and computer-use capabilities that matter when tasks aren’t single-shot.
GPT-5.4 matches or outperforms GPT-5.3-Codex on SWE-Bench Pro while delivering lower latency across reasoning efforts.
Codex also gets workflow-level knobs. OpenAI says /fast mode delivers up to 1.5× faster performance across supported models, including GPT-5.4, describing it as the same model and intelligence “just faster.”
And it describes releasing an experimental Codex skill, “Playwright (Interactive)”, meant to demonstrate how coding and computer use can work in tandem: visually debugging web and Electron apps and testing an app as it’s being built.
OpenAI for Microsoft Excel and Google Sheets
Alongside GPT-5.4, OpenAI is announcing a suite of secure AI products in ChatGPT built for enterprises and financial institutions, powered by GPT-5.4 for advanced financial reasoning and Excel-based modeling.
The centerpiece is ChatGPT for Excel and Google Sheets (beta), which OpenAI describes as ChatGPT embedded directly in spreadsheets to build, analyze, and update complex financial models using the formulas and structures teams already rely on.
The suite also includes new ChatGPT app integrations intended to unify market, company, and internal data into a single workflow, naming FactSet, MSCI, Third Bridge, and Moody’s.
And it introduces reusable “Skills” for recurring finance work such as earnings previews, comparables analysis, DCF analysis, and investment memo drafting.
OpenAI anchors the finance push with an internal benchmark claim: model performance increased from 43.7% with GPT-5 to 88.0% with GPT-5.4 Thinking on an OpenAI internal investment banking benchmark.
Measuring AI performance against professional work
OpenAI leans on benchmarks intended to resemble real office deliverables, not just puzzle-solving. On GDPval, an evaluation spanning “well-specified knowledge work” across 44 occupations, OpenAI reports that GPT-5.4 matches or exceeds industry professionals in 83.0% of comparisons, compared to 71.0% for GPT-5.2.
The company also highlights specific improvements in the kinds of artifacts that tend to expose model weaknesses: structured tables, formulas, narrative coherence, and design quality.
In an internal benchmark of spreadsheet modeling tasks modeled after what a junior investment banking analyst might do, GPT-5.4 reaches a mean score of 87.5%, compared to 68.4% for GPT-5.2.
And on a set of presentation evaluation prompts, OpenAI says human raters preferred GPT-5.4’s presentations 68.0% of the time over GPT-5.2’s, citing stronger aesthetics, greater visual variety, and more effective use of image generation.
Improving reliability and reducing hallucinations
OpenAI describes GPT-5.4 as its most factual model yet and connects that claim to a practical dataset: de-identified prompts where users previously flagged factual errors. On that set, OpenAI reports GPT-5.4’s individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors compared to GPT-5.2.
In statements provided to VentureBeat by OpenAI and attributed to early GPT-5.4 testers, Daniel Swiecki of Walleye Capital says that on internal finance and Excel evaluations, GPT-5.4 improved accuracy by 30 percentage points, which he links to expanded automation for model updates and scenario analysis.
Brendan Foody, CEO of Mercor, calls GPT-5.4 the best model the company has tried and says it’s now top of Mercor’s APEX-Agents benchmark for professional services work, emphasizing long-horizon deliverables like slide decks, financial models, and legal analysis.
Pricing and availability
In the API, OpenAI says GPT-5.4 Thinking is available as gpt-5.4 and GPT-5.4 Pro as gpt-5.4-pro. Pricing is as follows:
GPT-5.4: $2.50 / 1M input tokens; $15 / 1M output tokens
GPT-5.4 Pro: $30 / 1M input tokens; $180 / 1M output tokens
Batch + Flex: half-rate; Priority processing: 2× rate
This makes GPT-5.4 among the more expensive models to run over the API compared to the entire field, as seen in the table below.
Model | Input (per 1M tokens) | Output (per 1M tokens) | Total Cost (input + output)
Qwen 3 Turbo | $0.05 | $0.20 | $0.25
Qwen3.5-Flash | $0.10 | $0.40 | $0.50
deepseek-chat (V3.2-Exp) | $0.28 | $0.42 | $0.70
deepseek-reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70
Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70
Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70
MiniMax M2.5 | $0.15 | $1.20 | $1.35
Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75
MiniMax M2.5-Lightning | $0.30 | $2.40 | $2.70
Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50
Kimi-k2.5 | $0.60 | $3.00 | $3.60
GLM-5 | $1.00 | $3.20 | $4.20
ERNIE 5.0 | $0.85 | $3.40 | $4.25
Claude Haiku 4.5 | $1.00 | $5.00 | $6.00
Qwen3-Max (2026-01-23) | $1.20 | $6.00 | $7.20
Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00
GPT-5.2 | $1.75 | $14.00 | $15.75
GPT-5.4 | $2.50 | $15.00 | $17.50
Claude Sonnet 4.6 | $3.00 | $15.00 | $18.00
Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00
Claude Opus 4.6 | $5.00 | $25.00 | $30.00
GPT-5.2 Pro | $21.00 | $168.00 | $189.00
GPT-5.4 Pro | $30.00 | $180.00 | $210.00
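The last column of the table simply adds the input and output prices, which amounts to assuming one million tokens of each. Real workloads are rarely balanced that way, so a rough Python sketch like the one below can help translate list prices into a cost for a specific mix. The prices are copied from the table for four of the models; the function name `workload_cost` and the example workload are hypothetical, and the estimate ignores caching discounts and long-context surcharges.

```python
# List prices in USD per 1M tokens, copied from the table: (input, output).
PRICES = {
    "GPT-5.2": (1.75, 14.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost at list price, ignoring caching and long-context surcharges."""
    price_in, price_out = PRICES[model]
    cost = input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out
    return round(cost, 2)


# Hypothetical input-heavy workload: 10M input tokens, 1M output tokens.
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 10_000_000, 1_000_000):.2f}")
```

For this input-heavy mix, GPT-5.4 comes to $40.00 against $31.50 for GPT-5.2 and $45.00 for Claude Sonnet 4.6, so the ranking holds here, though the gaps differ from what the summed column suggests.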
Another important note: with GPT-5.4, requests that exceed 272,000 input tokens are billed at 2X the normal rate, reflecting the ability to send prompts larger than earlier models supported.
In Codex, compaction defaults to 272k tokens, and the higher long-context pricing applies only when the input exceeds 272k. That means developers can keep sending prompts at or below that size without triggering the higher rate, but can opt into larger prompts by raising the compaction limit, with only those larger requests billed differently.
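The threshold rule can be sketched as a small cost estimator. The per-token rates and the 272,000-token threshold come from this article; the function name `request_cost` is hypothetical, and the sketch assumes the 2X multiplier applies to the whole request (input and output alike) once the input crosses the threshold, which the article does not spell out. Treat it as an estimate to check against OpenAI's pricing page, not a billing reference.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens, per the article
LONG_CONTEXT_MULTIPLIER = 2.0     # "2X the normal rate", per the article

# USD per 1M tokens (input, output), from the price list in this article.
RATES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-pro": (30.00, 180.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: once input exceeds the threshold, the multiplier applies to
    the entire request (both input and output tokens).
    """
    rate_in, rate_out = RATES[model]
    cost = input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        cost *= LONG_CONTEXT_MULTIPLIER
    return round(cost, 4)


# A prompt exactly at the threshold stays at the normal rate.
print(request_cost("gpt-5.4", 272_000, 10_000))
# A larger prompt crosses the threshold and is billed at the doubled rate.
print(request_cost("gpt-5.4", 300_000, 10_000))
```

Under these assumptions, a 272,000-token prompt with 10,000 output tokens costs about $0.83, while a 300,000-token prompt with the same output costs about $1.80, which illustrates why keeping prompts at or below the compaction default matters for cost.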
An OpenAI spokesperson said that in the API the maximum output is 128,000 tokens, the same as previous models.
Finally, on why GPT-5.4 is priced higher at baseline, the spokesperson attributed it to a few factors: higher capability on complex tasks (including coding, computer use, deep research, advanced document generation, and tool use), major research improvements from OpenAI’s roadmap, and more efficient reasoning that uses fewer reasoning tokens for comparable tasks. The spokesperson added that OpenAI believes GPT-5.4 remains below comparable frontier models on pricing even with the increase.
The broader shift
Across the release and the follow-up clarifications, GPT-5.4 is positioned as a model meant to move beyond “answer generation” and into sustained professional workflows: ones that require tool orchestration, computer interaction, long context, and outputs that look like the artifacts people actually use at work.
OpenAI’s emphasis on token efficiency, tool search, native computer use, and reduced user-flagged factual errors all point in the same direction: making agentic systems more viable in production by lowering the cost of retries, whether that retry is a human re-prompting, an agent calling another tool, or a workflow re-running because the first pass didn’t stick.

