Spanish AI firm Multiverse Computing has launched HyperNova 60B 2602, a compressed version of OpenAI's gpt-oss-120B, and released it for free on Hugging Face.
The new release cuts the original model's memory needs from 61GB to 32GB, and Multiverse says it retains near-parity tool-calling performance despite the roughly 50% reduction in size.
In theory, this means a model that once required heavy infrastructure can run on far less hardware. For developers with tighter budgets or energy constraints, that's a potentially huge advantage.
CompactifAI technology
Multiverse claims gains in agent-focused benchmarks compared with its earlier compressed release. It says HyperNova 60B 2602 delivers a 5x improvement on Tau2-Bench and 2x on Terminal Bench Hard.
These tests measure tool use and coding workflows rather than simple text replies.
The company's CompactifAI technology restructures transformer weight matrices using quantum-inspired tensor networks.
Multiverse believes that effective compression offers an alternative to simply building bigger and bigger models, and links that view to ongoing European discussions around sovereign AI, infrastructure limits, and energy use – so to find out more, I spoke to the company about its compression technology.
- How can you compress an LLM?
Multiverse Computing compresses large language models using its proprietary CompactifAI technology, based on quantum-inspired tensor networks.
Instead of simply removing parameters, CompactifAI restructures the internal weight matrices of transformer models into highly efficient tensor network representations. This mathematical reformulation captures correlations between parameters and eliminates structural redundancy.
The process is applied post-training, meaning the original model doesn't have to be retrained and no access to the original training data is required.
Using this approach, CompactifAI can reduce memory usage by up to roughly 93% and significantly cut parameter counts, while maintaining strong performance across tasks.
The resulting compressed models are smaller, faster, more energy-efficient, and easier to deploy across cloud, on-premise, and edge environments.
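To make the idea concrete, here is a minimal sketch of the general family of techniques the company describes: regrouping a weight matrix's indices and truncating an SVD, in the spirit of tensor-network (MPO-style) factorization. It is an illustration under stated assumptions, not Multiverse's proprietary algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)  # stand-in for one transformer weight matrix

# Regroup the matrix indices into a 4-index tensor, pairing row and
# column blocks the way matrix-product-operator (MPO) methods do.
T = W.reshape(32, 32, 32, 32).transpose(0, 2, 1, 3).reshape(1024, 1024)

# A truncated SVD keeps only the strongest correlations between the
# paired modes; "rank" plays the role of the tensor network's bond dimension.
U, s, Vt = np.linalg.svd(T, full_matrices=False)
rank = 64  # illustrative value; a real system would tune this per layer
core_a = U[:, :rank] * s[:rank]   # shape (1024, rank)
core_b = Vt[:rank, :]             # shape (rank, 1024)

saved = 1 - (core_a.size + core_b.size) / W.size
print(f"parameters kept: {core_a.size + core_b.size} of {W.size} ({saved:.0%} fewer)")

# The reconstruction error quantifies the size/accuracy trade-off. Trained
# weights are far more redundant than this random stand-in, so real errors
# at the same rank would be much lower.
W_hat = (core_a @ core_b).reshape(32, 32, 32, 32).transpose(0, 2, 1, 3).reshape(1024, 1024)
print(f"relative error: {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.2f}")
```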
- Can you apply it to every LLM?
It works on transformer-based large language models, including dense foundation models, provided access to the model weights is available.
The technology is architecture-agnostic within the transformer family and doesn't require modifications to the model's external behavior or APIs.
Compression effectiveness depends on the level of redundancy in the model. Large, overparameterized models typically offer the greatest compression potential.
The primary technical challenge is preserving model accuracy while achieving high compression ratios. This is addressed by carefully controlling tensor decomposition parameters to balance size reduction and performance stability.
Another challenge is ensuring that compressed models maintain robustness across different tasks, including reasoning, multilingual performance, and domain-specific use cases.
Finally, deployment environments vary widely. Compression must be optimized for different hardware targets, latency requirements, and operational constraints.
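Those "carefully controlled decomposition parameters" boil down to choices like the truncation rank. Here is a minimal sketch of one standard way to pick it from an error budget, assuming a plain truncated SVD rather than the company's actual method:

```python
import numpy as np

def rank_for_error(singular_values: np.ndarray, max_rel_error: float) -> int:
    """Smallest number of singular values to keep so the truncated SVD's
    relative Frobenius error stays within max_rel_error."""
    s2 = singular_values ** 2
    tail = np.cumsum(s2[::-1])[::-1]      # tail[k] = sum of s2[k:]
    rel_err = np.sqrt(tail / tail[0])     # error if we keep only the first k values
    ok = np.flatnonzero(rel_err <= max_rel_error)
    return int(ok[0]) if ok.size else len(singular_values)

# Example: a fast-decaying spectrum needs few ranks to stay within 5% error.
s = np.exp(-0.1 * np.arange(200))
print(rank_for_error(s, 0.05))  # small rank -> large compression at 5% error
```

A tighter error budget yields a larger rank and less compression; that is the balance the company describes tuning per model and per layer.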
- What might a good analogy be?
Rewriting the blueprint, not removing bricks: CompactifAI doesn't simply remove parts of a model. Instead, it rewrites the mathematical blueprint so the same structure is represented more efficiently.
It's like redesigning a building's internal framework so it uses far less material while preserving strength and functionality.
Another analogy is reorganizing a huge archive into a highly structured system that eliminates duplication. The information stays intact, but it's encoded far more efficiently.
- How do you determine accuracy loss?
Accuracy loss is determined by benchmarking the compressed model against the original on the same tasks and scoring metrics, then measuring the change.
In practice, that includes tool-calling evaluations. Reducing any loss in capability here enables more advanced agentic workflows and coding applications.
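In code terms, that comparison is simply a paired benchmark run. A hypothetical sketch (run_benchmark, the model handles, and the task list are placeholders, not a real evaluation API):

```python
def accuracy_deltas(original_model, compressed_model, tasks, run_benchmark):
    """Score both models on identical tasks and report the change.
    All arguments are hypothetical placeholders for a real eval harness."""
    deltas = {}
    for task in tasks:
        base = run_benchmark(original_model, task)    # e.g. tool-call success rate
        comp = run_benchmark(compressed_model, task)
        deltas[task] = comp - base                    # negative = capability lost
    return deltas
```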
- What other companies (perhaps competitors) are working on the same approach?
Multiverse Computing's compression approach is entirely unique, based on research into quantum-inspired tensor networks by cofounder and CEO Roman Orus.
Though there are other techniques available for compressing AI models, they come with the trade-off of a much higher degree of accuracy loss.
- Given that LLMs organically evolve over time, what could be the future of your compression (hardware implementation perhaps?) or something else?
This compression approach can be applied to upcoming LLMs as well, meaning that in the future, devices such as cars, phones, laptops, etc. will be able to run small or nano AI models preinstalled on their hardware.
- Is it hardware agnostic? Does it work better with some hardware (ASIC) than others?
Yes, it's hardware-agnostic at the model level: CompactifAI compresses the model weights post-training, so the resulting model can be deployed across cloud, on-prem, and edge without altering the model's external interface.
Inference speedups depend on what was limiting you before: if you were memory-bound, a smaller model typically runs significantly faster and cheaper on the same hardware.
It doesn't require an ASIC, but GPUs/AI accelerators will typically deliver the best throughput for transformer inference once the model fits comfortably in memory.
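The memory-bound point can be seen with a back-of-envelope calculation using the article's 61GB and 32GB figures. The 3.35TB/s bandwidth is my assumption for an H100-class GPU, so treat the numbers as illustrative only:

```python
# Batch-1 decode is roughly bandwidth-bound: each token requires streaming
# the full weights, so tokens/s is capped at bandwidth / model bytes.
BANDWIDTH_BYTES_PER_S = 3.35e12  # assumed H100-class HBM bandwidth

for name, gigabytes in [("gpt-oss-120B", 61), ("HyperNova 60B 2602", 32)]:
    upper_bound = BANDWIDTH_BYTES_PER_S / (gigabytes * 1e9)
    print(f"{name}: ~{upper_bound:.0f} tokens/s upper bound")
```

Halving the memory footprint roughly doubles that ceiling on the same hardware, which is where the "faster and cheaper" claim comes from.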
- What does the compression depend on?
CompactifAI relies on redundancy in trained transformer weight matrices: large models are often overparameterised, so the same behaviors can be represented with fewer effective parameters.
Instead of generic "zip-style" compression, it uses a model-aware factorization (quantum-inspired tensor networks) to rewrite large matrices into a structured, smaller form while mitigating the accuracy trade-off.
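To see why "zip-style" compression is the wrong tool here, compare zlib against a low-rank factorization on a deliberately redundant matrix. A self-contained sketch (the rank-64 matrix is a stand-in for real trained weights, not actual model data):

```python
import zlib
import numpy as np

rng = np.random.default_rng(1)
# A rank-64 matrix stands in for the redundancy of an overparameterised layer.
W = (rng.standard_normal((1024, 64)) @ rng.standard_normal((64, 1024))).astype(np.float32)

# Generic byte-level compression sees near-random float bytes: little gain.
zipped = zlib.compress(W.tobytes(), 9)
print(f"zip-style: {W.nbytes:,} -> {len(zipped):,} bytes")

# A structure-aware factorization finds the redundancy directly.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
rank = int(np.sum(s > 1e-3 * s[0]))   # effective rank (~64 here)
factored = U[:, :rank].nbytes + s[:rank].nbytes + Vt[:rank].nbytes
print(f"factorized (rank {rank}): {W.nbytes:,} -> {factored:,} bytes")
```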
- What prevents others from copying your techniques/process? Analogous to the various compression techniques that are available (e.g. zip, rar, 7z, etc.)
Multiverse Computing's proprietary CompactifAI technology is a novel approach to AI model compression, based on research into quantum-inspired tensor networks by cofounder and CEO Roman Orus, and the company's own research team.
What prevents copycat techniques is the technical know-how required to achieve such high rates of compression without sacrificing accuracy.
CompactifAI can reduce model size by up to 95% with only a 2-3% accuracy loss, compared to the industry standard of 20-30% accuracy loss after just 50-60% compression.
