"When Size Doesn't Matter": The Revolution of the HRM Model
by Dario Ferrero (VerbaniaNotizie.it)
The biggest revolution in artificial intelligence in recent years comes not from the labs of OpenAI or Google, but from a small startup in Singapore called Sapient Intelligence.
The protagonist of this story is called the Hierarchical Reasoning Model (HRM), an AI agent that is shaking the foundations of the entire sector with a seemingly impossible promise: to reason better than the giants of AI using a fraction of their resources.
This is not the usual language model enlarged to an incredible size, nor another variation on the theme of transformers. HRM is built differently, directly inspired by the functioning of the human brain, and the results it is achieving are nothing short of astounding. This model, with just 27 million parameters, less than a quarter the size of the first GPT, systematically outperforms models orders of magnitude larger on complex reasoning tasks. As if that weren't enough, it is trained with only a thousand examples per problem, while its competitors require mountains of data and months of processing on the most powerful servers in the world.
But the real magic of HRM does not lie in its small size or its training efficiency. Its innovation lies in the fact that it does not just process information like everyone else: it really reasons, emulating human cognitive processes in ways that seemed like science fiction until a few months ago. And the results speak for themselves: where other models fail completely, HRM excels with a naturalness that is more reminiscent of a thinking brain than a calculating machine.
When the Chain of Thought Breaks
To understand the importance of the revolution brought about by HRM, we must first understand how current artificial intelligence models work and why their limits are becoming increasingly evident. ChatGPT, Claude, Gemini and all their older brothers are based on a technique called "Chain of Thought", an approach that sounds promising but hides deep structural fragility.
Imagine having to solve a complex math problem by writing each step with an indelible pen, without ever being able to go back to check or correct what you have written. This is exactly how current models work: they guide themselves step by step through a problem, almost "talking to themselves" out loud, but if they make even one small mistake in this chain, the entire answer can collapse like a house of cards.
As the researchers at Sapient Intelligence explain in their scientific paper, "the chain of thought for reasoning is a crutch, not a satisfactory solution. It relies on fragile, human-defined decompositions where a single misstep or a jumble of steps can completely derail the reasoning process."
The problem is even deeper than it seems. Models based on transformers, the architecture that dominates modern AI, always perform the same amount of "thought" regardless of the difficulty of the question. It is as if a detective had to devote exactly the same time and resources to solving both a bicycle theft and an intricate murder case. They cannot say "This is difficult, I need more time to think" and they cannot review their reasoning once they have started generating the answer.
This rigidity has enormous practical consequences. Current models are forced to translate every reasoning process into explicit language, producing long, slow, and often redundant answers. Worse still, this dependence on language makes them vulnerable to cascading errors: if they get an intermediate step wrong, everything that follows is compromised, regardless of how correct their basic reasoning skills might be.
The Architecture that Imitates the Brain
HRM completely abandons this paradigm, embracing a radically different approach that its creators describe as "brain-inspired". This is not a superficial metaphor or marketing: HRM's architecture directly borrows the layered decision-making strategy of the human brain, applying it to artificial intelligence with results that are redefining what is possible in the field of machine learning.
At the heart of HRM are two components that work in tandem like a perfectly coordinated duo. The first is a high-level planner, which we could imagine as the "slow strategic brain" that observes the big picture, identifies the type of problem to be solved, and draws a general map of the approach to be followed. The second is a low-level executor, the "fast processor" that takes orders from the planner and executes them with precision and speed.
The most fitting analogy is that of a chess master collaborating with an incredibly efficient assistant. The master studies the board, plans the overall strategy, and decides which move to make, while the assistant physically executes the move with millimeter precision. But here the similarity becomes even more interesting: the two do not limit themselves to a single exchange of information, but maintain a continuous dialogue for the entire duration of the problem.
This is the heart of HRM's innovation: the hierarchical reasoning loop. The high-level module develops a strategic plan and passes it to the low-level module, which executes it and returns the results. At this point, the high-level module analyzes what happened, updates its strategy based on the new data, and provides the low-level module with a new refined subproblem to work on. This "back and forth" continues in iterative cycles until the model converges on the optimal solution.
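For readers who want to picture what such a loop might look like in code, here is a minimal PyTorch-style sketch. The module names, the choice of GRU cells, and the cycle counts are assumptions made for clarity, not the actual HRM implementation; what matters is the structure: a fast inner loop nested inside a slow outer one.

```python
import torch
import torch.nn as nn

class HierarchicalReasoner(nn.Module):
    """Illustrative sketch of a two-level reasoning loop (not the real HRM)."""

    def __init__(self, input_dim, hidden_dim, output_dim,
                 n_cycles=4, steps_per_cycle=8):
        super().__init__()
        self.n_cycles = n_cycles                # "slow" high-level updates
        self.steps_per_cycle = steps_per_cycle  # "fast" low-level steps per cycle
        self.encoder = nn.Linear(input_dim, hidden_dim)
        # Low-level executor: refines its working state given the input and the current plan.
        self.low = nn.GRUCell(hidden_dim * 2, hidden_dim)
        # High-level planner: updates the plan from the executor's final state.
        self.high = nn.GRUCell(hidden_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x_emb = self.encoder(x)
        z_high = torch.zeros_like(x_emb)  # strategic "plan" state
        z_low = torch.zeros_like(x_emb)   # tactical "working" state
        for _ in range(self.n_cycles):
            # Fast loop: the executor grinds through the current sub-problem.
            for _ in range(self.steps_per_cycle):
                z_low = self.low(torch.cat([x_emb, z_high], dim=-1), z_low)
            # Slow loop: the planner reads the result and revises the strategy.
            z_high = self.high(z_low, z_high)
        return self.decoder(z_high)
```

The nesting is the whole point: the executor takes several quick steps on the current sub-problem before the planner inspects the outcome and hands back a refined plan, the "back and forth" described above.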
The beauty of this approach is that it allows HRM to internally control and refine its own reasoning while it is still processing the problem, a capability that the vast majority of other models simply do not possess. It is as if, while solving that math problem with the indelible pen, someone suddenly allowed you to erase, rewrite, and rethink each step until you are completely sure of the solution.
But there is more. The most advanced version of HRM uses reinforcement learning to autonomously decide how many iterations are necessary for each type of task, making it even more similar to flexible human thought. Just as we devote more time and mental energy to complex problems than to simple ones, HRM learns to modulate its reasoning cycles based on the intrinsic difficulty of the problem it is facing.
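In code, that flexibility can be pictured as a learned "halt or continue" decision evaluated after every reasoning cycle: easy inputs stop early, hard ones keep iterating up to a cap. The sketch below is purely illustrative; the names are hypothetical, and the two-value head only loosely echoes the reinforcement-learning signal mentioned above, without reproducing HRM's actual training procedure.

```python
import torch
import torch.nn as nn

class AdaptiveHaltingWrapper(nn.Module):
    """Illustrative wrapper that lets a reasoner choose its own number of cycles."""

    def __init__(self, reasoner, hidden_dim, max_cycles=16):
        super().__init__()
        self.reasoner = reasoner                   # assumed: maps a state tensor to an updated state
        self.halt_head = nn.Linear(hidden_dim, 2)  # scores for [halt, continue]
        self.max_cycles = max_cycles

    @torch.no_grad()
    def infer(self, state):
        steps_used = 0
        for _ in range(self.max_cycles):
            state = self.reasoner(state)
            steps_used += 1
            q_halt, q_continue = self.halt_head(state).unbind(dim=-1)
            # Stop as soon as halting looks better than continuing for the whole batch.
            if (q_halt > q_continue).all():
                break
        return state, steps_used
```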
Image from Sapient.inc (HRM)
David vs. Goliath: The Numbers that Shock
The results obtained by HRM on the most difficult reasoning benchmarks are the kind of numbers that make even the most skeptical experts in the field raise their eyebrows. We are talking about a model with just 27 million parameters that not only competes with giants with billions of parameters, but systematically surpasses them in tasks that require deep and abstract reasoning.
On the ARC-AGI benchmark, considered one of the most reliable tests for measuring the abstract and generalization reasoning capabilities of artificial intelligence, HRM achieved a score of 40.3%, surpassing much larger models such as OpenAI's o3-mini-high (34.5%) and Claude 3.7 Sonnet (21.2%). These are not small, statistically insignificant differences: we are talking about substantial performance gaps that, in the world of AI, are equivalent to generational leaps.
But it is on the most extreme reasoning tasks that HRM truly demonstrates its architectural superiority. In extreme-level Sudoku tests and complex mazes, the differences become abysmal. HRM solved 55% of the hardest Sudokus, while models based on the chain of thought scored a resounding 0%. The pattern repeats on 30x30 mazes: HRM found the optimal path in 74.5% of cases, while its competitors never got off the starting line, again at 0%.
It is the AI version of Yoda's adage: "Size matters not. Look at me. Judge me by my size, do you?" Only in this case, the Force is the hierarchical architecture, and the role of Luke Skywalker falls to the billion-parameter models that keep crashing their X-wings into the swamp.
These are not just numbers on a table: they represent the difference between an artificial intelligence that can tackle complex real-world problems and one that gets stuck in the face of challenges that require more than superficial reasoning. It is the difference between an assistant who can help you navigate complex decisions and one who can at best help you write more eloquent emails.
But perhaps the most impressive figure of all concerns training efficiency. While traditional language models require huge datasets scraped from across the internet and months of processing on the most powerful supercomputers in the world, HRM is trained with only a thousand examples per task. As Guan Wang, one of the founders of Sapient Intelligence, put it, "you could train it on professional-level Sudoku in about two GPU hours," an efficiency he describes as "ridiculous" in the best sense of the word.
Image from Sapient.inc (HRM)
Beyond Benchmarks: A Structural Revolution
The impressive results on standardized tests are just the tip of the iceberg. The real revolution brought by HRM lies in its ability to solve fundamental structural problems that plague the entire current generation of transformer-based models, problems that until recently seemed an inevitable part of the artificial intelligence landscape.
The first and most significant of these problems is memory efficiency. Traditional transformers are notoriously resource-hungry, requiring huge amounts of memory to run and even more to train. HRM, by contrast, relies on local gradient updates that are cheaper to compute and, in the researchers' words, "much more biologically plausible," avoiding deep backpropagation through time, which is memory-intensive and computationally slow.
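One reasonable way to picture that saving, under the assumption that only the final update needs to be differentiated, is to run the reasoning loop without tracking gradients and backpropagate through the last cycle alone, so memory stays flat no matter how many iterations are used. The helpers below (`init_state`, `cycle`, `decode`) are hypothetical stand-ins, not HRM's real API.

```python
import torch

def memory_light_training_step(model, optimizer, x, target, loss_fn, n_cycles):
    """Sketch of a training step whose memory cost does not grow with n_cycles."""
    optimizer.zero_grad()
    state = model.init_state(x)                  # hypothetical: initial reasoning state
    with torch.no_grad():                        # most cycles build no autograd graph
        for _ in range(n_cycles - 1):
            state = model.cycle(x, state)        # hypothetical: one full reasoning cycle
    state = model.cycle(x, state.detach())       # only the final cycle is differentiated
    loss = loss_fn(model.decode(state), target)  # hypothetical: map final state to an answer
    loss.backward()                              # gradient memory independent of n_cycles
    optimizer.step()
    return loss.item()
```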
This memory efficiency is not a simple incremental improvement: it is a paradigm shift that opens up completely new scenarios. Less memory means being able to run more models simultaneously on the same hardware, train faster with fewer resources, and above all, bring advanced artificial intelligence to devices that until yesterday were unthinkable. We are talking about common laptops, edge devices, robots, and even cars – all places where AI could operate autonomously without depending on constant internet connections or remote servers.
Sapient Intelligence is already testing HRM in real-world applications that demonstrate this versatility. In the healthcare sector, the model is being used to help diagnose rare diseases, complex pathologies that require exactly the kind of deep, nuanced reasoning at which HRM excels. In seasonal climate forecasting, it has achieved accuracy rates of 97%, a result that in the world of meteorology borders on science fiction.
But perhaps the most encouraging aspect of HRM is the team behind it. These are not unknown researchers working out of garages: the group includes former engineers from DeepMind, Anthropic, DeepSeek, and even Elon Musk's xAI. These are people who have worked at the forefront of artificial intelligence for years and who are now betting everything on HRM's brain-inspired design. When professionals of this caliber abandon the certainties of the big tech giants to pursue an alternative vision, it is worth paying attention.
Guan Wang, CEO and founder of Sapient Intelligence, does not mince words when talking about the future of artificial intelligence. His vision of AGI, artificial general intelligence, is one of giving machines human-level intelligence and beyond. And according to Wang, the chain of thought is just a "shortcut," while what they have built "can think" in the true sense of the word.
Open Source and Transparency: A Gift to the Community
In an era where large AI labs tend to keep increasingly strict trade secrets about their most advanced models, Sapient Intelligence's decision to make HRM completely open source is an almost revolutionary act of transparency. The entire project is available on GitHub, allowing anyone in the world to verify it, train their own version, modify it, or build on it. This level of openness is rare for such a promising and strategically important innovation.
Of course, HRM still has limitations that its creators openly acknowledge. For now, the model has a narrower focus than the large generalist language models: it is built to reason, not to make friendly conversation or write romantic poetry. But it is precisely this specialization that makes it so powerful in its domain. It is one of the strongest proofs of concept the industry has yet seen that the future of AI may lie not in ever larger and more generalist models, but in smarter and more specialized architectures.
HRM is not the only experiment of this type underway. The AI research landscape is experiencing a moment of creative effervescence, with teams around the world exploring alternative architectures to the dominant transformers. There is Sakana with their continuous thought machines, the 1-bit LLM models that promise extreme efficiency, and Google's diffusion-based reasoning models. But there is a crucial difference: HRM "is already working" and outperforming much larger models with a fraction of the training data and without the need for massive pre-training.
This suggests that we are witnessing a fundamental paradigm shift. The next big leap in artificial intelligence will probably not be another "scaled GPT clone" to even more mammoth dimensions, but something similar to HRM: a new architecture that brings better reasoning, faster training, and cheaper implementation, all without the need for data centers full of GPUs that consume the electricity of entire cities.
The Future that Really Thinks
Looking ahead, the vision that emerges from HRM's work is that of a future in which artificial intelligence will no longer be confined to the data centers of large technology corporations, but will become a pervasive and accessible presence in our daily lives. Imagine AI agents living in our laptops, home robots, cars, even wearable devices, all capable of sophisticated reasoning without depending on constant internet connections or expensive remote servers.
This democratization of advanced artificial intelligence could have profound implications for how we work, learn, and solve problems. A doctor in a rural clinic could have access to the same advanced diagnostic tools as a metropolitan hospital. An engineer working on a remote construction site could obtain complex structural analyses in real time. A researcher in a laboratory with a limited budget could explore complex scientific hypotheses without having to compete for access to supercomputers.
But perhaps the most fascinating aspect of all is the idea that these AI agents will no longer just fetch answers from the internet or regurgitate information processed elsewhere. They will begin to "really think," in the deepest sense of the term, developing original solutions, formulating creative hypotheses, and perhaps even arriving at insights that we humans would never have considered.
Like any technological revolution, this transformation will bring with it new challenges and ethical issues that we will have to face. But if HRM and similar architectures keep their promises, we could be on the threshold of an era in which artificial intelligence finally becomes what the name promises: not just a sophisticated information processing system, but a true intellectual partner capable of autonomous and creative reasoning.
As Tony Stark would say, sometimes the best solution is not to build a bigger suit of armor, but to build a smarter one. And HRM may have found a way to replace computational brute force with something much more elegant and efficient.
The road is still long and full of unknowns, but one thing is certain: the small 27-million-parameter model created in a Singapore startup has already shown that in the world of artificial intelligence, as often happens in science, quality can really beat quantity. And perhaps, just like in the best David and Goliath stories, it is the smallest one that shows us the way to the future.