From Puppets to Partners: The Next Step for Humanoids

Source: Science and Technology Daily | 2025-12-02 12:28:35 | Author: LI Gangyang, WANG Xinlong & LUO Shaqi

The last few years have brought a dazzling spectacle of robotic achievements. Humanoid robots have executed flawless backflips and performed intricate dance routines. These demonstrations, by established industry pioneers as well as ambitious new entrants, are remarkable feats of engineering, proving that we can build machines with the kinematic complexity to mimic human movement.

But what will it take to move these machines into the real world?

While these demonstrations are impressive, they are often choreographed routines in largely predictable environments. Real-world work, by contrast, demands the ability to contend with a messy, unpredictable, and often forceful physical reality.

Therefore, the next great leap for humanoid robotics is building on that kinematic grace to master the physics of forceful, contact‑rich work, unlocking their potential to serve in industry, healthcare, home services, and disaster relief.

Today's most impressive humanoid demonstrations are triumphs of dynamic locomotion and trajectory tracking. They showcase a robot's ability to maintain balance while following a complex, pre‑defined path of motion. This is a significant achievement that has taken decades of research.

However, consider a scenario where an agile robot is asked not to dance, but to push a stalled car, open a heavy, spring-loaded fire door, or assist a paramedic in lifting a stretcher.

Here, the primary challenge is not just following a path, but managing sustained, high-intensity forces. When confronted with strong and unpredictable resistance, a control strategy based on trajectory tracking often fails: the robot may attempt to rigidly hold its course, causing joint motors to saturate and overheat, or it may lose balance and fall.
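To see why pure trajectory tracking struggles in this situation, consider a minimal sketch of a single joint under proportional-derivative (PD) position control. The gains, torque limit, and reference ramp are illustrative assumptions, not values from any robot discussed in this article. The joint is held in place by an immovable obstacle while its reference keeps advancing, so the tracking error and the commanded torque grow until the motor limit is reached.

```python
def pd_torque(q_des, q, qd, kp=400.0, kd=20.0, tau_max=60.0):
    """Clipped PD tracking torque for one joint (all numbers are hypothetical)."""
    tau = kp * (q_des - q) - kd * qd
    # The motor cannot deliver more than tau_max in either direction.
    return max(-tau_max, min(tau_max, tau))


def blocked_joint_torques(steps=50, dt=0.01, ramp_rate=1.0, tau_max=60.0):
    """Torque commands while an obstacle holds the joint at q = 0, qd = 0."""
    torques = []
    for k in range(steps):
        # The reference keeps moving regardless of the contact.
        q_des = ramp_rate * k * dt
        torques.append(pd_torque(q_des, q=0.0, qd=0.0, tau_max=tau_max))
    return torques


torques = blocked_joint_torques()
print(torques[0], torques[-1])
```

In this toy setup the command starts at zero and ends pinned at the torque limit: the controller keeps demanding more effort without ever adapting its posture or its goal, which is the failure mode described above.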

The robot was programmed to move, not to struggle. This gap between elegant demonstration and practical deployment remains a bottleneck for the humanoid robotics industry.

Closing this gap will largely determine whether humanoids remain laboratory showpieces, or grow into reliable partners in real workplaces.

This requires overcoming the challenge of "whole‑body intelligence" in contact‑rich environments, which boils down to two fundamental difficulties.

First, robots lack a human-like reaction mechanism. When a person pushes a heavy cart, they don't just stiffen their arms. They instinctively lean forward, lower their center of gravity, and tense their core, coordinating their entire body to generate and sustain force while maintaining balance. This is a deeply ingrained biomechanical intelligence.

Most robotic control strategies, in contrast, adopt a rigid approach. They treat external forces as disturbances to be rejected, primarily by increasing torque at the joints. This works for small bumps, but against the powerful and continuous forces of real work, it's a losing battle.

Second is the notorious "curse of high‑dimensionality." A modern humanoid robot has about 30 degrees of freedom (DoFs) and its observation space may reach several hundred dimensions. Teaching it to process these high-dimensional inputs and coordinate all DoFs simultaneously to both balance and perform a task is an astronomically complex optimization problem.

For learning‑based methods like reinforcement learning, the sheer size of the state and action spaces makes it incredibly difficult to discover effective strategies. It's like asking a conductor to improvise a symphony by individually instructing every musician in a hundred‑person orchestra in real time. The complexity can be overwhelming, often leading to unstable or suboptimal behaviors.
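A rough count shows how quickly this search space grows. The sketch below assumes, purely for illustration, that each joint's command is discretized into just three levels; real controllers typically use continuous actions, which only makes the space larger. The function name and the three-level grid are assumptions made for this example.

```python
def joint_action_count(num_dofs, levels_per_joint):
    """Number of distinct whole-body commands on a coarse per-joint grid."""
    return levels_per_joint ** num_dofs


# Compare a single joint, a typical 6-DoF arm, and a roughly 30-DoF humanoid.
for dofs in (1, 6, 30):
    print(dofs, joint_action_count(dofs, 3))
```

With 30 DoFs the count is 3^30, roughly 2 × 10^14 possible combinations at every control step, before the several-hundred-dimensional observation space is even considered.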

To overcome these challenges, the robotics industry is pivoting towards a new paradigm. Rather than programming explicit responses for every possible situation, researchers are developing methods that enable robots to learn the fundamental principles of physical interaction from the ground up. This shift moves the field from rigid programming to emergent, intelligent behavior.

A prime example of this trend is the Thor framework developed at the Beijing Academy of Artificial Intelligence (BAAI). When our 35 kg humanoid robot successfully pulled a 1,400 kg vehicle, it was not executing a pre-calculated routine. Instead, it demonstrated a deeper principle: using its entire body as a coordinated system to manage immense force and maintain stability much like a trained athlete.
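A back-of-the-envelope force budget helps explain why such a pull is physically possible yet demanding. The sketch below assumes level ground, a steady slow pull, a rolling resistance coefficient of 0.015 for the vehicle, and a foot-to-ground friction coefficient of 0.8. Neither coefficient comes from this article; both are illustrative values, and only the two masses are taken from the text.

```python
G = 9.81  # gravitational acceleration in m/s^2


def rolling_resistance_force(vehicle_mass_kg, c_rr):
    """Force needed to keep a wheeled vehicle rolling on level ground."""
    return c_rr * vehicle_mass_kg * G


def max_traction_force(robot_mass_kg, mu):
    """Largest horizontal force the feet can transmit before slipping."""
    return mu * robot_mass_kg * G


needed = rolling_resistance_force(1400, c_rr=0.015)  # assumed coefficient
available = max_traction_force(35, mu=0.8)           # assumed coefficient
print(round(needed, 2), round(available, 2))
```

Under these assumptions the robot needs roughly 206 N and can transmit at most about 275 N before its feet slip. The margin is thin, and starting from rest or pulling up a slight slope would demand more, which suggests why whole-body posture and balance matter more than raw arm strength.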

This new solution is characterized by two key insights.

First, the learning process draws from biomechanics: by rewarding robots for discovering human-like strategies—such as leaning into a task to optimize leverage and balance—they are guided toward more physically intelligent and efficient behaviors.
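One way to picture this kind of biomechanically inspired guidance is a shaped reward that pays for task progress and adds a bonus for a human-like forward lean. The sketch below is a hypothetical illustration, not the reward used in Thor: the function name, the target lean angle, the weights, and the fall penalty are all assumptions.

```python
import math


def shaped_reward(pull_force_n, lean_angle_rad, fallen,
                  target_lean_rad=0.35, w_force=0.01, w_lean=1.0,
                  fall_penalty=10.0):
    """Hypothetical reward: task force plus a bonus for leaning into the load."""
    if fallen:
        # Falling ends the episode with a fixed penalty.
        return -fall_penalty
    # Reward only force applied in the pulling direction.
    force_term = w_force * max(0.0, pull_force_n)
    # Bonus peaks when the torso lean matches the assumed target angle.
    lean_term = w_lean * math.exp(-((lean_angle_rad - target_lean_rad) ** 2) / 0.05)
    return force_term + lean_term


upright = shaped_reward(100.0, 0.0, fallen=False)
leaning = shaped_reward(100.0, 0.35, fallen=False)
print(leaning > upright)
```

For the same pulling force, the leaning posture scores higher than the upright one, so a learning algorithm that maximizes this signal is nudged toward the strategy a person would use.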

Second, innovative control architectures decompose the massive challenge of whole-body control into smaller, manageable sub-tasks, enabling sophisticated and robust coordination that would be nearly impossible to engineer by hand.
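The idea of decomposition can be sketched as a set of smaller sub-policies, each responsible for one group of joints, whose outputs are combined into a single whole-body command. The grouping below (legs, waist, arms) and the joint counts are hypothetical and are not taken from Thor's actual architecture; the placeholder policy simply returns neutral commands.

```python
# Hypothetical joint groups for a 30-DoF humanoid (not Thor's actual layout).
JOINT_GROUPS = {"legs": 12, "waist": 3, "arms": 15}


def zero_policy(observation, num_joints):
    """Placeholder sub-policy: a neutral command for each joint in its group."""
    return [0.0] * num_joints


def whole_body_action(observation, sub_policies):
    """Query each group's sub-policy and concatenate the results."""
    action = []
    for group, num_joints in JOINT_GROUPS.items():
        part = sub_policies[group](observation, num_joints)
        if len(part) != num_joints:
            raise ValueError(
                f"{group} policy returned {len(part)} values, expected {num_joints}"
            )
        action.extend(part)
    return action


policies = {group: zero_policy for group in JOINT_GROUPS}
action = whole_body_action(observation=[0.0] * 100, sub_policies=policies)
print(len(action))
```

Each sub-policy only has to search over its own 3 to 15 joints rather than all 30 at once, while sharing the same observation keeps the groups coordinated.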

The strength of this approach lies in its generality. A robot that understands force interaction principles can apply them across a wide range of tasks. The same underlying intelligence that enables it to pull a car also helps it open a heavy fire door, maneuver a loaded cart, or keep steady contact while cleaning a surface. The robot is no longer just a collection of moving parts—it becomes a holistic physical agent, demonstrably stronger, more stable, and vastly more capable in the real world.

The principles behind this learning‑based, whole‑body approach are part of a critical shift across the robotics industry. While companies pursue different paths—some focusing on vision‑based reasoning, some on logistics, others on dynamic agility—a consensus is emerging: Mastering physical interaction is the next milestone.

The industry's focus is evolving from merely controlling the "hand" to intelligently coordinating the entire body as a single, powerful, and reactive system.

In this sense, work on frameworks like BAAI's Thor is not an isolated effort, but part of a broader transition toward "whole‑body, force‑aware" humanoids.

The road ahead is still challenging, but the path is becoming clearer. The next wave of innovation will come from the synergy of whole‑body control with other advancing technologies. This includes integrating advanced tactile and force sensing to give robots a true "touch" of their work, allowing for more delicate and precise manipulation in contact‑rich settings.

Furthermore, we must work toward generalization. How do we scale from learning specific skills, like pulling or pushing, to developing a generalized "physical common sense"?

This is where the world of whole‑body control will inevitably merge with the world of foundation models. By combining robust physical skills with the semantic understanding of vision‑language‑action models, we can create robots that not only know how to pull a door, but also understand why and when they should.

Breakthroughs in whole‑body reaction, as exemplified by the new wave of learning‑based frameworks, represent a critical inflection point. We are finally moving away from simply programming robots to do tasks toward teaching them to understand the physics of the work itself. This is the difference between a puppet and a partner.

The ultimate goal is not just to build smarter machines, but to build a future where humans and robots can work side‑by‑side to solve humanity's grand challenges—from caring for our aging populations to responding to natural disasters and creating more efficient, humane industries.

Achieving this vision will require an open, global, and collaborative ecosystem, where foundational research is shared to accelerate progress for all. We are proud to contribute our research to this collective endeavor toward a future where humanoid robots can finally, and reliably, stand their ground in the real world.

This article is written by LI Gangyang, WANG Xinlong and LUO Shaqi from the Beijing Academy of Artificial Intelligence.


Editor: ZHONG Jianli
