This is a Plain English Papers summary of a research paper called Robot Revolution: Adversarial AI Learns Human-Like Movement.

Revolutionizing Humanoid Robot Movement with Adversarial Learning

Human-like movement in robots remains one of robotics' greatest challenges. While humans naturally coordinate their upper and lower bodies for complex movements, humanoid robots struggle to achieve this balance. Current approaches that treat the whole body as a single control system often lead to instability and falls during real-world execution.

Researchers from China Telecom, ShanghaiTech University, and other institutions have developed a breakthrough approach called Adversarial Locomotion and Motion Imitation (ALMI) that fundamentally rethinks humanoid robot control by treating upper and lower body coordination as an adversarial learning problem.

The Challenge of Humanoid Movement

Traditional approaches to humanoid robot control face significant limitations. Whole-body policies are computationally expensive due to the high number of degrees of freedom in humanoid robots. These approaches often prioritize motion tracking over stability, frequently causing robots to fall during real-world deployment.

The key insight behind ALMI is recognizing that human movement involves distinct, specialized roles for the upper and lower body. While our lower body provides stability and locomotion, our upper body handles expressive movements and manipulation tasks. By splitting these functions and training them adversarially, robots can achieve more natural, stable movement.

The ALMI Framework: Divide and Conquer

ALMI uses a dual-policy approach that splits control between the upper and lower body:

  1. Lower Body Policy: Focuses on robust locomotion, learning to maintain stability while following velocity commands, even when the upper body makes potentially destabilizing movements.

  2. Upper Body Policy: Specializes in precise motion imitation, tracking reference movements accurately despite disturbances from the lower body's movements.

What makes ALMI unique is its adversarial training approach. Unlike other separated control systems like HumanMimic, ALMI trains these policies to actively resist disturbances from each other:

  • The lower body learns to stay stable despite challenging upper body movements
  • The upper body learns to track motions precisely despite the lower body's movements
  • Through iterative updates, both policies develop robust coordination

This approach is fundamentally different from traditional hierarchical control systems seen in previous works like Bi-Level Motion Imitation, where policies typically operate in a master-slave relationship rather than adversarially.
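To make the alternating scheme concrete, here is a minimal sketch of such a training loop, assuming hypothetical `freeze`/`unfreeze` policy methods and a generic `train_policy` routine; the paper's actual PPO setup and environment interfaces are not reproduced here.

```python
# Illustrative sketch of ALMI-style alternating (adversarial) updates.
# All names below are placeholders, not the authors' implementation.

def train_almi(env, lower_policy, upper_policy, n_rounds=3, steps_per_round=10_000):
    """Alternately optimize the two policies; the frozen one acts as a disturbance."""
    for _ in range(n_rounds):
        # 1) Train the lower body for stable velocity tracking while the frozen
        #    upper body replays (increasingly difficult) reference motions.
        upper_policy.freeze()
        train_policy(env, actor=lower_policy, partner=upper_policy,
                     objective="velocity_tracking", steps=steps_per_round)
        upper_policy.unfreeze()

        # 2) Train the upper body for precise motion imitation while the frozen
        #    lower body walks, turns, and recovers from pushes underneath it.
        lower_policy.freeze()
        train_policy(env, actor=upper_policy, partner=lower_policy,
                     objective="motion_imitation", steps=steps_per_round)
        lower_policy.unfreeze()
```

The sketch uses the same number of rounds for both policies; the ablation results later in this post (the "lower-3 + upper-2" rows) suggest the paper found an asymmetric schedule, with three iterations for the lower body and two for the upper body, worked best.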

Policy Architecture and Training

Each policy in ALMI has its own neural network with carefully designed observation and action spaces. The observation space for each policy includes key information about the robot's current state:

| State term | Lower-body dim. | Upper-body dim. | Whole-body dim. |
|---|---|---|---|
| Base angular velocity | 3 | 3 | 3 |
| Base gravity | 3 | 3 | 3 |
| Commands | 3 (velocity) | 9 (motion) | 12 (velocity + motion) |
| DoF position | 21 | 21 | 21 |
| DoF velocity | 21 | 21 | 21 |
| Actions | 12 (lower) | 9 (upper) | 21 (whole) |
| Periodic phase | 2 | 2 | 2 |
| **Total dim.** | **65** | **68** | **83** |

Table 4: State and action space information in ALMI setting.
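To make these dimensions concrete, here is a sketch of assembling the 65-dimensional lower-body observation; the field names and the `state` object are assumptions for illustration, not the paper's code.

```python
import numpy as np

def build_lower_body_obs(state, velocity_cmd, prev_lower_action, phase):
    """Assemble the 65-D lower-body observation from Table 4 (illustrative layout).

    state.*            : per-step proprioception (NumPy arrays, names assumed)
    velocity_cmd       : 3-D command (vx, vy, yaw rate)
    prev_lower_action  : 12-D previous lower-body action
    phase              : 2-D periodic gait phase (e.g. sin/cos)
    """
    obs = np.concatenate([
        state.base_angular_velocity,   # 3
        state.projected_gravity,       # 3
        velocity_cmd,                  # 3
        state.dof_position,            # 21
        state.dof_velocity,            # 21
        prev_lower_action,             # 12
        phase,                         # 2
    ])
    assert obs.shape == (65,)
    return obs
```

The upper-body observation follows the same pattern, swapping the 3-D velocity command for a 9-D motion command and the 12-D lower-body action for the 9-D upper-body action, giving the 68-D total in the table.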

The lower body policy receives rewards for:

  • Following velocity commands accurately
  • Maintaining stability
  • Energy efficiency

| Term | Expression | Weight |
|---|---|---|
| **Penalty** | | |
| DoF position limits | $\mathbb{I}\left(\boldsymbol{q}^{l} \notin\left[\boldsymbol{q}_{\min}^{l}, \boldsymbol{q}_{\max}^{l}\right]\right)$ | $-5.0$ |
| Alive | $\mathbb{I}(\text{robot stays alive})$ | $0.15$ |
| **Regularization** | | |
| Linear velocity of Z axis | $\left\lVert\boldsymbol{v}_{z}\right\rVert^{2}$ | – |
| Angular velocity of X & Y axes | $\left\lVert\boldsymbol{\omega}_{xy}\right\rVert^{2}$ | – |
| Orientation | $\left\lVert\boldsymbol{g}_{xy}\right\rVert^{2}$ | – |
| Torque | $\left\lVert\boldsymbol{\tau}^{l}\right\rVert^{2}$ | – |
| Base height | $\left\lVert h - h^{\text{target}}\right\rVert^{2}$ | – |
| DoF acceleration | $\left\lVert\ddot{\boldsymbol{q}}^{l}\right\rVert^{2}$ | – |
| DoF velocity | $\left\lVert\dot{\boldsymbol{q}}^{l}\right\rVert^{2}$ | – |
| Lower body action rate | $\left\lVert\boldsymbol{a}_{i}^{l}-\boldsymbol{a}_{i-1}^{l}\right\rVert^{2}$ | – |
| Hip DoF position | $\left\lVert\boldsymbol{q}^{\text{hip roll\&yaw}}\right\rVert^{2}$ | – |
| Slippage | $\left\lVert\boldsymbol{v}_{xy}^{\text{foot}}\right\rVert^{2}$ | – |
| Feet swing height | $\left\lVert\boldsymbol{q}_{z}^{\text{foot}}-0.08\right\rVert^{2}$ | – |
| Feet contact | $\sum_{i=1}^{N_{\text{feet}}} \neg\,\mathbb{I}\left(\left\lVert\boldsymbol{F}_{z,i}^{\text{foot}}\right\rVert > \ldots\right)$ | – |
| Feet distance | $\exp\left(-100 \times d_{\text{feet}}^{\text{out of range}}\right)$ | $0.5$ |
| Knee distance | $\exp\left(-100 \times d_{\text{knee}}^{\text{out of range}}\right)$ | $0.4$ |
| Stand still | $\left\lVert\boldsymbol{q}_{i}^{l}-\boldsymbol{q}_{i-1}^{l}\right\rVert^{2}$ | – |
| Ankle torque | $\left\lVert\boldsymbol{\tau}^{\text{ankle}}\right\rVert^{2}$ | – |
| Ankle action rate | $\left\lVert\boldsymbol{a}_{i}^{\text{ankle}}-\boldsymbol{a}_{i-1}^{\text{ankle}}\right\rVert^{2}$ | – |
| Stance base velocity | $\left\lVert\boldsymbol{v}\right\rVert^{2}$ | – |
| Feet contact forces | $\min\left(\left\lVert F^{\text{foot}}\right\rVert - 100, \ldots\right)$ | – |
| **Task** | | |
| Linear velocity | $\exp\left(-4\left\lVert\boldsymbol{c}_{xy}-\boldsymbol{v}_{xy}\right\rVert^{2}\right)$ | – |
| Angular velocity | $\exp\left(-4\left\lVert\boldsymbol{c}_{\text{yaw}}-\boldsymbol{v}_{\text{yaw}}\right\rVert^{2}\right)$ | – |

Table 5: Reward terms and weights for training the lower-body policy in ALMI (weights marked "–" are omitted here; see the paper for the full values).
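As a small illustration of the task rewards in Table 5, the two velocity-tracking terms are exponentials of squared tracking errors; the sketch below uses assumed variable names and omits the reward weights.

```python
import numpy as np

def lower_body_task_rewards(cmd_xy, vel_xy, cmd_yaw, yaw_rate):
    """Exponential tracking rewards for the lower-body policy (Table 5, Task rows)."""
    # Linear velocity tracking: exp(-4 * ||c_xy - v_xy||^2)
    r_lin = np.exp(-4.0 * np.sum(np.square(cmd_xy - vel_xy)))
    # Yaw rate tracking: exp(-4 * ||c_yaw - v_yaw||^2)
    r_ang = np.exp(-4.0 * np.square(cmd_yaw - yaw_rate))
    return r_lin, r_ang
```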

The upper body policy receives rewards for:

  • Accurate motion tracking
  • Smooth movements
  • Energy efficiency

| Term | Expression | Weight |
|---|---|---|
| **Penalty** | | |
| DoF position limits | $\mathbb{I}\left(\boldsymbol{q}^{u} \notin\left[\boldsymbol{q}_{\min}^{u}, \boldsymbol{q}_{\max}^{u}\right]\right)$ | $-5.0$ |
| Alive | $\mathbb{I}(\text{robot stays alive})$ | $0.15$ |
| **Regularization** | | |
| Orientation | $\left\lVert\boldsymbol{g}_{xy}\right\rVert^{2}$ | – |
| Torque | $\left\lVert\boldsymbol{\tau}^{u}\right\rVert^{2}$ | – |
| Upper DoF acceleration | $\left\lVert\ddot{\boldsymbol{q}}^{u}\right\rVert^{2}$ | – |
| Upper DoF velocity | $\left\lVert\dot{\boldsymbol{q}}^{u}\right\rVert^{2}$ | – |
| Upper body action rate | $\left\lVert\boldsymbol{a}_{i}^{u}-\boldsymbol{a}_{i-1}^{u}\right\rVert^{2}$ | – |
| **Task** | | |
| Upper DoF position | $\exp\left(-0.5\left\lVert\hat{\boldsymbol{q}}^{u}-\boldsymbol{q}^{u}\right\rVert^{2}\right)$ | – |

Table 7: Reward terms and weights for training the upper-body policy in ALMI (weights marked "–" are omitted here; see the paper for the full values).
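The upper body's single task reward similarly decays with the squared distance to the reference joint angles; a one-function sketch with assumed variable names:

```python
import numpy as np

def upper_body_tracking_reward(q_ref_upper, q_upper):
    """Motion-imitation reward (Table 7, Task row): exp(-0.5 * ||q_ref - q||^2)."""
    return np.exp(-0.5 * np.sum(np.square(q_ref_upper - q_upper)))
```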

The adversarial training process uses a curriculum learning approach, gradually increasing the difficulty of motions and disturbances:

| Term | Value |
|---|---|
| $\boldsymbol{C}_{\min}^{l}$ | $[-0.7, -0.5, -0.5]$ |
| $\boldsymbol{C}_{\max}^{l}$ | $[0.7, 0.5, 0.5]$ |
| $d^{n}$ | $0.9$ |

Table 6: Upper body curriculum terms and values.
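The exact curriculum update rule is not reproduced in this summary, but one plausible sketch, reusing the command bounds from Table 6 and treating the 0.9 factor as a shrink rate after failures, looks like this; everything here is illustrative rather than the authors' scheme.

```python
import numpy as np

# Bounds from Table 6 (used here only for illustration).
C_MIN = np.array([-0.7, -0.5, -0.5])
C_MAX = np.array([0.7, 0.5, 0.5])

class CommandCurriculum:
    """Scale sampled commands from easy (near zero) toward the full range."""

    def __init__(self, scale=0.1, grow=1.05, shrink=0.9):
        self.scale = scale      # fraction of the full command range currently in use
        self.grow = grow        # applied after a successful episode (assumed value)
        self.shrink = shrink    # applied after a failure (cf. d^n = 0.9 in Table 6)

    def sample(self, rng):
        """Draw one command vector within the current, scaled range."""
        return rng.uniform(self.scale * C_MIN, self.scale * C_MAX)

    def update(self, episode_succeeded):
        """Widen the range after success, narrow it after failure."""
        factor = self.grow if episode_succeeded else self.shrink
        self.scale = float(np.clip(self.scale * factor, 0.05, 1.0))
```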

Experimental Results: Superior Performance

The researchers evaluated ALMI against baseline approaches at three difficulty levels:

| Level | $\hat{v}_{x,t}$ | $\hat{v}_{y,t}$ | $\hat{\omega}_{\text{yaw},t}$ | Terrain level | Push robot |
|---|---|---|---|---|---|
| Easy | 0.7 | 0.0 | 0.2 | 0 | ✗ |
| Medium | 1.0 | 0.3 | 0.4 | 3 | ✓ |
| Hard | 1.3 | 0.6 | 0.6 | 6 | ✓ |

Table 1: Locomotion difficulty level setting.

ALMI significantly outperformed both whole-body policy approaches and other split-control methods on nearly every metric at all three difficulty levels:

| Method | $E_{\text{vel}} \downarrow$ | $E_{\text{ang}} \downarrow$ | $E_{\text{jpe}}^{\text{upper}} \downarrow$ | $E_{\text{kpe}}^{\text{upper}} \downarrow$ | $E_{\text{action}}^{\text{upper}} \downarrow$ | $E_{\text{action}}^{\text{lower}} \downarrow$ | $E_{g} \downarrow$ | Survival $\uparrow$ |
|---|---|---|---|---|---|---|---|---|
| **Easy** | | | | | | | | |
| ALMI | **0.1135** | **0.2647** | **0.1931** | **0.0460** | **0.0462** | **0.0170** | **0.6919** | **1.0000** |
| ALMI (whole) | 0.1386 | 0.5433 | 0.5756 | 0.0704 | 0.0800 | 3.0356 | 0.9675 | 0.9991 |
| Exbody | 0.2383 | 0.4056 | 0.3559 | 0.0995 | 1.7813 | 1.8152 | 0.9693 | 0.8912 |
| **Medium** | | | | | | | | |
| ALMI | **0.2192** | **0.3520** | **0.2007** | **0.0450** | **0.0598** | **0.0172** | **0.7604** | **0.9852** |
| ALMI (whole) | 0.2380 | 0.5563 | 0.6734 | 0.0637 | 0.0409 | 2.9225 | 1.0750 | 0.9763 |
| Exbody | 0.3063 | 0.5087 | 0.3658 | 0.1233 | 1.7683 | 1.8019 | 1.0166 | 0.8845 |
| **Hard** | | | | | | | | |
| ALMI | **0.2202** | **0.4812** | **0.2116** | **0.0458** | **0.0600** | **0.0175** | **0.8551** | **0.9723** |
| ALMI (whole) | 0.3178 | 0.7224 | 0.7022 | 0.0635 | 0.0519 | 2.9317 | 1.1656 | 0.9491 |
| Exbody | 0.4838 | 0.5753 | 0.3758 | 0.1269 | 1.7352 | 1.7689 | 1.0243 | 0.8778 |

Table 2: Simulated evaluation of ALMI, ALMI (whole body) and Exbody on CMU dataset.

Key observations:

  • Lower velocity error ($E_{vel}$) indicates better tracking of commanded velocities
  • Lower joint position errors ($E_{jpe}^{upper}$) indicate more accurate motion tracking
  • Higher survival rates demonstrate better stability

Ablation studies confirmed the importance of the adversarial training technique:

| Method | $E_{\text{vel}} \downarrow$ | $E_{\text{ang}} \downarrow$ | $E_{\text{jpe}}^{\text{upper}} \downarrow$ | $E_{\text{kpe}}^{\text{upper}} \downarrow$ | $E_{\text{action}}^{\text{upper}} \downarrow$ | $E_{\text{action}}^{\text{lower}} \downarrow$ | $E_{g} \downarrow$ | Survival $\uparrow$ |
|---|---|---|---|---|---|---|---|---|
| **Easy** | | | | | | | | |
| lower-3 + upper-2 | **0.1135** | **0.2647** | 0.1931 | **0.0460** | **0.0462** | **0.0170** | **0.6919** | **1.0000** |
| lower-2 + upper-2 | 0.1164 | 0.2669 | 0.1955 | 0.0452 | 0.0475 | **0.0171** | 0.7121 | **1.0000** |
| lower-1 + upper-2 | 0.1271 | 0.2738 | 0.1928 | 0.0526 | 0.0642 | **0.0171** | 0.7052 | **1.0000** |
| w/o arm curriculum | 0.1411 | 0.2726 | **0.1924** | 0.0504 | 0.0618 | **0.0172** | 0.7472 | 0.9995 |
| **Medium** | | | | | | | | |
| lower-3 + upper-2 | **0.2192** | **0.3520** | **0.2007** | **0.0450** | **0.0598** | **0.0172** | **0.7604** | **0.9852** |
| lower-2 + upper-2 | 0.2213 | 0.3571 | 0.2032 | 0.0458 | 0.0607 | **0.0172** | 0.7748 | 0.9772 |
| lower-1 + upper-2 | 0.2262 | 0.3872 | 0.2173 | 0.0492 | 0.0604 | 0.0175 | 0.7730 | 0.9273 |
| w/o arm curriculum | 0.2571 | 0.4348 | 0.2068 | 0.0476 | 0.0601 | 0.0173 | 1.0587 | 0.9652 |
| **Hard** | | | | | | | | |
| lower-3 + upper-2 | **0.2202** | **0.4812** | **0.2116** | **0.0458** | **0.0600** | **0.0175** | **0.8551** | **0.9723** |
| lower-2 + upper-2 | 0.2892 | 0.5395 | 0.2231 | 0.0482 | 0.0645 | 0.0178 | 0.9479 | 0.9233 |
| lower-1 + upper-2 | 0.2566 | 0.5172 | 0.2451 | 0.0537 | 0.0777 | 0.0179 | 0.9462 | 0.8743 |
| w/o arm curriculum | 0.3658 | 0.6398 | 0.2394 | 0.0461 | 0.0726 | 0.0180 | 1.2042 | 0.8480 |

Table 3: Ablation studies of adversarial training technique and arm curriculum in ALMI.

The "lower-3 + upper-2" configuration consistently outperformed other approaches, demonstrating the importance of three-iteration adversarial training for the lower body.

Real-World Deployment: From Simulation to Reality

To ensure successful real-world deployment on the Unitree H1 robot, the researchers implemented extensive domain randomization during training:

| Term | Value |
|---|---|
| **Dynamics randomization** | |
| Friction | $\mathcal{U}(0.1, 1.25)$ |
| Base mass | $\mathcal{U}(-3, 5)$ kg |
| Link mass | $\mathcal{U}(0.9, 1.1) \times$ default kg |
| Base CoM | $\mathcal{U}(-0.1, 0.1)$ m |
| Control delay | $\mathcal{U}(0, 40)$ ms |
| **External perturbation** | |
| Push robot | interval $= 10$ s, $v_{xy} = 1$ m/s |
| **Randomized terrain** | |
| Terrain type | trimesh, level from 0 to 10 |
| **Velocity command** | |
| Linear x velocity | $\mathcal{U}(-1.0, 1.0)$ |
| Linear y velocity | $\mathcal{U}(-0.3, 0.3)$ |
| Angular yaw velocity | $\mathcal{U}(-0.5, 0.5)$ |

Table 8: Domain randomization terms and ranges.
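A sketch of how ranges like those in Table 8 are typically applied at the start of each episode; the `sim` setter methods are hypothetical placeholders, not the authors' simulator API.

```python
import numpy as np

def randomize_dynamics(sim, rng):
    """Apply Table 8-style dynamics randomization for one episode (illustrative only)."""
    sim.set_friction(rng.uniform(0.1, 1.25))
    sim.add_base_mass(rng.uniform(-3.0, 5.0))              # kg
    sim.scale_link_masses(rng.uniform(0.9, 1.1))           # x default link mass
    sim.offset_base_com(rng.uniform(-0.1, 0.1, size=3))    # m
    sim.set_control_delay(rng.uniform(0.0, 0.040))         # s (0-40 ms)

def sample_velocity_command(rng):
    """Velocity command ranges from Table 8: (vx, vy, yaw rate)."""
    return np.array([
        rng.uniform(-1.0, 1.0),   # linear x
        rng.uniform(-0.3, 0.3),   # linear y
        rng.uniform(-0.5, 0.5),   # angular yaw
    ])
```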

This domain randomization, combined with the inherent robustness of ALMI's adversarial training, allowed for successful deployment on the physical Unitree H1 robot. The real-world experiments demonstrated that ALMI could perform diverse motions while maintaining stability, even when faced with:

  • Uneven terrain
  • External pushes
  • Rapid directional changes

Implications and Future Work

ALMI represents a significant advance in humanoid robot control by addressing a fundamental limitation in existing approaches. By training upper and lower body policies adversarially, the researchers have created a more robust, natural approach to whole-body motion that mirrors how humans coordinate movement.

Unlike approaches such as Mobile-TeleVision, which uses predictive models, or HILO, which focuses primarily on locomotion, ALMI directly addresses the coordination challenge through adversarial learning.

Future research directions might include:

  1. Extending ALMI to more complex manipulation tasks
  2. Integrating vision-based perception for reactive movement
  3. Combining ALMI with large language models for natural command interfaces
  4. Applying the adversarial training approach to other robotic systems with multiple control objectives

The researchers have released a large-scale whole-body motion control dataset with high-quality trajectories from MuJoCo simulations, which should accelerate further research in this area. This dataset, combined with the ALMI approach, provides a solid foundation for developing more capable, human-like robotic movement in the future.
