UK lab’s humanoid robots get NVIDIA grant to turn sound into motion
Chengxu Zhou, an associate professor in UCL Computer Science, has received an NVIDIA Academic Grant to support his lab's work on humanoid robots, focusing on real-time, audio-driven whole-body motion.
Through the grant, the robotics and AI researcher will receive resources for both training and deployment: two NVIDIA RTX PRO 6000 GPUs and two Jetson AGX Orin devices. The additional hardware will enable faster iteration and shorter training cycles, reducing the time between simulation and real-robot testing.
NVIDIA awarded the grant as part of the Academic Grant Program, which provides compute resources to researchers working across different subject areas.
Audio-driven movement for humanoid robots
Chengxu Zhou and his team are working on the Beat-to-Body project, which explores how humanoids can respond to sound with expressive, physically plausible, and safe whole-body movement.
Instead of following pre-scripted motions, the system allows the humanoid robot to respond to cues such as tempo, accents, and fluctuations in loudness.
For instance, a consistent rhythm from clapping or music can guide a robot’s walking pace and body movements. At the same time, changes in the sound’s tone or intensity can create different styles of motion, ranging from quick, jerky steps to smooth, flowing movements.
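The article does not describe how Beat-to-Body implements this mapping, so the sketch below is purely illustrative: it extracts rough loudness and accent measures from an audio buffer and converts them into hypothetical gait parameters (step rate and smoothness). The feature choices, scaling constants, and function names are assumptions, not the project's actual method.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz, assumed microphone sample rate


def audio_features(frame: np.ndarray) -> dict:
    """Estimate loudness and accent strength for one audio frame (illustrative features)."""
    rms = float(np.sqrt(np.mean(frame ** 2)))                 # overall loudness
    envelope = np.abs(frame)
    onset = float(np.maximum(np.diff(envelope), 0).mean())    # crude onset/accent measure
    return {"rms": rms, "onset": onset}


def gait_parameters(feats: dict) -> dict:
    """Map audio features to illustrative whole-body motion parameters."""
    # Louder, more accented audio -> faster stepping; quiet, smooth audio -> slow, flowing motion.
    step_rate = 0.5 + 2.0 * min(feats["onset"] * 50.0, 1.0)   # steps per second
    smoothness = 1.0 - min(feats["rms"] * 5.0, 1.0)           # 1.0 = flowing, 0.0 = jerky
    return {"step_rate_hz": step_rate, "smoothness": smoothness}


if __name__ == "__main__":
    # A short synthetic clip: a 2 Hz "clap" pattern superimposed on low-level noise.
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
    clip = 0.05 * np.random.randn(SAMPLE_RATE) + 0.5 * (np.sin(2 * np.pi * 2 * t) > 0.99)
    print(gait_parameters(audio_features(clip)))
```

In a real system these parameters would feed a learned controller rather than hand-tuned thresholds; the sketch only shows the general idea of turning rhythm and intensity into motion style.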
The goal is for the robot to react dynamically and adapt its behavior in real time as the audio evolves. Audio-first, on-robot execution is central to this: researchers train control policies at scale in simulation using GPU compute, while the Jetson hardware runs low-latency inference directly on the robot, reducing reliance on offboard processing and enabling responsive reactions to sound cues.
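The article describes this on-robot pipeline only at a high level. As a hedged sketch of what a low-latency, audio-first control loop could look like, the code below reads an audio frame, runs a policy locally, and sends joint targets at a fixed rate; the microphone driver, policy, actuator interface, loop rate, and joint count are all hypothetical placeholders.

```python
import time
import numpy as np

CONTROL_RATE_HZ = 50        # assumed control-loop rate
FRAME_SAMPLES = 320         # 20 ms of audio at an assumed 16 kHz sample rate


def read_audio_frame() -> np.ndarray:
    """Placeholder for the robot's microphone driver (hypothetical)."""
    return np.random.randn(FRAME_SAMPLES) * 0.01


def policy(audio_frame: np.ndarray, joint_state: np.ndarray) -> np.ndarray:
    """Placeholder for a policy trained in simulation and deployed on-device (hypothetical)."""
    loudness = np.sqrt(np.mean(audio_frame ** 2))
    return joint_state + 0.01 * loudness * np.sin(np.arange(joint_state.size))


def send_joint_targets(targets: np.ndarray) -> None:
    """Placeholder for the robot's actuator interface (hypothetical)."""
    pass


def control_loop(steps: int = 100) -> None:
    joint_state = np.zeros(23)                      # assumed number of actuated joints
    period = 1.0 / CONTROL_RATE_HZ
    for _ in range(steps):
        start = time.monotonic()
        frame = read_audio_frame()                  # most recent audio window
        joint_state = policy(frame, joint_state)    # on-device inference, no offboard round trip
        send_joint_targets(joint_state)
        time.sleep(max(0.0, period - (time.monotonic() - start)))


if __name__ == "__main__":
    control_loop()
```

Keeping the whole loop on the robot avoids a network round trip on every control step, which is the kind of responsiveness the embedded hardware is intended to support.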
Other studies
Zhou’s Beat-to-Body project aligns with a growing body of research exploring sound as a control signal for humanoid robots. A 2025 study demonstrated how robots can generate expressive locomotion and gestures directly from music and speech, without relying on predefined motion templates.
Earlier work has shown that humanoids can synchronize dance movements to musical rhythm and emotion. Complementary research on audio-visual tracking has also enabled robots to localize and respond to sound sources, underscoring audio as an emerging, intuitive interface for human–robot interaction.
Applications and future work
Within the field of robotics, the project advances work on expressive, full-body movement and more intuitive human–robot interaction.
In the near term, the technology could be used in interactive installations and performance-based settings. In the longer term, it may enable basic coordination among multiple robots using common audio signals.
The work is being conducted at UCL’s Humanoid Robotics Lab, which specializes in machine learning, control systems, and interaction for humanoid robots. The next phase of the project will focus on expanding simulation-based training and testing early closed-loop, sound-responsive behaviors on a real humanoid robot.

