OpenAI, the San Francisco-based nonprofit research lab backed by Elon Musk, today announced a research milestone in its robotics work. The achievement is a new algorithm that lets a human being communicate a task to an AI by performing it first in virtual reality. The method is based on what's known as one-shot imitation learning, a technique OpenAI developed that allows software guiding a robot to mimic a physical action using just a single example.
In this case, OpenAI is trying to teach a robotic arm how to stack a series of colored cube-shaped blocks. A human being wearing a VR headset first performs the task manually within a virtual environment. OpenAI then has its vision network — a type of neural network trained on hundreds of thousands of simulated images — observe the action. This part of the process is based on previous OpenAI research focused on training AI using simulated data with ever-changing variables.
Because collecting real-world images is labor-intensive, costly, and time-consuming, achieving a similar effect with simulated data is faster and more efficient. In this case, OpenAI chose not to use real-world photos of its setup, instead feeding its algorithm a trove of virtual images of the table and blocks with different styles of backgrounds, lighting effects, and textures. Doing this allows the algorithm, when it does analyze the camera feed from the robot, to understand the scene without ever having seen it before.
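The idea of varying backgrounds, lighting, and textures across simulated images is often called domain randomization. As a rough illustration only — not OpenAI's actual renderer, and with toy image sizes and noise levels chosen arbitrarily — it can be sketched like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def render_randomized_scene(height=64, width=64):
    """Render a toy 'simulated' image with a randomized background color,
    block color and position, lighting level, and texture noise."""
    # Random solid background color.
    background = rng.uniform(0.0, 1.0, size=3)
    image = np.ones((height, width, 3)) * background
    # Place a "block": a colored square at a random position.
    block_color = rng.uniform(0.0, 1.0, size=3)
    y = rng.integers(0, height - 16)
    x = rng.integers(0, width - 16)
    image[y:y + 16, x:x + 16] = block_color
    # Random global lighting scale plus per-pixel texture noise.
    lighting = rng.uniform(0.5, 1.5)
    noise = rng.normal(0.0, 0.02, size=image.shape)
    return np.clip(image * lighting + noise, 0.0, 1.0)

# A training set is simply many such randomized renders; a vision network
# trained on enough of them can generalize to the real camera feed.
dataset = np.stack([render_randomized_scene() for _ in range(8)])
```

Each call produces a differently randomized scene, so no two training images share the same background, lighting, or block placement.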
OpenAI’s algorithm then takes the information gleaned from the vision network and feeds it to a second neural network, called an imitation network, which guides the robotic arm. It susses out the intent of the action and then mimics it by predicting what the human actor would have done in a similar situation. The tricky part, of course, is that the blocks are colored and arranged differently every time, and yet the software can stack three separate two-cube stacks, regardless of the initial setup:
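The two-network pipeline described above — a vision network that turns the camera feed into a scene representation, and an imitation network that conditions on both that representation and the demonstration to predict an action — can be sketched in miniature. This is a hypothetical stand-in with made-up layer shapes, not OpenAI's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def vision_network(image, weights):
    """Toy stand-in for the vision network: flatten the camera image
    and project it down to a compact scene embedding."""
    return np.tanh(image.reshape(-1) @ weights)

def imitation_network(scene_embedding, demo_embedding, weights):
    """Toy stand-in for the imitation network: condition on the current
    scene AND the demonstration embedding to predict the next action."""
    features = np.concatenate([scene_embedding, demo_embedding])
    return np.tanh(features @ weights)  # e.g. target arm motion

# Hypothetical shapes: 64x64 RGB frame -> 32-dim embedding -> 4-dim action.
w_vision = rng.normal(0.0, 0.01, size=(64 * 64 * 3, 32))
w_policy = rng.normal(0.0, 0.01, size=(32 + 32, 4))

frame = rng.uniform(0.0, 1.0, size=(64, 64, 3))  # current camera frame
demo = rng.normal(0.0, 1.0, size=32)             # embedding of the VR demo

action = imitation_network(vision_network(frame, w_vision), demo, w_policy)
```

Because the imitation network sees the demonstration only as a conditioning input, swapping in a different single demonstration changes the predicted behavior without any retraining — which is the point of the one-shot setup.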
All of this is done using only simulated data, and never by showing the robot videos or photos of real-world examples. “Our robot has now learned to perform the task even though its movements have to be different than the ones in the demonstration,” explains Josh Tobin, a member of OpenAI’s technical staff, in a video produced to demonstrate the new algorithm. “With a single demonstration of a task, we can replicate it in a number of different initial conditions. Teaching the robot how to build a different block arrangement requires only a single additional demonstration.”
The longer-term goal here is to give AI the ability to learn new behaviors quickly and to use that knowledge to adapt to unpredictable changes in an environment. “Infants are born with the ability to imitate what other people do,” Tobin says. “Imitation allows humans to learn new behaviors rapidly. We would like our robots to be able to learn this way, too.”