What is RT-2?
RT-2, short for Robotics Transformer 2, is a vision-language-action (VLA) model. It builds on its predecessor, RT-1, which was trained only on robot data and was largely limited to simple tasks like picking up items. RT-2 can understand natural language commands like “throw away the trash” and carry them out even when it has not been trained on that exact task. This ability comes from what it learns from vast amounts of web data.
How Does RT-2 Work?
RT-2 starts from a vision-language model (VLM) that has been pretrained on large collections of text and images from the web. The VLM converts its inputs into embeddings, numerical representations that capture the meaning of a piece of text or an image. RT-2 then trains this VLM further on robot data, such as the camera images the robot sees paired with the commands people give it, while continuing to train on web data. The result is a vision-language-action (VLA) model: given an image and an instruction, it outputs the robot's next action instead of a text answer.
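Training on web data and robot data at the same time means both kinds of examples must share one format. The sketch below is a minimal illustration under stated assumptions, not RT-2's actual pipeline: it puts web question-answer pairs and robot examples into the same (image, prompt, target text) shape and samples a mixed batch. The `Example` type, the prompt template, the `robot_fraction` parameter, and the sample action string are all hypothetical, and images are represented by ID strings.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    image_id: str  # stand-in for the camera or web image (pixels omitted here)
    prompt: str    # text given to the model alongside the image
    target: str    # text the model is trained to produce


def web_example(image_id: str, question: str, answer: str) -> Example:
    # Web vision-language data: ordinary question answering about an image.
    return Example(image_id, f"Q: {question} A:", answer)


def robot_example(image_id: str, instruction: str, action_tokens: str) -> Example:
    # Robot data uses the same shape, but the "answer" is a string of action
    # tokens instead of words. This prompt template is a hypothetical choice.
    prompt = f"Q: What action should the robot take to {instruction}? A:"
    return Example(image_id, prompt, action_tokens)


def mixed_batch(web: List[Example], robot: List[Example], batch_size: int,
                robot_fraction: float, rng: random.Random) -> List[Example]:
    # Fill each slot from the robot pool with probability robot_fraction,
    # otherwise from the web pool, so both kinds of data train one model.
    return [rng.choice(robot) if rng.random() < robot_fraction else rng.choice(web)
            for _ in range(batch_size)]


# Usage with toy data:
rng = random.Random(0)
web = [web_example("img_001", "What fruit is on the table?", "a banana")]
robot = [robot_example("img_002", "pick up the banana", "0 128 91 241 5 101 127 217")]
batch = mixed_batch(web, robot, batch_size=8, robot_fraction=0.5, rng=rng)
print([ex.image_id for ex in batch])
```

Keeping web examples in the mix is meant to preserve the model's general knowledge while it learns to act; the 50/50 split used here is arbitrary.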
The key adaptation is that RT-2 treats robot actions as another kind of text: each action is written as a short string of numbers, which the model generates the same way it generates words. Because the same network still holds what it learned from the web, the robot can attempt tasks it has never seen, as long as they make sense given what it knows. For example, if asked to move an apple to zero, it can do so without being shown how.
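Writing an action as text requires discretizing it. According to the RT-2 paper, each action consists of a flag for ending the episode, a 6-DoF change in the end effector's position and rotation, and a gripper-extension value, with every continuous dimension split into 256 bins. The minimal sketch below encodes and decodes such strings; the value ranges in `RANGES` and all function names are hypothetical, since real limits depend on the robot.

```python
from typing import List, Tuple

NUM_BINS = 256  # bins per continuous action dimension

# Hypothetical (low, high) limits: 3 position deltas (m), 3 rotation deltas (rad),
# and 1 gripper extension. Real limits depend on the robot.
RANGES: List[Tuple[float, float]] = [(-0.1, 0.1)] * 3 + [(-0.5, 0.5)] * 3 + [(0.0, 1.0)]


def to_bin(value: float, low: float, high: float) -> int:
    # Clip into range, then map linearly onto the integers 0..NUM_BINS-1.
    value = min(max(value, low), high)
    return min(int((value - low) / (high - low) * NUM_BINS), NUM_BINS - 1)


def from_bin(index: int, low: float, high: float) -> float:
    # The bin's center is the best single estimate of the original value.
    return low + (index + 0.5) * (high - low) / NUM_BINS


def encode_action(terminate: bool, continuous: List[float]) -> str:
    # Termination flag first, then one bin index per continuous dimension.
    if len(continuous) != len(RANGES):
        raise ValueError(f"expected {len(RANGES)} values, got {len(continuous)}")
    bins = [to_bin(v, lo, hi) for v, (lo, hi) in zip(continuous, RANGES)]
    return " ".join(str(t) for t in [int(terminate), *bins])


def decode_action(text: str) -> Tuple[bool, List[float]]:
    # Parse model output back into a command; reject malformed strings.
    tokens = [int(t) for t in text.split()]
    if len(tokens) != 1 + len(RANGES):
        raise ValueError(f"expected {1 + len(RANGES)} tokens, got {len(tokens)}")
    if any(not 0 <= b < NUM_BINS for b in tokens[1:]):
        raise ValueError("bin index out of range")
    values = [from_bin(b, lo, hi) for b, (lo, hi) in zip(tokens[1:], RANGES)]
    return bool(tokens[0]), values


print(encode_action(False, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]))
# → 0 128 128 128 128 128 128 255
```

Decoding returns bin centers, so a round trip loses at most half a bin width of precision. That small quantization error is the price of letting a language model emit actions as ordinary tokens.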
What Can RT-2 Do?
RT-2 can perform a variety of tasks, from sorting trash to picking up objects. It can also follow instructions that require multi-step reasoning, such as moving a banana to the sum of two plus one, which means first working out that the target is 3. It adapts to new settings, such as unfamiliar rooms, and responds to situations on the spot, such as picking up a bag that is about to fall or cleaning up a spill with a towel. It can also handle tasks that rely mostly on visual understanding, such as sorting items by color.
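An instruction like moving a banana to the sum of two plus one needs an intermediate step before any motion: resolving the target to 3. The RT-2 paper describes a variant that first writes a short natural-language plan and then its action tokens. The sketch below splits such an output into its two parts; the exact `Plan: ... Action: ...` layout, the `PlannedAction` type, and the sample tokens are assumptions for illustration, not RT-2's published interface.

```python
import re
from typing import List, NamedTuple


class PlannedAction(NamedTuple):
    plan: str                 # intermediate step in words, e.g. the resolved target
    action_tokens: List[int]  # integer tokens to hand to an action decoder


# Assumed output layout: "Plan: <text> Action: <integers separated by spaces>"
OUTPUT_PATTERN = re.compile(
    r"Plan:\s*(?P<plan>.+?)\s*Action:\s*(?P<action>\d+(?:\s+\d+)*)\s*$"
)


def parse_plan_and_action(output: str) -> PlannedAction:
    match = OUTPUT_PATTERN.match(output.strip())
    if match is None:
        raise ValueError(f"unrecognized model output: {output!r}")
    plan = match.group("plan").rstrip(".")  # drop the sentence's trailing period
    tokens = [int(t) for t in match.group("action").split()]
    return PlannedAction(plan, tokens)


result = parse_plan_and_action(
    "Plan: move the banana to 3. Action: 0 128 91 241 5 101 127 217"
)
print(result.plan)           # → move the banana to 3
print(result.action_tokens)  # → [0, 128, 91, 241, 5, 101, 127, 217]
```

Keeping the plan as readable text lets the arithmetic happen in language before a motor command is chosen; the parsed tokens would then be converted back into robot motion.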
Main Takeaways
Google’s RT-2 is a major improvement over its predecessor, RT-1. It can understand natural language commands and carry them out even for tasks it has not seen before, adapt to new settings, and reason through instructions that take more than one step. Given its grasp of visual tasks and its potential to change how robots interact with their environment, RT-2 could have a significant economic impact. To learn more, watch the full video.