Beyond the Bot Ep. 2: Inside Figure 1 and Helix with Bhargav Bompalli
- Ellen Cochran
- Feb 28
- 7 min read
In this episode of Beyond the Bot, Tony DeHart and Steven King sit down with Bhargav Bompalli, Senior Computer Vision and Robotics Engineer at Blue Sky Robotics, to analyze the recent unveiling of Figure's humanoid robot powered by the Helix AI model. The discussion dives deep into the technical and societal implications of this advancement in robotics, covering everything from vision-language-action (VLA) models and human-robot interaction to the training infrastructure required to bring such a machine to life.
This conversation provides a practical and forward-thinking lens on how humanoid robots, though not always the most efficient solution, offer a flexible, scalable toolset for future automation. The team explores how Helix’s twin-model architecture, its massive training corpus, and intuitive task reasoning set a new benchmark in robotics. If you're curious about where human-like robots are headed and how businesses might apply these technologies, this is an essential listen.
Transcript:
Tony DeHart: Welcome to the latest episode of Beyond the Bot, where we break down the latest and emerging technology—and how to put it to work in your business. I'm Tony DeHart.
Steven King: And I'm Steven King.
Tony: Today we're going to be breaking down some exciting news: we got a first look at Figure 01’s Helix this week, which is the latest in humanoid robotics. To break this down from a technical perspective, we had to bring in Bhargav Bompalli, our Senior Computer Vision and Robotics Engineer at Blue Sky Robotics. Bhargav, thank you so much for joining us.
Bhargav Bompalli: Thank you for having me.
Tony: In some ways I feel a little bit of deja vu because we’ve seen advancements in humanoid robotics before—particularly with Tesla’s Optimus bot. We broke down that demo in a previous video. Some folks were a little skeptical of that technology. Steven, how is this release different?
Steven: Well, for one thing, when we were talking about the Tesla one, everything was presented as if it was AI-driven. There was a lot of great AI stuff there, but some of it was done by human remote controllers. In this case, we have a video from another company—and they also released a lot of information that Bhargav was able to dig into and actually understand the mechanics behind. That gives us a higher level of trust in what we saw.
Tony: So Bhargav, when you were researching this and looking into the technical details, were you able to follow how they pulled this off?
Bhargav: Yes, it’s a very exciting technology. And the fact that they published all these findings on their website and put out a press release makes it more trustworthy and repeatable in the public’s eyes.
Tony: That transparency is really important to build trust. Steven, before we dive into the technical breakdown, is this what we’re looking at as the future of robotics?
Steven: There’s definitely a lot of excitement around humanoid robots. We live in a human world built around our form factor—stairs, tools, environments designed for humans. So there's inherent value in robots that can navigate that. Plus, there's the dream of a Rosie-from-the-Jetsons style assistant that can multitask. But it’s not the only path forward. Task-oriented robots that are specialized for efficiency in a narrow area also offer huge value. That said, Figure just raised a $1.5 billion investment, making them a $40 billion company. They have the resources to solve hard problems.
Tony: Bhargav, this is exciting from a technical perspective because it’s a first look at Helix—the vision-language-action model they developed in-house. It’s what allowed them to break from their long-term partnership with OpenAI. Can you tell us what a vision-language-action model is, and how that enabled this shift?
Bhargav: Absolutely. This demo was a flawless execution of a VLA—a vision-language-action model. It combines a vision-language model (VLM), which understands visual inputs and language, with action outputs. So the robot can interpret what it sees and hears, and then perform a corresponding physical task.
Tony: Now, interacting with robots via voice isn’t entirely new. How is this different from something like an Alexa?
Bhargav: Think of Alexa as the first step. You ask it for a recipe, it tells you one. A VLA model can take that a step further. You say “make me breakfast,” and the robot identifies ingredients, reasons through the task, and actually makes it—combining vision with physical action.
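For readers who want to picture what that looks like in practice, here is a minimal, hypothetical Python sketch of a VLA-style interface: an image and a language instruction go in, a physical action comes out. The class and field names are illustrative only; Figure has not published Helix's code or API.

```python
# Hypothetical sketch of a vision-language-action (VLA) call.
# None of these names are Figure's actual API; they only illustrate the idea
# that perception and language are mapped directly to a physical action.
from dataclasses import dataclass

import numpy as np


@dataclass
class Action:
    joint_targets: np.ndarray  # desired joint angles for the arms and hands


class VLAPolicy:
    """Maps (camera image, language instruction) -> robot action."""

    def predict(self, image: np.ndarray, instruction: str) -> Action:
        # A real system would run a vision-language backbone plus an action
        # head here; this placeholder just returns a zero action.
        return Action(joint_targets=np.zeros(20))


policy = VLAPolicy()
frame = np.zeros((224, 224, 3), dtype=np.uint8)  # placeholder camera frame
action = policy.predict(frame, "Pick up the ketchup bottle and put it in the fridge")
```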
Steven: From a human-robot interaction perspective, this is a big leap. Historically we used teach pendants or coded instructions. Now, people expect to interact with humanoid robots the same way we interact with other humans—using natural language and gestures. This video demonstrates how those expectations are being met.
Tony: So it’s like having another tool in the toolkit.
Steven: Exactly. Think of humanoid robots as the Swiss Army knife—multi-purpose, adaptable, and great for general use. But sometimes you want a specialized tool—like a chef’s knife instead of a multi-tool blade. Each has its place.
Tony: Bhargav, this robot was interacting with objects it had never seen before. How was it able to do that?
Bhargav: It’s built on a 7-billion-parameter vision-language model. While that’s not the largest out there, it’s efficient. The model has seen millions of images online, so it can generalize. Even if it’s never seen a ketchup bottle, it’s seen enough similar objects to know how to interact with one. It maps visual cues to actions.
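That kind of generalization from internet-scale image data can be illustrated with an openly available vision-language model. The snippet below uses the public CLIP model from Hugging Face to score a placeholder image against candidate labels; it is a generic illustration of zero-shot recognition, not Figure's stack.

```python
# Generic illustration: a pretrained vision-language model recognizing an
# object it was never explicitly trained to handle for a robotics task.
# Uses the public openai/clip-vit-base-patch32 checkpoint, not Helix.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder image standing in for a camera frame of a novel object.
image = Image.new("RGB", (224, 224), color="red")
labels = ["a ketchup bottle", "a mustard bottle", "a cardboard box", "a mug"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Softmax over image-text similarity gives a zero-shot classification.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs):
    print(f"{label}: {p.item():.2f}")
```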
Steven: I noticed that in the fridge scene, it placed the ketchup bottle next to another condiment already lying on its side, which suggests it understood spatial logic and context.
Tony: Let’s talk about the training process. How did they train Helix compared to how we traditionally annotate datasets?
Bhargav: Figure used over 500 hours of teleoperated robot data—robots performing actions like picking up boxes, placing them on shelves, etc. Each action was mapped to a text prompt like, “Pick up the cardboard box and place it on the top shelf.” And interestingly, those prompts were AI-generated. So another AI helped annotate the training data.
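As a rough illustration of the data format Bhargav describes (a teleoperated trajectory paired with an AI-generated language label), here is a hypothetical sketch. The field names and the stand-in labeler are placeholders, not Figure's actual pipeline or schema.

```python
# Hypothetical shape of one training example: recorded teleoperation data
# plus an auto-generated text prompt. Field names are illustrative only.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Demonstration:
    frames: List[np.ndarray]      # camera images captured during teleoperation
    joint_trajectory: np.ndarray  # recorded joint positions over time
    instruction: str              # language label produced by a captioning model


def auto_label(frames: List[np.ndarray]) -> str:
    """Stand-in for the AI model that writes the text prompt for each demo."""
    return "Pick up the cardboard box and place it on the top shelf."


demo = Demonstration(
    frames=[np.zeros((224, 224, 3), dtype=np.uint8)],
    joint_trajectory=np.zeros((100, 20)),
    instruction="",
)
demo.instruction = auto_label(demo.frames)
```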
Tony: We also saw two robots working together in the video. Were they communicating?
Bhargav: Surprisingly, no. Figure said there's no direct communication protocol between the robots. They’re both running the same neural network—so they act with the same knowledge. It's like having one brain across two bodies. They collaborate by awareness, not communication.
Steven: It’s kind of like how your brain controls both arms. They’re not talking to each other—they’re both just acting based on shared knowledge.
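The "one brain across two bodies" idea can be sketched as the same trained weights serving both robots, each acting only on its own observations, with no messages passed between them. Everything below is illustrative.

```python
# Illustrative only: two robots sharing one policy (the same weights),
# collaborating through shared knowledge rather than a communication link.
import numpy as np


class SharedPolicy:
    """One set of trained weights used by both robots."""

    def act(self, observation: np.ndarray, instruction: str) -> np.ndarray:
        # Stand-in for a forward pass through the shared neural network.
        return np.zeros(20)


policy = SharedPolicy()  # identical weights, not a communication channel

obs_robot_a = np.zeros((224, 224, 3))  # robot A's own camera view
obs_robot_b = np.zeros((224, 224, 3))  # robot B's own camera view

action_a = policy.act(obs_robot_a, "Hand the ketchup bottle to the other robot")
action_b = policy.act(obs_robot_b, "Take the ketchup bottle and put it in the fridge")
```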
Tony: Now, I can’t get past this moment where the robots make eye contact. I mean, if they aren’t actually communicating, what are they doing?
Steven: So, we don't have any insight specifically as to what the engineers were thinking here, but I think this is part of that human-computer interaction and psychology. If they’re going to have a human form, they need to have some of the behaviors of humans. But I don't know if they hardcoded this for the video or if this is just part of the programming. Like, “hey, we completed a task together!” and they look at each other. Maybe there's going to be a high five one day, right? But the way humans work with robots is going to be really important for adoption. If people are going to adopt it, they have to feel comfortable with it, and this may be one of those steps that helps that.
And we’ve seen that in our own deployments, too. We sometimes name the robots, and when we take these baby steps toward personification, people tend to work with them, not against them. When we’re deploying robots, they’re typically being used alongside workers, and we want them to be a team, right? We even personify them to the point that when you see one of your robotic arms lying on its side, just because it hasn’t been installed yet, it feels weird! It feels like something’s wrong, like it’s sick or something. So I think this is something we naturally do as humans, and as we interact more and more with robots, we’re going to see this kind of interaction with them.
Tony: So, Bhargav, if they aren’t actually communicating yet, can you tell us about the AI that’s actually running under the hood? Is it just one giant VLA model? What’s the story there?
Bhargav: So this is actually another big breakthrough for Figure, and I think it’s how they got the name Helix. Basically, it’s two AI models running asynchronously in order to perform their functions as smoothly as they did in the demo. The first one they call S2, which is kind of like the brains of the robot. S2 runs the VLM, the vision-language model, so it’s constantly analyzing what it sees, the visual data, and it’s also analyzing its robot pose. It knows what’s in front of it and also what position the arms are in and where the torso is, and it feeds that information into the “brain.” The second model is S1, and this is like the muscles of the robot. It uses the same visual input as S2, but this time it outputs fine motor commands. So depending on what it sees, a ketchup bottle for example, it uses a transformer model to convert that into how to pick it up.
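To make that asynchronous split concrete, here is a toy Python sketch: a slow loop stands in for the S2 vision-language "brain" and refreshes a shared latent, while a fast loop stands in for the S1 "muscles" and turns the latest latent into motor commands. The loop rates, latent size, and threading scheme are placeholder assumptions, not Figure's published implementation.

```python
# Toy sketch of two models running asynchronously at different rates,
# in the spirit of the S2/S1 split described above. All numbers and the
# shared-state mechanism are illustrative assumptions.
import threading
import time

import numpy as np

latest_latent = np.zeros(512)   # high-level context written by S2, read by S1
latent_lock = threading.Lock()
running = True


def s2_loop():
    """Slow 'brain' loop: run the vision-language model, refresh the latent."""
    global latest_latent
    while running:
        new_latent = np.random.randn(512)  # stand-in for a VLM forward pass
        with latent_lock:
            latest_latent = new_latent
        time.sleep(0.2)  # placeholder slow rate (~5 Hz)


def s1_loop():
    """Fast 'muscles' loop: combine the latest latent with fresh observations."""
    while running:
        with latent_lock:
            latent = latest_latent.copy()
        motor_command = latent[:20] * 0.01  # stand-in for the visuomotor policy
        _ = motor_command                   # discarded in this sketch
        time.sleep(0.01)  # placeholder fast rate (~100 Hz)


threads = [threading.Thread(target=s2_loop), threading.Thread(target=s1_loop)]
for t in threads:
    t.start()
time.sleep(1.0)   # let both loops run briefly
running = False
for t in threads:
    t.join()
```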
Tony: Wow, that’s really interesting. You know, in some cases it’s a lot of the same underlying principles that we use in our solutions. Steven, are humanoid robots something we might see in Blue Sky’s own solutions in the future?
Steven: I wouldn’t say no. But for now, most of our work benefits more from single-arm robots focused on specific tasks. Humanoids are great generalists, but if the job is repeatable and well-scoped, a simpler robot is usually better. That said, it’s a space we’ll keep watching.
Tony: Bhargav, thank you so much for joining us.
Bhargav: Thanks for having me.
Tony: And thank you for tuning in. We’ll see you next week for another episode of Beyond the Bot, where we break down the latest emerging tech—and how to put it to work for your business.