Beyond the Bot Ep. 9: How Computer Vision is Powering the Future of Robotics
- Ellen Cochran
- Apr 25
- 5 min read
Updated: Jul 2
For Beyond the Bot this week, host Anthony DeHart sits down with Blue Sky Robotics' Computer Vision and Robotics Engineer Bhargav Bompalli inside the Blue Sky Lab to explore the cutting-edge world of computer vision and its transformative applications across industries. From basic image filters to advanced AI models like YOLO and generative adversarial networks (GANs), they unpack how machines are being trained to see, understand, and even simulate the world around them. The conversation offers a deep dive into how computer vision intersects with machine learning, reinforcement learning, and robotics, showcasing how synthetic data and simulation environments are revolutionizing everything from quality control to autonomous manufacturing.
Whether you're a tech enthusiast, robotics developer, or industry leader, this episode offers powerful insights into how computer vision is rapidly redefining the way machines interact with our world.
Transcript:
Anthony DeHart: Hello and welcome to another exciting episode of Beyond the Bot, where we break down the latest in AI and robotics and what it means for you and your business. I'm Tony.
Bhargav Bompalli: I'm Bhargav.
Tony: And we're in the Blue Sky Lab.
[Music]
Tony: Bhargav, you work as a computer vision engineer day in and day out, right? So there's really nobody better to discuss these topics with than you.
Bhargav: That's correct.
Tony: So can you tell us—what does a computer vision engineer actually do on a day-to-day basis?
Bhargav: A computer vision engineer builds computer vision algorithms. In a general sense, computer vision is a set of software techniques that allow computers to see, interpret, and understand the world around them. These techniques range from simple mathematical formulas to complex AI, deep learning, and generative AI solutions.
Tony: We often talk about computer vision from the standpoint of artificial intelligence and machine learning, but it doesn't necessarily always have to be AI-based to be considered computer vision, right?
Bhargav: That's correct. While computer vision is typically associated with AI these days, even something as basic as applying a filter to an image or performing edge detection counts as computer vision. These tasks use simpler mathematical techniques.
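Even without AI, the filtering and edge detection Bhargav mentions boil down to a little math over pixel values. Here is a minimal sketch using OpenCV's Sobel filter; the image path and kernel settings are illustrative:

```python
# A classic, non-AI computer vision operation: Sobel edge detection.
# Requires: pip install opencv-python numpy. "photo.jpg" is a placeholder path.
import cv2
import numpy as np

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

# Smooth slightly to suppress noise before differentiating.
blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)

# Sobel filters approximate the image gradient in x and y.
gx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude: large values mark edges.
edges = np.sqrt(gx**2 + gy**2)
edges = np.uint8(np.clip(edges, 0, 255))

cv2.imwrite("edges.jpg", edges)
```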
Tony: Sometimes the term "computer vision" can be a little misleading. When we think about vision, we think about how our eyes perceive the world. For a computer, it's not always the same. How do we translate shapes, colors, and rich visual details into something a computer can understand?
Bhargav: We use sensors. Sensors are a huge part of computer vision. They allow computers to interpret the world. A very common sensor is a standard RGB camera, or a depth camera that combines RGB with depth-sensing technology.
Tony: When we take something like an RGB feed from a camera, how does the computer understand that? Is it analyzing the image as a whole or pixel by pixel?
Bhargav: It's actually pixel by pixel. Unlike humans, who can look at something and instantly recognize it, a computer breaks the image or video into pixels or groups of pixels and analyzes each pixel's color and value in relation to the ones around it.
Tony: So it's really a statistical and pattern recognition process?
Bhargav: Exactly.
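To make the pixel-by-pixel idea concrete, here is a small sketch showing that a color frame is just an array of numbers whose values can be compared with their neighbors. The file name and pixel coordinates are placeholders:

```python
# An RGB frame is a 3D array of numbers: height x width x 3 channels.
# "frame.jpg" is a placeholder; any sufficiently large color image works.
import cv2

frame = cv2.imread("frame.jpg")   # OpenCV loads images as BGR by default
h, w, channels = frame.shape
print(f"{w}x{h} image, {channels} channels")

# Inspect one pixel and its right-hand neighbor.
b, g, r = frame[100, 200]
b2, g2, r2 = frame[100, 201]
print("pixel (100, 200):", (r, g, b))
print("difference from neighbor:", int(r2) - int(r), int(g2) - int(g), int(b2) - int(b))
```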
Tony: We have these basic algorithms for image processing. How do AI and machine learning expand their capabilities?
Bhargav: AI takes it to the next level—it introduces autonomy. With AI, we don't have to hardcode all the statistical models. We can just tell the system what we want to detect. Sometimes it doesn't even need explicit instructions—it can learn on its own to identify things like cell phones or pets.
Tony: That's deep learning, right? Can you clarify how deep learning fits into the picture with machine learning?
Bhargav: Deep learning is a subset of machine learning. In computer vision, we often use convolutional neural networks (CNNs). A common one is YOLO ("You Only Look Once"), which does object detection. It can detect objects like microphones or people in an image. Other approaches, such as semantic or instance segmentation models (often built on backbones like ResNet), go further and actually separate objects from the background.
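As a rough illustration of the kind of object detection Bhargav describes, here is a minimal inference sketch using the ultralytics YOLO package. The weights file and image name are placeholders, and the pretrained model only knows common object classes out of the box:

```python
# Minimal object-detection sketch with a pretrained YOLO model.
# Requires: pip install ultralytics. "scene.jpg" is an illustrative file name.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")        # small pretrained model, downloads on first use
results = model("scene.jpg")      # run inference on one image

for result in results:
    for box in result.boxes:
        label = result.names[int(box.cls)]     # class name, e.g. "person"
        confidence = float(box.conf)           # probability the model assigns
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding-box corners in pixels
        print(f"{label}: {confidence:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```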
Tony: If we take YOLO as an example, how does it learn what a microphone looks like?
Bhargav: It starts with us. We provide training data—images or videos of microphones in various settings. The model learns from those examples. Then, when it sees a new image, it uses that learning to recognize if a microphone is present and gives a probability.
Tony: So it’s similar to teaching a human a new skill—show it many examples and eventually it figures it out?
Bhargav: Exactly. At the pixel level, it’s recognizing patterns of color and shapes that make up a microphone.
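Teaching a detector a new object like a microphone typically means fine-tuning on labeled examples, as Bhargav describes. Here is a hedged sketch with the ultralytics package; "microphones.yaml" is a hypothetical dataset config pointing at folders of images and bounding-box labels:

```python
# Sketch of fine-tuning a pretrained YOLO model on labeled examples.
# "microphones.yaml" and "studio_desk.jpg" are hypothetical placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                              # start from pretrained weights
model.train(data="microphones.yaml", epochs=50, imgsz=640)

# After training, a new image gets a probability for each detected microphone.
results = model("studio_desk.jpg")
```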
Tony: You mentioned reinforcement learning earlier. How does that differ?
Bhargav: Reinforcement learning is more like how a baby learns to walk—through trial and error. The system is rewarded for correct actions and penalized for incorrect ones. So if it recognizes a microphone correctly, it gets rewarded. If it mistakes an iPad for a microphone, it gets penalized. Over time, it learns better.
Tony: So with YOLO, humans are labeling data upfront. With reinforcement learning, the human input is more about setting up the reward system and monitoring the outcome?
Bhargav: Exactly. The model starts from scratch and learns over time based on feedback.
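The reward-and-penalty loop Bhargav describes can be sketched with a toy agent that starts from scratch and learns which answer earns reward. Everything below (the actions, rewards, and learning rate) is illustrative, not a production reinforcement learning system:

```python
# Toy sketch of the reward/penalty feedback loop behind reinforcement learning.
# The agent gets no labels up front; it only sees +1 or -1 after each guess and
# gradually prefers the action that earns reward.
import random

actions = ["call_it_microphone", "call_it_ipad"]
value = {a: 0.0 for a in actions}   # the agent's running estimate of each action's worth
learning_rate = 0.1
epsilon = 0.2                       # fraction of the time we explore at random

def environment(action):
    """Hypothetical environment: the object really is a microphone."""
    return 1.0 if action == "call_it_microphone" else -1.0

for step in range(500):
    # Explore occasionally, otherwise pick the currently best-looking action.
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(value, key=value.get)

    reward = environment(action)
    # Nudge the estimate toward the observed reward.
    value[action] += learning_rate * (reward - value[action])

print(value)   # the rewarded action ends up with the higher value
```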
Tony: Let’s go back to the hardware side. We talked about RGB sensors. But in robotics, you often need to know not just what something is, but where it is. How do we get that spatial information?
Bhargav: RGB gives 2D info. For 3D positioning, we use LIDAR or depth cameras. LIDAR uses laser light to determine object distances, creating a 3D map. This allows a robot to not just recognize an object, but also to locate and manipulate it.
Tony: So it’s not just what the object is, but where it is and what it should look like in context?
Bhargav: Right. You can use it for quality assurance, fault detection, and more.
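Combining a detection with a depth reading is what gives the robot a 3D position it can act on. Here is a minimal sketch of the standard pinhole back-projection; the camera intrinsics and pixel values are made-up example numbers, not a real calibration:

```python
# Sketch of turning a depth-camera reading into a 3D position a robot can use.
# fx, fy: focal lengths in pixels; cx, cy: optical center (illustrative values).
fx, fy = 615.0, 615.0
cx, cy = 320.0, 240.0

u, v = 400, 260          # pixel where the object was detected (e.g. by a YOLO box center)
depth_m = 0.85           # depth reading at that pixel, in meters

# Back-project the pixel into camera coordinates.
x = (u - cx) * depth_m / fx
y = (v - cy) * depth_m / fy
z = depth_m

print(f"object is roughly at ({x:.3f}, {y:.3f}, {z:.3f}) m in the camera frame")
```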
Tony: You mentioned generative AI as an exciting development. Can you expand on that?
Bhargav: Generative AI, like DALL-E or GANs (Generative Adversarial Networks), can generate new training data. Instead of collecting real-world images, we can generate photorealistic simulated environments. That strengthens our models without requiring physical setups.
Tony: So machines can now create their own training data?
Bhargav: Exactly. That reduces human involvement and makes the whole training process more efficient.
Tony: What are the implications of synthetic data for industry?
Bhargav: It streamlines everything. Instead of physically placing a robot in an industrial environment, we simulate it. The robot can train in a photorealistic version of its real-world workspace.
Tony: Can you simulate different environmental conditions too? Like in a dusty, unpredictable factory versus a clean lab?
Bhargav: Yes. That’s a growing area. Simulation platforms allow us to tweak lighting, textures, colors, materials—everything. That helps improve model robustness and deployment readiness.
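A simple 2D stand-in for those simulation tweaks is image randomization: varying lighting and noise on existing images to generate synthetic training variants. The file name and parameter ranges below are illustrative:

```python
# Sketch of randomizing conditions on a training image, a 2D stand-in for the
# lighting/texture tweaks done in full 3D simulation.
# Requires: pip install opencv-python numpy. "part.jpg" is a placeholder path.
import cv2
import numpy as np

img = cv2.imread("part.jpg").astype(np.float32)

def randomize(image):
    # Random brightness and contrast mimic changing lighting.
    brightness = np.random.uniform(-40, 40)
    contrast = np.random.uniform(0.7, 1.3)
    out = image * contrast + brightness
    # Random noise mimics dust and sensor grain.
    out += np.random.normal(0, 8, image.shape)
    return np.clip(out, 0, 255).astype(np.uint8)

# Generate several synthetic variants from one source image.
for i in range(5):
    cv2.imwrite(f"part_variant_{i}.jpg", randomize(img))
```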
Tony: Let’s talk about real-world applications. We can imagine pick-and-place robotics or automated welding. What’s the next frontier?
Bhargav: The goal is to move beyond predefined patterns. Instead of telling the robot exactly what to do, we train it to be goal-oriented using reinforcement learning. Then it can handle unpredictable object placements on its own.
Tony: These are powerful tools. What safeguards do we need to ensure safety and fairness?
Bhargav: Human intervention is key. We need to ensure the training data is unbiased and fair. In reinforcement learning, we have to carefully design the reward and punishment systems to ensure ethical behavior.
Tony: Bhargav, this has been super interesting. Thank you for shining a light on this topic.
Bhargav: Thank you.
Tony: And thank you all for joining us. Stay tuned—we'll have another topic for you next week on Beyond the Bot.