Unchained Robotics company logo
July 6, 2022
Franziska Liebich
Industry News
Machine Vision
Machine Vision and Cameras: How Robots Learn to See - Unchained Robotics
Robots in science fiction films are often modeled on humans. This applies to their appearance and the way they interact with their environment. But this is far from the reality. Will that change in the future?

People usually interact with their environment through their five senses - hearing, touch, taste, smell and sight. We perceive stimuli, interpret them and usually react intuitively without thinking. This ability is extremely important for our everyday lives, and the loss of a sense usually hits the person affected hard.

Robots, however, do not possess any of these senses. Nevertheless, development is moving toward robots that can interact with their environment and react automatically to changes in it. So how can we teach robots to perceive their environment? Or in short: How do robots see?

What is "seeing" anyway?

When we talk about "seeing," what do we mean? What has to happen for a person to see?

Our eyes collect light reflected from objects in our environment. When the light hits the retina, it is converted into electrical signals and transmitted to the brain via the optic nerve. There, the electrical signals are processed and interpreted with the help of our memories. We then use the image that is created in the process to help us find our way in the world. This interpretation of the data we collect is actually the important part of the process.

Exactly how this process works is not clear. But researchers estimate that the process is so elaborate and extensive that, in some cases, half of our brain is engaged in processing the information our eyes gather. So it is an extremely complex process.

However, there are many other types of eyes in nature that enable different species to see, although many animal brains have far less processing power. This is especially true for insects, for example. So, in principle, it is possible to enable a "kind" of vision without the immense computing power of a human brain - or half of one.[i]

What is the point of teaching a robot to see?

But why - if this process is so complicated - do we necessarily want to teach robots to see? Just to turn science fiction into science fact? Robots can already do impressive things: they can work collaboratively with humans in factories, for example, or quickly deliver packages in a warehouse.[ii]

But there are also many things that robots cannot do, precisely because they lack the ability to see. In industrial robotics, 2D and 3D vision guidance can automate processes such as production, assembly, and material handling more flexibly than was previously possible. Given the growing importance of mass customization and batch size one, this technology is extremely important.[iii]

Traditional industrial robots are based on technology that is already 60 years old and depend on absolute precision, because the machines do not know what they are doing. They cannot operate effectively outside the factories for which they were built. Even slight deviations, for example in the position of objects to be moved, can disrupt the process.[iv]

Robots that can recognize and interpret what is happening in their environment would also be able to take over tasks that are still handled by humans. This would be very helpful in many places: in industries where there is a shortage of workers, in areas that are unattractive or even dangerous for humans, in work steps that require high concentration over a long period of time, and also in our own homes or even in the operating room.[v]

To further integrate robots into our lives and work, and to realize their full potential, the development of some kind of "sense of sight" - in the broadest sense - is necessary. Without it, they will not be able to understand their environment in context.

Why is it so difficult to teach robots to see?

So what are the concrete hurdles that must be cleared to enable a robot to perceive its environment and interpret it correctly?

In an industrial context, picking a part out of a box in which many other parts lie in disarray is a particular challenge, because the robot has trouble recognizing unsorted objects. Over the course of a lifetime, a human being learns the meaning behind the images they see. A robot's vision system, on the other hand, only identifies objects correctly if it has been programmed or trained for them beforehand. The developers of image processing systems must therefore know in advance what the system will have to do in order to write an appropriate algorithm.[vi]

Getting Started

A digital camera enables the robot to collect information about its environment. In the early 1960s, people were still convinced that extracting information from camera images would be easy. Unfortunately, that was far from the case. Since then, entire fields of research have formed around this problem; "machine vision" and "machine learning" are particularly important here.[vii]

Instead of a brain, a robot has a computer that processes the signals collected by sensors and then sends commands to the motors. Image and depth data, i.e., colors and distances between camera and object, are especially helpful. However, the robot should not only be able to see its surroundings, but also to recognize the objects that are in them.

It must therefore establish a connection between the color information and the semantics that describe what an image represents. An image consists of several million pixels: dots of color, each of which is stored as numbers. The totality of these numbers must be transformed into information about what can be seen in the image.[viii]
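To make the idea of an image as a grid of numbers concrete, here is a minimal Python sketch. The tiny 2x3 image, the `to_gray` helper, and the choice of the common BT.601 brightness weights are illustrative assumptions, not something the cited sources prescribe.

```python
# A tiny 2x3 "image": each pixel is an (R, G, B) triple of numbers from 0 to 255.
# The pixel values are made up for illustration.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]


def to_gray(pixel):
    """Collapse one RGB pixel into a single brightness value (BT.601 weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# The same image as one brightness number per pixel.
gray = [[to_gray(p) for p in row] for row in image]
height, width = len(image), len(image[0])
```

Everything a vision system later concludes about the scene has to be derived from grids of numbers like `image` and `gray`; a real camera image simply has millions of entries instead of six.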

The challenges

One difficulty is converting the large amounts of recorded information into simple, abstract signals that the robot understands. This is mathematically extremely demanding and is further complicated because external influences such as weather and time of day also play a role. Humans can recognize such information as irrelevant and concentrate on what is important. Robots first have to learn this - a lengthy and laborious process.[ix]

Relevant information for recognizing and understanding the environment includes the position of the robot on a map, the 3D position and orientation of the objects surrounding it, the movement of objects, the type of object, possible forms of interaction (for example, the possibility of grasping the object), the area that can be walked or driven over and obstacles, as well as the structure of the immediate environment.[x]

In applications such as self-driving cars, cameras are supplemented by other sensors, such as ultrasound, radar, sonar, or infrared sensors. These sensors emit waves (radio, light, or sound) and then measure what is reflected back from the environment. This gives a picture of the environment, but it can't tell what's what. The robot - or in this case, the car - can't understand its environment because it doesn't recognize the different objects. But it doesn't just have to recognize the objects; it also has to anticipate the consequences or possible intentions when an object appears: if a ball rolls into the street, a child might run after it. A self-driving car should be able to detect this and react accordingly.[xi]

What models are available to help robots detect their environment?

There are several approaches to help robots perceive and recognize their environment. Some have been briefly mentioned before, but we will go into more detail here about the differences, advantages and disadvantages.

Ultrasonic sensor

Perhaps the most common way to detect obstacles is to use an ultrasonic sensor or echo sounder: A speaker emits high-frequency sound waves. These propagate and, when they encounter an obstacle, are reflected back.

These returning waves are detected by a microphone. The sensor measures the time the wave takes to travel there and back; since the speed of sound is known, the distance to the obstacle follows from this time.[xii]
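The time measurement translates into a distance with one line of arithmetic. The following Python sketch assumes a speed of sound of roughly 343 m/s (dry air at about 20 degrees Celsius); the function name `echo_distance_m` and the 10 ms example value are hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value in dry air at about 20 degrees Celsius


def echo_distance_m(round_trip_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance to the obstacle, given the round-trip time of an echo in seconds."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The wave travels to the obstacle and back, so halve the path length.
    return speed * round_trip_s / 2.0


distance = echo_distance_m(0.01)  # echo received after 10 ms, about 1.7 m away
```

Because the speed of sound changes with temperature, a real sensor would need to correct the constant for its operating conditions.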

As mentioned above, however, this technique only detects that something is in front of the sensor, or beside or behind it. It cannot distinguish between objects, so the robot's possible responses are very limited.

LiDAR: Light Detection and Ranging

A very similar technique is to detect the environment by emitting light waves (usually infrared light) and measuring the time it takes for the light to bounce back. There are three variations of this type of sensor:
  • Sensors that emit only one beam of light are most often used to measure distances to large objects (e.g. walls, floor, ceiling).
  • Sensors that emit multiple beams of light at once are useful for avoiding collisions and detecting objects.
  • Spinning sensors sweep a beam of light around as they rotate. These are also used for object detection and collision avoidance.[xiii]

Capturing several million distances in all directions in this way produces a point cloud that provides a rough representation of the surrounding space.

But there are problems here, too: reflective surfaces do not send the light pulses back to the sensor, but - according to the "angle of incidence = angle of reflection" rule - in a different direction. The measurement can also be affected by fog or rain, since water droplets scatter the light. In addition, the process is relatively slow, and the devices have so far been very expensive considering that the "image" created is based only on geometric data. Human vision, on the other hand, also captures color and texture. However, prices have come down in recent years.[xiv]
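How a set of measured distances becomes a point cloud can be sketched in a few lines of Python. This is a simplified 2D illustration, assuming a spinning sensor that takes one reading per fixed angular step; the function names, the 90-degree step, and the use of `None` for a beam that never returned are hypothetical choices.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_to_range_m(round_trip_s):
    """Range from the round-trip time of a light pulse (there and back)."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def scan_to_points(ranges_m, start_angle_rad=0.0, step_rad=math.pi / 2):
    """Convert one planar scan (one range per beam angle) into (x, y) points."""
    points = []
    for i, r in enumerate(ranges_m):
        if r is None:  # no return, e.g. the pulse was reflected away from the sensor
            continue
        angle = start_angle_rad + i * step_rad
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


# Four beams at 0, 90, 180 and 270 degrees; the third beam produced no echo.
points = scan_to_points([1.0, 2.0, None, 4.0])
```

Note that the result contains only coordinates. Nothing in `points` says whether a point belongs to a wall, a person, or a car, which is exactly the limitation of purely geometric data described above.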

Cameras and image processing

How does a robot recognize what is in a camera image? A video camera continuously captures images that are passed on to the computer. An algorithm then looks for noticeable elements in the images: lines, interesting points or corners, and certain textures that it can track from frame to frame.
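One of the simplest "noticeable elements" is an edge, a place where brightness changes abruptly. The sketch below is a deliberately minimal illustration in plain Python, assuming a grayscale image stored as rows of numbers; the function name `vertical_edge_mask` and the threshold of 50 are hypothetical, and production systems typically rely on more robust detectors from an image processing library.

```python
def vertical_edge_mask(gray, threshold=50):
    """Mark pixels where brightness jumps sharply between left and right neighbors."""
    mask = []
    for row in gray:
        mask_row = [0] * len(row)
        # Border pixels have only one horizontal neighbor and are left at 0.
        for x in range(1, len(row) - 1):
            # Central difference: a large value indicates a vertical edge.
            gradient = abs(row[x + 1] - row[x - 1])
            mask_row[x] = 1 if gradient >= threshold else 0
        mask.append(mask_row)
    return mask


# A dark region on the left and a bright region on the right.
gray = [
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
]
edges = vertical_edge_mask(gray)
```

The mask is 1 only in the two columns around the dark-to-bright transition, which is the kind of stable feature an algorithm can try to find again in the next frame.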

Depending on what the robot will see and should do, software is then developed to recognize patterns and help the robot understand what is around it. In some cases, the software builds a basic map of the environment as the robot works; in others, it compares the features it recognizes in the images with a database to find what it is looking for.

However, this type of programming alone is not reliable enough to prevent the robot from colliding with something or falling. It is supplemented with the sensors described above to make it usable. As more and more computing power becomes available for less and less money, this technology is now becoming applicable in real life.[xv]

However, the training effort is still considerable, as robots require more training data than humans. The German Aerospace Center has therefore developed a method that does not use real images but simulates scenes. In this way, the information to be learned (e.g., the type of object) can be generated on the computer along with the images.[xvi]
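The appeal of simulated scenes is that the label comes for free. The toy Python sketch below only illustrates that principle and is not the German Aerospace Center's method: it assumes 8x8 grayscale scenes that each contain one bright square, and the function name `make_scene` and the label fields are hypothetical.

```python
import random


def make_scene(rng, size=8):
    """Render a synthetic grayscale scene containing one bright square.

    Returns the image together with its label. The label is known exactly
    because the scene is generated rather than photographed.
    """
    side = rng.randint(2, 4)
    top = rng.randint(0, size - side)
    left = rng.randint(0, size - side)
    image = [[0] * size for _ in range(size)]
    for y in range(top, top + side):
        for x in range(left, left + side):
            image[y][x] = 255
    label = {"kind": "square", "top": top, "left": left, "side": side}
    return image, label


rng = random.Random(0)  # fixed seed so the dataset is reproducible
dataset = [make_scene(rng) for _ in range(100)]
```

With real photographs, a person would have to annotate each of the 100 images by hand; here the generator writes the annotation at the same moment it draws the scene.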

A new direction: machine learning

A new research direction takes a slightly different approach: instead of being programmed, the system should be able to learn. Researchers design a system structure inspired by how they assume animals see. This structure is not an algorithm for a specific task, but the basis for what the robot works out for itself - what it learns. This is called "machine learning".
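The difference between programming a rule and learning one can be shown with a very small example. The Python sketch below uses a nearest-centroid classifier, a far simpler technique than the biologically inspired structures the text refers to; the two-number feature vectors and the labels "bolt" and "washer" are invented for illustration.

```python
def train_centroids(samples):
    """Learn one average feature vector (centroid) per label from examples."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(center, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical training data: (feature vector, label) pairs.
training = [
    ((0.9, 0.1), "bolt"), ((0.8, 0.2), "bolt"),
    ((0.2, 0.9), "washer"), ((0.1, 0.8), "washer"),
]
model = train_centroids(training)
guess = predict(model, (0.85, 0.15))
```

Nowhere does the code state what distinguishes a bolt from a washer. That knowledge ends up in `model`, derived from the examples, and it changes if the examples change.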

Another advantage of learning is that robots can share what they have learned, so not every robot has to start from scratch. Instead, it can access the accumulated knowledge of other robots - for example, via a cloud. It would then be enough for one robot to solve a complex task for all the other robots in the network to learn it as well. It's an idea that's scary for some and exciting for others.[xvii]

Further development of hardware

In addition to machine learning, new cameras are being developed that can see even better than human eyes. One new camera, developed by researchers at Stanford University, is modeled on the eyes of insects: it has 200,000 extremely small microlenses that collect detailed information about every light stimulus they pick up. This "light-field photography," or computational photography, can capture a wider field of view than human eyes and gather more information.[xviii]

One problem with this technology is privacy considerations, for which no standards or rules have yet been established.[xix]

Teaching robots to see is a big challenge. At the same time, it is an idea that has the potential to open up many new application areas for robots and make existing ones more efficient.

The biggest challenge is to make the robot understand what is in its environment, whether it should react to it, and if so, how. Machine learning has enabled great progress in this area, and image processing and camera technology have also advanced.

Nevertheless, different types of sensors are currently still being combined - especially in self-driving cars - in order to compensate for the weaknesses of the individual systems.
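One common textbook way to combine two sensors that measure the same quantity is inverse-variance weighting, in which the more reliable sensor gets more say. The Python sketch below is an illustration of that idea only, not a description of how any particular vehicle works; the readings and variances for the hypothetical camera and radar are invented.

```python
def fuse(estimates):
    """Combine (value, variance) pairs by inverse-variance weighting."""
    if any(var <= 0 for _, var in estimates):
        raise ValueError("variances must be positive")
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    # The fused variance is smaller than that of any single sensor.
    return value, 1.0 / total


# Distance to the same obstacle in meters, with each sensor's variance.
camera = (10.4, 0.4)  # less certain
radar = (10.0, 0.1)   # more certain
fused_value, fused_var = fuse([camera, radar])
```

The fused distance lands closer to the radar reading, and the fused variance is lower than either input, which is the sense in which one sensor compensates for the weakness of another.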


    • [i] Roberts, Jonathan (December 22, 2015): “How do robots ‘see’ the world?”, https://phys.org/news/2015-12-robots-world.html.
    • [ii] Amin, Geet (June 24, 2019): “How Robots Perceive the World Around Them”, https://www.roboticsbusinessreview.com/news/how-robots-perceive-the-world-around-them/.
    • [iii] Kunze, Sariana (April 23, 2019): “Wie Sensoren Roboter fühlen, sehen und lernen lassen”, https://www.elektrotechnik.vogel.de/wie-sensoren-roboter-fuehlen-sehen-und-lernen-lassen-a-820602.
    • [iv] Condie, Bill (October 01, 2017): “Making robots see. Machine learning and deep neural networks are helping roboticists create machines that can see their environments. Bill Condie reports”, https://cosmosmagazine.com/technology/making-robots-see/.
    • [v] Roberts, Jonathan (December 22, 2015): “How do robots ‘see’ the world?”, https://phys.org/news/2015-12-robots-world.html.
    • [vi] Fischer, Teresa (January 30, 2020): „Wenn Roboter sehen lernen“, https://www.blog.kuka.com/2020/01/30/wenn-roboter-sehen-lernen/.
    • [vii] Roberts, Jonathan (December 22, 2015): “How do robots ‘see’ the world?”, https://phys.org/news/2015-12-robots-world.html.
    • [viii] Triebel, Rudolph (June 25, 2020): „Wie künstliche Intelligenz uns hilft, Robotern das „Sehen“ beizubringen“, https://www.dlr.de/blogs/alle-blogs/wie-kuenstliche-intelligenz-uns-hilft-robotern-das-sehen-beizubringen.aspx.
    • [ix] Geiger, Andreas (2015): „Roboter lernen sehen“, https://www.mpg.de/9922487/mpi-mf_jb_2015.
    • [x] Triebel, Rudolph (June 25, 2020): „Wie künstliche Intelligenz uns hilft, Robotern das „Sehen“ beizubringen“, https://www.dlr.de/blogs/alle-blogs/wie-kuenstliche-intelligenz-uns-hilft-robotern-das-sehen-beizubringen.aspx.
    • [xi] Swain, Frank (August 25, 2020): “Robotics: How machines see the world”, https://www.bbc.com/future/article/20140822-the-odd-way-robots-see-the-world.
    • [xii] Wevolver (2019): “How do Robots ‘See’ in 3D?”, https://www.hackster.io/news/how-do-robots-see-in-3d-7374737e7e99.
    • [xiii] Amin, Geet (June 24, 2019): “How Robots Perceive the World Around Them”, https://www.roboticsbusinessreview.com/news/how-robots-perceive-the-world-around-them/.
    • [xiv] Kell, Adam (August 08, 2016): “This is how Robots See”, https://blog.cometlabs.io/this-is-how-robots-see-aa353b0cd857.
    • [xv] Roberts, Jonathan (December 22, 2015): “How do robots ‘see’ the world?”, https://phys.org/news/2015-12-robots-world.html.
    • [xvi] Triebel, Rudolph (June 25, 2020): „Wie künstliche Intelligenz uns hilft, Robotern das „Sehen“ beizubringen“, https://www.dlr.de/blogs/alle-blogs/wie-kuenstliche-intelligenz-uns-hilft-robotern-das-sehen-beizubringen.aspx.
    • [xvii] Roberts, Jonathan (December 22, 2015): “How do robots ‘see’ the world?”, https://phys.org/news/2015-12-robots-world.html.
    • [xviii] Ornes, Stephen (November 17, 2017): “Seeing the world through a robot’s eyes”, https://www.sciencenewsforstudents.org/article/seeing-world-through-robots-eyes.
    • [xix] Fischer, Teresa (January 30, 2020): „Wenn Roboter sehen lernen“, https://www.blog.kuka.com/2020/01/30/wenn-roboter-sehen-lernen/.