Getting More Meaning From Less Data
Training perception systems often requires a lot of data. To teach a system to identify an object as a vehicle, pedestrian or something else, an engineer typically has to show it sensor readings of that object from many different angles and in many different environments. Vehicles, trees and even people come in many shapes and sizes, and exposing the system to more examples of those objects enables it to identify them more accurately and in more situations.
However, in a challenge leading up to the recent European Conference on Computer Vision, Aptiv intern Frederik Hasecke proved that it is possible to train a neural network well even when the data available is limited.
The competition was sponsored by Innoviz, one of Aptiv’s technology partners, and Nvidia. Four participating teams looked for ways to use Innoviz’s InnovizTwo lidar sensor to correctly perceive objects in 3D when only a limited number of lidar frames had been collected and annotated. Hasecke and his professor Anton Kummert won first place for their innovative approach. Hasecke is a doctoral student in artificial intelligence and computer vision at the University of Wuppertal in Germany, and was working under a grant provided by Aptiv at the time of the challenge.
Participants were given a dataset with 1,200 lidar frames from several driving scenarios — but only 100 of those frames were annotated, meaning that objects had not been identified in the other 1,100 frames. In the annotated frames, the sponsors had identified only 790 cars, 30 pedestrians, eight bicycles, 17 motorcycles and 77 trucks, and teams had the task of training their systems to identify as many objects as they could in the unannotated frames.
A typical frame in the provided set showed a traffic scene as a three-dimensional point cloud, capturing the rough size and outline of the objects the lidar picked up but not their color. From the point clouds alone, the teams could get a general sense of each object’s shape, but without much context or detail.
Given this limited dataset, Hasecke used techniques he has been developing while pursuing his Ph.D. He took scans of objects such as cars, bicycles and trees, both from the annotated frames supplied in the competition and from an outside source of 3D meshes, and matched them against objects in the frames that had not been annotated. By resizing, flipping and otherwise manipulating these imported objects, he and Kummert trained the underlying neural network to recognize more of the objects.
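As a rough illustration of this kind of augmentation, the short Python/NumPy sketch below rescales, mirrors and rotates an object’s point cloud and then pastes it into a scene. It is a minimal example under assumed parameters (the scale range, flip axis and placement are illustrative choices, and the function names are hypothetical), not Hasecke’s actual pipeline.

import numpy as np

def augment_object(points, rng):
    # Randomly rescale the object's overall size (assumed +/- 10 percent range)
    points = points * rng.uniform(0.9, 1.1)
    # Randomly mirror the object by negating its x coordinates (a left/right flip)
    if rng.random() < 0.5:
        points = points * np.array([-1.0, 1.0, 1.0])
    # Randomly rotate the object about the vertical (z) axis
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rotation.T

def paste_into_scene(scene_points, object_points, position):
    # Translate the augmented object to a chosen spot and add it to the frame
    return np.vstack([scene_points, object_points + np.asarray(position)])

# Example usage with stand-in arrays; real inputs would be lidar point clouds
rng = np.random.default_rng(0)
car_points = rng.normal(size=(500, 3))
frame_points = rng.normal(size=(20000, 3))
augmented = paste_into_scene(frame_points, augment_object(car_points, rng), (10.0, 0.0, 0.0))

Varying objects this way effectively multiplies the number of labeled examples the network sees beyond what the 100 annotated frames alone provide.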
While radar and cameras are the primary external sensors for vehicles today, lidar is often combined with other data from test vehicles to establish ground truth. That is, a test vehicle can be equipped with a highly sensitive lidar to establish exactly what objects are around it, along with their size, distance and other attributes. Perceptions from the radars and cameras under development for production vehicles can then be compared against that ground truth to see how well those sensors are performing. Radar will continue to provide fundamental sensing for all levels of driving automation, with lidar being added for Level 4 automation and autonomous mobility on demand.
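To make that idea concrete, here is a minimal sketch, again in Python with NumPy, of how detections from a sensor under test might be scored against lidar-derived ground truth. The greedy center matching and the 2-meter distance threshold are simplifying assumptions for illustration, not Aptiv’s actual evaluation method.

import numpy as np

def score_against_ground_truth(detections, ground_truth, max_dist=2.0):
    # Greedily match each detected object center to the nearest unused ground-truth center
    remaining = list(range(len(ground_truth)))
    true_positives = 0
    for det in detections:
        if not remaining:
            break
        dists = np.linalg.norm(ground_truth[remaining] - det, axis=1)
        best = int(np.argmin(dists))
        if dists[best] <= max_dist:
            true_positives += 1
            remaining.pop(best)
    # Precision: share of detections that match real objects; recall: share of real objects found
    precision = true_positives / max(len(detections), 1)
    recall = true_positives / max(len(ground_truth), 1)
    return precision, recall

# Example usage with stand-in object centers (x, y, z in meters)
ground_truth = np.array([[10.0, 1.0, 0.0], [25.0, -3.0, 0.0]])
detections = np.array([[10.5, 1.2, 0.0], [60.0, 0.0, 0.0]])
print(score_against_ground_truth(detections, ground_truth))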
The competition showed that lidar can be used to identify objects even with limited data, Hasecke says. That can lead to better object detection in self-driving cars sold to the public — and, ultimately, a safer automated driving experience.