Machine Learning – Inferior data sets can increase the risk of accidents in driver-assistance systems: Is quantity really the best solution for training highly functional ADAS?

Advanced driver-assistance systems (ADAS) are now not only standard equipment in new cars, but also an important milestone on the road to autonomous driving. The more these assistants are expected to do independently, the better the neural networks underlying them have to be trained. Accordingly, the data sets used continue to grow. Yet to what extent do the training data actually reflect the operational domains of the ADAS? This question is often treated as secondary. To reduce the systems’ susceptibility to errors, developers have so far simply kept increasing the quantity of data. This results in unnecessarily complex, lengthy and inefficient development processes. ARRK Engineering has therefore developed an approach that makes it possible to correct or remove data that is inaccurate or distorts reality, and to train ADAS in a targeted, reliable and resource-efficient manner.

Well-known OEMs are leading the way, mobility start-ups are following suit and consumers want it: more and more vehicles are being equipped with level 2 and level 3 driver-assistance systems. Every day, numerous road users therefore rely on lane keeping assist or autosteer (LKA/LCA), automated parking and adaptive cruise control (ACC). General road safety, and thus the safety of all road users, depends to a large extent on the proper functioning of these systems. To ensure this, their neural networks are trained on huge data sets.

Huge data sets: the vicious cycle of mass

The more complex the functions of the various ADAS, the more specific the data sets required for their training. In order to cover all possible traffic situations, the data sets have been expanded more and more in recent years, with the focus primarily on mass, i.e. the sheer number of recording hours or of annotated objects in different weather and lighting conditions. However, this inevitably increases the proportion of data that is inaccurate or simply unsuitable for a particular operational domain. To ensure that newly developed ADAS continue to function reliably, this quality deficit has in turn been compensated for with even more data: a vicious cycle. This has already led to very long development times with many iteration loops, in which the training of the neural networks alone takes several weeks.

To escape this dilemma, the automotive industry needs to shift its focus away from the quantity and towards the quality of data sets. The machine learning specialists at ARRK Engineering have therefore developed an approach to validate the data sets with regard to an operational domain and to correct them if necessary. In this way, development can be made more efficient and, more importantly, the functional safety of the ADAS can be increased.

Example ACC: Analyzing the data sets with regard to trajectory planning

The ADAS function ACC automatically controls the acceleration and braking of a vehicle, always maintaining sufficient distance to other road users and obstacles. To do this, the system calculates the so-called time-to-collision (TTC) for each detected object. If the TTC drops below a defined threshold value, the vehicle decelerates. The following applies: the greater the speed difference between the vehicle itself and an object in front of it, the shorter the TTC and the earlier the ACC must react. Because of this relationship, the system needs to reliably detect objects at a significantly greater distance on motorways, for example, whereas in urban environments potential obstacles may be spread over a much wider area at close range. But are the ADAS of the different manufacturers even capable of doing this? To find out, the researchers focused on trajectory planning. For deployment in Germany, they defined three driving scenarios with different speed limits and correspondingly varying requirements for the ACC: motorway (130 km/h recommended speed), country road (max. 100 km/h) and urban traffic (max. 50 km/h).
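The relationship between distance, speed difference and TTC can be illustrated with a minimal Python sketch. It assumes a simple constant-velocity model (distance divided by closing speed); the function names and the 3-second threshold are hypothetical choices for illustration and are not taken from the study or from any production ACC.

```python
import math

# Hypothetical reaction threshold in seconds (illustrative assumption only).
TTC_THRESHOLD_S = 3.0


def time_to_collision(distance_m: float, ego_speed_kmh: float,
                      object_speed_kmh: float) -> float:
    """Return the TTC in seconds under a constant-velocity assumption.

    Returns math.inf when the gap to the object ahead is not closing.
    """
    closing_speed_ms = (ego_speed_kmh - object_speed_kmh) / 3.6  # km/h -> m/s
    if closing_speed_ms <= 0:
        return math.inf
    return distance_m / closing_speed_ms


def must_decelerate(distance_m: float, ego_speed_kmh: float,
                    object_speed_kmh: float) -> bool:
    """True when the TTC has dropped below the assumed threshold."""
    ttc = time_to_collision(distance_m, ego_speed_kmh, object_speed_kmh)
    return ttc < TTC_THRESHOLD_S


if __name__ == "__main__":
    # Stationary obstacle 100 m ahead in the three scenarios named above.
    for label, speed_kmh in [("motorway", 130), ("country road", 100), ("urban", 50)]:
        print(label, round(time_to_collision(100.0, speed_kmh, 0.0), 2), "s")
```

Under these assumptions, a stationary obstacle 100 m ahead leaves roughly 2.8 seconds at 130 km/h but 7.2 seconds at 50 km/h, which is why the required detection range grows with the speed of the scenario.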

For their analysis, the researchers drew on a total of six data sets, including large data sets that have been used by well-known OEMs for years: ONCE, nuScenes, A2D2, LyftLevel5, Waymo and KITTI. The first objective of the study was to determine the statistical distribution of the annotated objects that the system is able to identify. This involved assessing the size of the bounding boxes used for annotation, the relationship between their size and distance from each other, the distance between the vehicle and other objects, the distribution of their relative positions on the sensor and the optical flow of the image sequences. With these parameters, the researchers determined how precisely the objects had been annotated with bounding boxes in the first place. In addition, they investigated how well the camera sensors of the vehicles were matched to their operational domains and what proportion of the recordings was accounted for, for example, by standing phases during which (almost) static images were captured over a longer period of time.
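How the share of (almost) static images in a sequence might be estimated can be sketched as follows. Note that this example deliberately swaps in plain frame differencing as a simpler stand-in for the optical flow analysis mentioned above; the frame representation, the function names and the threshold of 1.0 are assumptions made for illustration and do not describe the researchers' implementation.

```python
from typing import List, Sequence

# Assumption: each frame is a flattened grayscale image with values in [0, 255].
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two frames of equal size."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def static_flags(frames: Sequence[Frame], threshold: float = 1.0) -> List[bool]:
    """flags[i] is True when frame i barely differs from frame i - 1.

    The first frame has no predecessor and is never flagged.
    """
    if not frames:
        return []
    flags = [False]
    for previous, current in zip(frames, frames[1:]):
        flags.append(mean_abs_diff(previous, current) < threshold)
    return flags


def static_share(frames: Sequence[Frame], threshold: float = 1.0) -> float:
    """Fraction of frames flagged as (almost) static."""
    flags = static_flags(frames, threshold)
    return sum(flags) / len(flags) if flags else 0.0
```

A suitable threshold would have to be calibrated per camera and data set, since sensor noise and exposure changes alone produce small frame-to-frame differences.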

Optimizing the examined data sets

The results of the analyses and the inferred quality of the data sets came as an unpleasant surprise to the machine learning specialists at ARRK Engineering. For example, they discovered an unexpectedly large number of static images from traffic congestion and standing phases, which were not marked as such and could therefore negatively influence the detectors’ accuracy. Moreover, all the data sets examined completely lacked annotated objects at greater distances of around 100 m and more. However, due to the short TTC at speeds of about 130 km/h, it is imperative that ADAS detect such distant obstacles in order to ensure safe use on German motorways. In addition, the annotation of the objects is often inaccurate because many people work on the data sets with different approaches. To compensate for this, the bounding boxes are drawn more generously than necessary and often overlap. This in turn makes it harder for the systems to detect obstacles and prolongs the training processes.
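A check of how well a data set covers the far range could look like the following sketch. It assumes that the distance of every annotated object from the ego vehicle is available in metres; the function names and the 25 m bin width are hypothetical, and only the 100 m mark is taken from the text above.

```python
from collections import Counter
from typing import Dict, Iterable, List


def distance_histogram(distances_m: Iterable[float],
                       bin_width_m: float = 25.0) -> Dict[float, int]:
    """Count annotated objects per distance bin, keyed by the bin's lower edge."""
    counts = Counter(int(d // bin_width_m) * bin_width_m for d in distances_m)
    return dict(sorted(counts.items()))


def far_range_share(distances_m: List[float],
                    min_distance_m: float = 100.0) -> float:
    """Fraction of annotated objects at or beyond min_distance_m."""
    if not distances_m:
        return 0.0
    far = sum(1 for d in distances_m if d >= min_distance_m)
    return far / len(distances_m)
```

A far-range share close to zero would indicate that a detector trained on this data has seen hardly any examples of the distant obstacles that matter at motorway speeds.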

The researchers now aim to improve the poor quality of the data sets with regard to the development of level 2 and 3 ADAS. To this end, they developed an approach to validate the data sets with regard to the operational domains of the systems and to correct their deficiencies accordingly. For example, the precision and generalization of detector training can be increased by eliminating static images as well as images with significantly overlapping bounding boxes. In addition, the study results make it possible to determine the extent to which a data set is applicable to a specific operational scenario, and to amend it accordingly during training. In the case of ACC, this includes driving in urban areas, in the countryside or on motorways. By better aligning the camera sensors to the actual traffic situation, the efficiency of the calculations and thus also the reaction time of the ACC can be improved.
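Eliminating images with significantly overlapping bounding boxes can be sketched with the common intersection-over-union (IoU) measure. The text does not state which overlap criterion the researchers use, so both IoU and the limit of 0.5 are assumptions here, as are the box format and the function names.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Assumption: axis-aligned boxes as (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def has_heavy_overlap(boxes: List[Box], max_iou: float = 0.5) -> bool:
    """True if any pair of boxes in one image overlaps by more than max_iou."""
    return any(iou(a, b) > max_iou for a, b in combinations(boxes, 2))


def filter_images(annotations: Dict[str, List[Box]],
                  max_iou: float = 0.5) -> Dict[str, List[Box]]:
    """Keep only images without heavily overlapping bounding boxes."""
    return {image_id: boxes for image_id, boxes in annotations.items()
            if not has_heavy_overlap(boxes, max_iou)}
```

Because real traffic scenes contain legitimate overlaps, for example partially occluded vehicles, the limit would need to be tuned so that only overly generous annotations are removed.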

Increasing safety on the roads thanks to validated data sets

Validating data sets with ARRK Engineering’s newly developed approach makes it possible to significantly reduce the number of iteration loops required in the training process and thus the overall development time of the ADAS. More efficient training can therefore save valuable time as early as the development phase. In addition, systems that have been precisely trained for their operational domain also offer higher functional safety. In practical use, for example, ACC can detect moving objects and stationary obstacles more reliably and initiate appropriate deceleration in time. Given that more and more highly functional ADAS will be on our roads in the future on the path to autonomous driving, this will increase general safety in daily road traffic.

The paper was presented at the SafeAI 2023 conference: https://safeai.webs.upv.es/

ARRK Engineering is a globally active development partner for the automotive and mobility industry specializing for more than 50 years in end-to-end and comprehensive support of the entire product development process – from concept phase through series development to validation and system integration of mechanical and electronic components. Within the focal topics, ARRK Engineering works in interdisciplinary teams of experts with the aim of realizing projects quickly and in a technically comprehensive manner. Coupled with excellent project management, ARRK Engineering thus solves the complex development tasks of its customers with high quality and high customer satisfaction. To achieve this, ARRK Engineering relies on the many years of interdisciplinary expertise of its 1,600 employees at locations in Germany, Romania, China, the Netherlands, Malaysia and Japan. As a member of the international ARRK Group, ARRK Engineering has further resources available worldwide to also support customers in global markets.

By Václav Diviš, Senior Expert Machine Learning at ARRK Engineering

More info for readers/viewers/interested parties:

ARRK Engineering GmbH
Frankfurter Ring 160, 80807 Munich, Germany
Tel.: +49 (0) 89 31857-0, Fax: +49 (0) 89 31857-111
E-mail: info@arrk-engineering.com
Internet: www.engineering.arrk.com

More info for the editors
ABOPR Press Service B.V.
Stefan-George-Ring 19, 81929 Munich, Germany
Tel.: +49 (0) 89 500 315-20, Fax: +49 (0) 89 500 315-15
E-mail: info@abopr.de
Internet: www.abopr.de
