Deep learning for enhancing object detection in autonomous driving

  1. Manuel Carranza García
Supervised by:
  1. José Cristobal Riquelme Santos Director
  2. Jorge García Gutiérrez Director

Defence university: Universidad de Sevilla

Year of defence: 2022

Type: Thesis


Autonomous driving is one of the most important technological challenges of this century. Its development will revolutionize mobility and solve many of the problems associated with it. The popularity of self-driving vehicles is growing, given their great potential to improve road safety and to reduce traffic and pollution. Despite recent advances in artificial intelligence, many challenges must still be solved before autonomous vehicles become completely reliable and safe. Among them, correct perception of the environment is fundamental. Vehicles have to detect the different elements involved in traffic and predict their movement precisely, robustly, and in real time, which increases the complexity of the problem.

In this Ph.D. dissertation, presented as a compendium of research articles, we explore new approaches to improve the perception systems of autonomous vehicles using data from onboard sensors. Specifically, we develop novel deep learning techniques to enhance performance in the object detection task, approaching the problem from different perspectives. The presented studies use real datasets from autonomous driving companies such as Waymo or Motional, which were recently shared to enable researchers to contribute to the progress of this technology.

First, we conduct an experimental review of deep learning detectors for autonomous driving. This work analyzes the speed/accuracy trade-off of thirty different detectors, which is essential in this real-time application. The evaluation identifies the most suitable models in this context and possible lines of research to be addressed.

Second, we design a detector that is specifically adapted to the particularities of this scenario. We develop a novel anchor optimization method, based on evolutionary algorithms, that considers the perspective of the vehicle's cameras. Furthermore, we explore different learning strategies to deal with class imbalance: for this purpose, we implement a more effective loss function for the network and an ensemble model. The proposed modifications provide a significant improvement over the default generic configuration without increasing the computational cost of inference.

Third, we develop an efficient RGB and LiDAR data fusion architecture to build a more robust detection system. Our proposal integrates a neural network for LiDAR depth completion into the detection pipeline. With this network, the resolution of the depth data is improved using transfer learning, resulting in a more effective fusion with RGB images. The proposed method increases detection precision under diverse lighting conditions, especially at night, compared to other approaches that use classical algorithms to upsample the sparse LiDAR projections.

Finally, we implement an enhanced detection architecture to fully leverage the temporal information naturally present in the recorded LiDAR data. Our approach uses a Transformer layer with an attention mechanism that captures spatio-temporal dependencies in the sequential data, thus achieving better accuracy in object detection.
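
As a rough illustration of the attention mechanism mentioned in the last contribution, the following is a minimal sketch of scaled dot-product self-attention over a short sequence of per-frame feature vectors. It is not the architecture of the thesis: the function names, the toy input, and the use of identity projections (a real Transformer layer learns separate query, key, and value projections) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections (assumption).

    frames: list of T feature vectors (one per time step), each of length d.
    Returns (outputs, weights), where weights[i][j] is how strongly
    time step i attends to time step j.
    """
    d = len(frames[0])
    outputs, weights = [], []
    for query in frames:
        # Similarity of this time step to every time step, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in frames
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all time steps.
        outputs.append(
            [sum(w * value[c] for w, value in zip(row, frames)) for c in range(d)]
        )
    return outputs, weights


# Hypothetical toy input: 3 time steps, 2 features per step.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(frames)
```

In this toy input the first two time steps have similar features, so the first step assigns more weight to the second step than to the third. This is the sense in which attention can relate observations across time.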