Effective methodologies for 3D perception in autonomous vehicles (Doctoral thesis)

Ζαμανάκος, Γεώργιος / Zamanakos, Georgios

This doctoral dissertation aims to develop innovative and effective methodologies for 3D perception in autonomous vehicles. Specifically, after an extensive analysis of state-of-the-art works, four contributions are proposed.

The first contribution relates to the LiDAR-camera sensor setup used for capturing a driving scene, with the goal of enabling optimal data association between the two sensor modalities. A method for LiDAR-camera extrinsic calibration from multiple static scenes is proposed, using a simple calibration-target design with an ArUco marker. To this end, a novel LiDAR-camera cooperative scheme is employed: first, the camera-based detection of the marker guides the processing of the LiDAR point cloud so as to detect the 3D marker within it; once the marker has been accurately localized in the LiDAR point cloud, the pose estimate of the marker obtained from the camera sensor is further corrected. In this way, the strengths of each sensor are exploited to improve marker localization. The improved accuracy in the computation of the extrinsic calibration parameters is demonstrated experimentally, in both quantitative and qualitative terms.

The second contribution tackles the task of 3D object detection, a key element of 3D perception for autonomous vehicles. LiDAR sensors are commonly used to perceive the surrounding area, producing a sparse representation of the scene in the form of a point cloud. The current trend is to use deep learning neural network architectures that predict 3D bounding boxes. The vast majority of these architectures process the LiDAR point cloud directly but, due to computation and memory constraints, at some point compress the input to a 2D Bird's Eye View (BEV) representation.
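As an illustration of the cooperative calibration scheme described in the first contribution, the following sketch shows one plausible form of the camera-guided LiDAR refinement step: the camera's rough marker position (assumed here to be already transformed into the LiDAR frame by an initial extrinsic guess) selects a search region in the point cloud, a plane is fitted to the points found there, and the camera estimate is snapped onto that plane. The function names, the plane-fitting choice, and the radius-based search are illustrative assumptions, not the dissertation's actual algorithm.

```python
import numpy as np

def fit_plane_normal(points):
    """Fit a plane to 3D points via SVD; return unit normal and centroid."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vh = np.linalg.svd(points - centroid)
    normal = vh[-1]
    return normal / np.linalg.norm(normal), centroid

def refine_marker_pose(camera_pos, lidar_points, search_radius=0.5):
    """Correct a camera-estimated marker position with nearby LiDAR points.

    camera_pos   : (3,) rough marker position in the LiDAR frame
                   (hypothetical interface for this sketch).
    lidar_points : (N, 3) LiDAR point cloud.
    """
    # Step 1: the camera detection guides the point-cloud search region.
    dists = np.linalg.norm(lidar_points - camera_pos, axis=1)
    roi = lidar_points[dists < search_radius]
    if len(roi) < 3:
        return camera_pos  # not enough support; keep the camera estimate
    # Step 2: localize the planar marker in the LiDAR data and project
    # the camera estimate onto the fitted plane.
    normal, centroid = fit_plane_normal(roi)
    offset = np.dot(camera_pos - centroid, normal)
    return camera_pos - offset * normal
```

With a clean planar patch of LiDAR points, the refined position lies exactly on the marker plane, correcting the depth error that a monocular ArUco pose estimate typically exhibits.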
The proposed 2D neural network architecture, namely the Feature Aware Re-weighting Network, is employed for feature extraction in the BEV using local context via an attention mechanism, in order to improve the 3D detection performance of LiDAR-based detectors. Extensive experiments on five state-of-the-art detectors and three benchmark datasets, namely KITTI, Waymo and nuScenes, demonstrate the effectiveness of the proposed method, in terms of both detection performance and the minimal added computational burden.

The third and fourth contributions relate to the incorporation of attention mechanisms into deep learning architectures for the tasks of 3D semantic segmentation and point cloud change detection, respectively, both of which facilitate the 3D perception of a driving scene. 3D semantic segmentation is a key element in autonomous vehicles; for such applications, 3D data are usually acquired by LiDAR sensors, resulting in a point cloud. For 3D semantic segmentation, where the point clouds must be labeled with semantics, the current tendency is to use deep learning neural network architectures for effective representation learning. At the same time, various 2D and 3D computer vision tasks have benefited from attention mechanisms, which effectively re-weight the already learned features. In this dissertation, the role of attention mechanisms in 3D semantic segmentation for autonomous driving is investigated, by identifying the significance of different attention mechanisms when adopted in existing deep learning networks. Extensive experimentation is conducted on two standard datasets for autonomous driving, namely Street3D and SemanticKITTI, permitting conclusions to be drawn at both the quantitative and the qualitative level. The experimental findings show a clear advantage when attention mechanisms are adopted, resulting in superior performance.
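The attention-based re-weighting idea that recurs in these contributions can be sketched generically: a gate computed from the local context of each BEV cell multiplies the original features, emphasizing informative locations. This is a minimal numpy illustration of attention-style feature re-weighting, not the actual Feature Aware Re-weighting Network; the pooling window, the single linear projection, and the sigmoid gate are all simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_context(feat, k=3):
    """Average-pool each spatial location over a k x k neighborhood
    (zero padding), giving every cell a summary of its local context."""
    c, h, w = feat.shape
    pad = k // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(feat)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = padded[:, i:i + k, j:j + k].mean(axis=(1, 2))
    return out

def reweight_bev(feat, w_gate, b_gate, k=3):
    """Re-weight a BEV feature map (C, H, W) with an attention-style gate.

    w_gate : (C, C) projection of the context (learned in a real network,
             arbitrary here).  The resulting gate lies in (0, 1) and
             multiplies the original features element-wise.
    """
    ctx = local_context(feat, k)                              # (C, H, W)
    logits = np.einsum('dc,chw->dhw', w_gate, ctx) + b_gate[:, None, None]
    gate = sigmoid(logits)
    return feat * gate
```

Because the gate is bounded in (0, 1), the module can only attenuate features, never amplify them; in a trained detector the projection learns which local-context patterns should pass through to the detection head.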
In particular, it is shown that the adoption of a Point Transformer in the SPVCNN deep learning network results in an architecture that outperforms the state of the art on the Street3D dataset.

Point cloud change detection concerns the detection and classification of spatial changes between two LiDAR point clouds captured at the same driving scene at different times. The current tendency is to use deep learning networks with a siamese structure to process the two point clouds. In this dissertation, two distinct approaches are examined for this task. In the first, an attention mechanism is proposed and integrated into a deep learning network to extract useful geometric and contextual information. In the second, a deep learning architecture named SiamVFE is proposed, which relies on a computationally efficient backbone network from a LiDAR-based 3D object detector for feature extraction. Experimental results on both real and synthetic datasets demonstrate that the proposed attention mechanism improves performance over the baseline method. The proposed SiamVFE demonstrates inferior detection performance, as expected, but offers faster inference, which is more suitable for real-time applications in autonomous driving.
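The siamese structure mentioned above can be illustrated with a toy example: the same encoder (shared weights) maps both epochs of the scene to a fixed BEV grid of per-voxel features, and voxels whose features differ strongly are flagged as changed. The hand-crafted count/mean-height features here stand in for a learned voxel feature encoder; the grid size, threshold, and function names are assumptions of this sketch, not the dissertation's SiamVFE.

```python
import numpy as np

def voxel_features(points, voxel_size=1.0, grid=(4, 4)):
    """Encode a point cloud into a fixed BEV grid of simple per-voxel
    features (point count and mean height) -- a stand-in for a learned
    voxel feature encoder."""
    feats = np.zeros((2, *grid))
    ij = np.floor(points[:, :2] / voxel_size).astype(int)
    for (i, j), z in zip(ij, points[:, 2]):
        if 0 <= i < grid[0] and 0 <= j < grid[1]:
            feats[0, i, j] += 1.0
            feats[1, i, j] += z
    counts = np.maximum(feats[0], 1.0)
    feats[1] /= counts  # mean height per occupied voxel
    return feats

def detect_changes(cloud_t0, cloud_t1, threshold=0.5):
    """Siamese comparison: the SAME encoder (shared weights) processes
    both epochs; voxels whose features differ strongly are flagged."""
    f0 = voxel_features(cloud_t0)
    f1 = voxel_features(cloud_t1)
    diff = np.linalg.norm(f0 - f1, axis=0)  # per-voxel feature distance
    return diff > threshold                 # boolean change mask
```

Reusing one encoder for both epochs is what makes the comparison meaningful: identical geometry yields identical features, so any surviving difference reflects an actual change in the scene rather than encoder variance.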
Institution and School/Department of submitter: Democritus University of Thrace. School of Engineering. Department of Electrical and Computer Engineering
Subject classification: Image processing--Digital techniques--Software
Keywords: 3D object detection, LiDAR technique, LIght Detection And Ranging technique, Point cloud change detection
URI: https://repo.lib.duth.gr/jspui/handle/123456789/20088
Appears in Collections: ELECTRICAL & COMPUTER ENGINEERING

Files in This Item:
File: ZamanakosG_2024.pdf
Description: Doctoral dissertation
Size: 24.05 MB
Format: Adobe PDF


DOI: http://dx.doi.org/10.26257/heal.duth.18777

This item is licensed under a Creative Commons License.