Graduation Semester and Year
2021
Language
English
Document Type
Thesis
Degree Name
Master of Science in Aerospace Engineering
Department
Mechanical and Aerospace Engineering
First Advisor
Kamesh Subbarao
Abstract
Real-time object classification, localization, and detection with a region-based convolutional neural network (R-CNN) either requires high computational power or consumes a tremendous amount of time on the available onboard computer system, so neither approach is practical for real-time object detection. In contrast, the YOLO-v3 network (YOLO stands for "you only look once") is practical for live object classification, localization, and detection, since the YOLO algorithm runs faster than the sliding-window approach used by R-CNN. Sensor fusion in this context requires estimating and associating information from more than one optical sensor, which improves the accuracy of locating the object in the real world. The goal of this research is to implement a real-time object detection and tracking algorithm for a collision avoidance application on an autonomous ground vehicle. Transfer learning was performed on the pre-trained YOLO-v3 artificial neural network using a supervised learning algorithm, and a labeled dataset was created to train the deep neural network. Hyper-parameters such as mini-batch size, L2 regularization parameter, number of epochs, and learning rate were tuned to study the trade-off between the bias and variance of the classifier and to improve the performance of the network. Moreover, intrinsic calibration of the stereo camera and LiDAR was performed to find the standard measurement error by re-projecting the measurement data onto ground-truth data. Similarly, extrinsic calibration was performed to derive the standard transformation error between these two optical sensors. Finally, the object's states were fed to a centralized sensor fusion framework, which uses a linear Kalman filter, to estimate the states of the objects. The complete framework was developed in MATLAB 2020b using the deep learning, computer vision and image processing, LiDAR, and sensor fusion and tracking toolboxes. The experimental platform was equipped with two single pin-hole cameras with Sony IMX 179 optical sensors, an Intel RealSense L515 LiDAR camera, and an Intel NUC as the on-board processing unit. Four 3D shapes (cylinder, cone, pyramid, and rectangular parallelepiped) were used to train the neural network for object detection. The Driving Scenario Designer tool from MATLAB 2020b was used to simulate a driving scene and further used to test the sensor fusion framework.
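The centralized fusion step summarized above can be illustrated with a minimal linear Kalman filter. The sketch below is a simplified illustration written in Python/NumPy rather than the MATLAB Sensor Fusion and Tracking Toolbox used in the thesis; it assumes a constant-velocity motion model, 2-D position-only measurements from the stereo camera and the LiDAR in a common vehicle frame, and illustrative noise covariances. All names and numerical values are hypothetical, not the author's implementation.

```python
import numpy as np

# Minimal linear Kalman filter sketch for a constant-velocity target model.
# State x = [px, py, vx, vy]; both the stereo camera and the LiDAR are assumed
# to report a 2-D position measurement z = [px, py] in a common vehicle frame.
# Motion model, noise levels, and measurement values are illustrative assumptions.

dt = 0.1                                   # sample time [s] (assumed)
F = np.array([[1, 0, dt, 0],               # state transition (constant velocity)
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],                # position-only measurement model
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)                       # process noise covariance (assumed)
R_camera = np.diag([0.05, 0.05])           # stereo-camera measurement noise (assumed)
R_lidar = np.diag([0.02, 0.02])            # LiDAR measurement noise (assumed)

x = np.zeros(4)                            # initial state estimate
P = np.eye(4)                              # initial state covariance


def predict(x, P):
    """Propagate the state and covariance one time step ahead."""
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P


def update(x, P, z, R):
    """Correct the prediction with one sensor's position measurement."""
    y = z - H @ x                          # innovation
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P


# Centralized fusion: at each step, predict once and sequentially update with
# every available measurement (here one camera and one LiDAR detection).
for z_cam, z_lid in [(np.array([1.00, 0.50]), np.array([1.02, 0.48]))]:
    x, P = predict(x, P)
    x, P = update(x, P, z_cam, R_camera)
    x, P = update(x, P, z_lid, R_lidar)

print("fused position estimate:", x[:2])
```

Sequentially applying the update with each sensor's measurement and noise covariance is one common way to realize a centralized fusion scheme; the thesis' framework additionally handles measurement association and tracking, which are omitted here.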
Keywords
YOLO-v3, Object detection, Object position estimation, Lidar, Stereo vision, Sensor fusion
Disciplines
Aerospace Engineering | Engineering | Mechanical Engineering
License
This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.
Recommended Citation
Mehta, Kamalkumar Bharatkumar, "OBJECT CLASSIFICATION, DETECTION AND STATE ESTIMATION USING YOLO V3 DEEP NEURAL NETWORK AND SENSOR FUSION OF STEREO CAMERA AND LIDAR" (2021). Mechanical and Aerospace Engineering Theses. 791.
https://mavmatrix.uta.edu/mechaerospace_theses/791
Comments
Degree granted by The University of Texas at Arlington