Author

Sudheer Raja

ORCID Identifier(s)

0000-0002-5446-6661

Graduation Semester and Year

2019

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Dajiang Zhu

Abstract

In recent years, Deep Neural Networks (DNNs) have matched or surpassed human-level performance in recognizing and interpreting complex patterns in data. Since the ImageNet competition in 2012, Deep Learning (DL) has become a promising approach for solving numerous problems in the field of Computer Science. However, the neuroscience community has not been able to utilize DL algorithms effectively because brain imaging datasets are very large, and current sequential training techniques do not scale well to datasets of that size. Without a sufficient amount of training data, training DNN models to competitive accuracies is quite challenging. Even with powerful GPUs or TPUs, training performance can still be unsatisfactory when each data sample is itself large, as is the case with brain imaging datasets. One solution is to parallelize the training process instead of training in a sequential mini-batch fashion. However, currently available distributed training techniques suffer from several problems, such as computation bottlenecks and model divergence. In this thesis, we discuss a novel training technique that overcomes these problems by distributing model training across multiple GPUs on different nodes asynchronously and synchronizing the gradients during the backward pass (backpropagation) in a ring fashion. We explore how to build such systems and train models efficiently using model replication and data parallelism with minimal changes to existing code. We perform a comparative performance analysis of the proposed technique by training several Convolutional Neural Network (CNN) models on single-GPU and multi-GPU systems and on a multi-node multi-GPU cluster. Our analysis provides conclusive support that the proposed training technique can significantly outperform the traditional sequential training approach.
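
To illustrate the training pattern the abstract describes (model replication, data parallelism, and ring-based gradient synchronization during the backward pass), the following is a minimal sketch, assuming a PyTorch setup with torch.distributed, DistributedDataParallel, and the NCCL backend, which performs ring all-reduce of gradients overlapped with backpropagation. The model, synthetic dataset, and hyperparameters are illustrative placeholders, not the code or models evaluated in the thesis.

    # Minimal data-parallel training sketch (one process per GPU).
    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    def main():
        # Rank and world size are supplied by the launcher (e.g. torchrun)
        # through environment variables.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Each process holds a full replica of the model (model replication).
        model = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * 28 * 28, 10),
        ).cuda(local_rank)
        # DDP hooks into backward() and averages gradients across replicas
        # with ring all-reduce, overlapping communication with backprop.
        model = DDP(model, device_ids=[local_rank])

        # Synthetic data; DistributedSampler gives each process a disjoint
        # shard of the dataset (data parallelism).
        data = TensorDataset(torch.randn(1024, 1, 28, 28),
                             torch.randint(0, 10, (1024,)))
        loader = DataLoader(data, batch_size=32,
                            sampler=DistributedSampler(data))

        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = nn.CrossEntropyLoss()

        for epoch in range(2):
            loader.sampler.set_epoch(epoch)
            for x, y in loader:
                x, y = x.cuda(local_rank), y.cuda(local_rank)
                optimizer.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()   # gradients synchronized via ring all-reduce
                optimizer.step()  # every replica applies the same update

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launched with a per-node command such as torchrun --nproc_per_node=<GPUs per node> train.py on each node of the cluster, this pattern requires only a few additions (process-group setup, DDP wrapping, and a distributed sampler) to an otherwise unchanged single-GPU training loop, which is the sense in which the approach needs very minimal changes to existing code.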

Keywords

Distributed deep learning, Deep neural networks, Brain imaging training optimization

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
