De Wang

Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Department
Computer Science and Engineering

First Advisor

Heng Huang


To unleash the power of big data, efficient algorithms that scale to millions of data points are needed. Deep learning is one area that benefits enormously from big data. Deep learning uses neural networks to mimic the human brain; this approach is termed connectionist in the AI community. In this dissertation, we propose several novel learning strategies to improve the performance of connectionist models.

Evaluating a large neural network during the inference phase requires substantial GPU memory and computation, which degrades the user experience through response latency. Model distillation transfers the knowledge contained in a cumbersome model to a smaller one, imitating the way human learning is guided by teachers. We propose darker knowledge: a new method of knowledge distillation via rich-target regression. The proposed method outperforms the current state-of-the-art model distillation method proposed by Hinton et al. Many high-level machine learning tasks depend on model distillation, such as knowledge transfer between different neural network architectures, black-box attack and defense in computer security, and policy distillation in reinforcement learning; these tasks would benefit considerably from the improved distillation method.

In another work, we design a new deep neural network architecture that enables model ensembling within a single network. The network is composed of many columns, where each column is a small computational graph that performs a series of non-linear transformations. We train these multi-column branching neural networks by stochastically dropping columns, which prevents co-adaptation of columns from causing overfitting and encourages each column to learn different features that enhance the aggregated representation. The new architecture exhibits the ensemble property within a single model and improves the classification performance of a single neural network over current state-of-the-art architectures.
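The core idea of distillation by regressing rich targets can be sketched in a toy setting. This is a minimal illustration, not the dissertation's actual method: the "teacher" and "student" below are linear models, and all shapes, names, and hyperparameters are assumptions made for the example. The student is trained to regress the teacher's logits (the rich targets) rather than one-hot labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a "teacher" produces rich targets (logits) that a
# smaller "student" regresses, instead of fitting one-hot labels alone.
n_samples, n_features, n_classes = 256, 20, 5
X = rng.normal(size=(n_samples, n_features))
W_teacher = rng.normal(size=(n_features, n_classes))
teacher_logits = X @ W_teacher          # rich targets to imitate

# Student: linear model trained by gradient descent on the squared
# error between its logits and the teacher's logits.
W_student = np.zeros((n_features, n_classes))
lr = 0.01
for _ in range(500):
    student_logits = X @ W_student
    grad = X.T @ (student_logits - teacher_logits) / n_samples
    W_student -= lr * grad

# After training, the student's predicted classes largely agree
# with the teacher's.
agreement = np.mean(
    np.argmax(X @ W_student, axis=1) == np.argmax(teacher_logits, axis=1)
)
print(f"student/teacher agreement: {agreement:.2f}")
```

Regressing logits conveys more information per example than hard labels: the relative magnitudes across all classes tell the student how the teacher ranks the alternatives, not just which class won.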
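The multi-column idea with stochastic column dropping can likewise be sketched. This is an illustrative toy, assuming single-layer ReLU columns averaged into one representation; the real architecture's column structure, depth, and drop schedule are not specified here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sketch: each "column" is a small non-linear transform of
# the input; during training a random subset of columns is dropped so
# columns cannot co-adapt. Shapes and drop probability are assumptions.
n_columns, in_dim, hid_dim = 8, 16, 32
weights = [rng.normal(scale=0.1, size=(in_dim, hid_dim))
           for _ in range(n_columns)]

def forward(x, train=True, drop_prob=0.5):
    """Average the surviving columns' ReLU outputs."""
    outputs = []
    for W in weights:
        if train and rng.random() < drop_prob:
            continue                      # stochastically drop this column
        outputs.append(np.maximum(0.0, x @ W))
    if not outputs:                       # guard: keep at least one column
        outputs.append(np.maximum(0.0, x @ weights[0]))
    return np.mean(outputs, axis=0)

x = rng.normal(size=(4, in_dim))
train_out = forward(x, train=True)        # random subset of columns
eval_out = forward(x, train=False)        # all columns: implicit ensemble
print(train_out.shape, eval_out.shape)
```

At evaluation time all columns are active, so a single forward pass averages many partially independent feature extractors, which is what gives one network an ensemble-like property.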
On the other hand, we study the vulnerability of modern deep learning systems at both the training stage and the evaluation stage. At the training stage, attackers can contaminate the training data with noise, which deteriorates the recognition performance of deep learning models. We propose a new loss function that is more robust to noisy input and outperforms the standard practice of neural network training. At the evaluation stage, we show that even though neural networks achieve unprecedented accuracy on image recognition tasks, the models are vulnerable to access attacks in which attackers can easily generate fake identity proofs by exploiting deployed neural networks. We show that what neural networks learn is very different from the human visual system: given a trained model, we can easily generate an image that is classified into a target class with almost 100% confidence, even though the image may look like white noise to human eyes.
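The evaluation-stage attack can be illustrated with gradient ascent on the input of a simple softmax classifier. This is a hedged sketch, not the dissertation's attack: the "trained model" here is just a random linear classifier, and all sizes and step counts are assumptions. Starting from near-noise, ascending the gradient of the target class's log-probability with respect to the input yields an input the model classifies into the target class with very high confidence.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch of the access-attack idea: given a fixed (here, random) linear
# softmax classifier, gradient ascent on the *input* produces an image
# the model assigns to a chosen class with near-100% confidence, even
# though the input looks like noise. Model and sizes are illustrative.
n_pixels, n_classes = 64, 10
W = rng.normal(scale=0.1, size=(n_pixels, n_classes))

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

target = 3
x = rng.normal(scale=0.01, size=n_pixels)   # start from near-noise
lr = 0.5
for _ in range(500):
    p = softmax(x @ W)
    grad = W[:, target] - W @ p          # d/dx of log p[target]
    x += lr * grad

confidence = softmax(x @ W)[target]
print(f"target-class confidence: {confidence:.4f}")
```

The attack needs only gradient (or even query) access to a deployed model, which is why high raw accuracy alone does not make a recognition system safe to use as identity proof.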


Keywords
Deep learning, Neural networks, Knowledge distillation, Neural network architecture, Security


Disciplines
Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington