Graduation Semester and Year
2022
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Junzhou Huang
Second Advisor
Dajiang Zhu
Third Advisor
Jia Rao
Fourth Advisor
Chengkai Li
Abstract
I present my work on the fundamental, challenging, and valuable problem of protein property and structure prediction. Specifically, I address the problem from three aspects: (1) designing powerful deep learning networks for specific protein structure property prediction tasks; (2) proposing general methods that enhance the protein sequence homologous feature, an important input feature for these tasks; and (3) developing a self-supervised pre-training model for learning structure embeddings from protein tertiary structures. To evaluate the effectiveness of the developed methods, I apply them to several downstream protein tasks: secondary structure prediction, solvent accessibility prediction, backbone dihedral angle prediction, protein structure quality assessment, and protein-protein interaction site prediction.
I carry out this work step by step. First, I start from the protein secondary structure prediction task and design different deep learning networks, tailored to the characteristics of each prediction task, to learn representations of protein data. To learn a powerful representation of protein data and exploit the characteristics of protein secondary structure, I propose EnsembleASP, an ensemble learning method with Atrous Spatial Pyramid networks for secondary structure prediction. Moreover, because the homologous information of some proteins is insufficient, I propose a Bagging method that aims to improve prediction performance on low-quality data. In addition, to further address the uneven distribution of homologous information in the data, and to help researchers quickly apply and experiment with existing models, I propose WeightAln, a plug-and-play method based on the attention mechanism. WeightAln learns weights for the homologous features of a target protein and applies them during the calculation to obtain stronger sequence homologous information for that protein. Finally, to support protein structure-related downstream tasks, I propose a pre-training model for learning structure embeddings from protein tertiary structures. The model is optimized with a self-supervised loss function that relies only on protein structures and requires no additional supervision.
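The idea of weighting homologous features, as described for WeightAln above, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names (`one_hot_msa`, `weighted_profile`), the toy alignment, and the hand-picked scores are all assumptions made for illustration. In the abstract the weights are learned with an attention mechanism; here fixed scores simply stand in for that learned output, and a softmax turns them into per-sequence weights used to build a position-wise residue profile.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot_msa(msa):
    """Encode aligned sequences as an array of shape (n_seqs, length, alphabet)."""
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    encoded = np.zeros((len(msa), len(msa[0]), len(ALPHABET)))
    for n, seq in enumerate(msa):
        for pos, ch in enumerate(seq):
            encoded[n, pos, index[ch]] = 1.0
    return encoded


def softmax(scores):
    """Numerically stable softmax over a 1-D list of scores."""
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


def weighted_profile(msa, scores):
    """Weighted residue frequencies per position.

    `scores` is a hypothetical stand-in for attention scores that a
    trained model would produce, one per aligned sequence.
    """
    weights = softmax(scores)
    # Sum the one-hot encodings over sequences, scaled by each weight.
    return np.einsum("n,nlc->lc", weights, one_hot_msa(msa))


# Toy alignment of three homologous sequences (made-up data).
msa = ["ACD", "ACE", "GCD"]
uniform = weighted_profile(msa, [0.0, 0.0, 0.0])  # equal weights
skewed = weighted_profile(msa, [2.0, 0.0, 0.0])   # favors the first sequence
```

With equal scores the profile reduces to plain residue frequencies; raising one sequence's score shifts every position of the profile toward that sequence's residues, which is the sense in which a weighting scheme can strengthen or dampen individual homologs.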
Keywords
Deep learning, Protein structure property prediction, Unsupervised learning, Self-supervised pre-training
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.
Recommended Citation
Guo, Yuzhi, "DEEP LEARNING FOR PROTEIN PROPERTY AND STRUCTURE PREDICTION" (2022). Computer Science and Engineering Dissertations. 264.
https://mavmatrix.uta.edu/cse_dissertations/264
Comments
Degree granted by The University of Texas at Arlington