Author

Yuzhi Guo

ORCID Identifier(s)

0000-0002-8993-1818

Graduation Semester and Year

2022

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Computer Science and Engineering

First Advisor

Junzhou Huang

Second Advisor

Dajiang Zhu

Third Advisor

Jia Rao

Fourth Advisor

Chengkai Li

Abstract

I present my work on the fundamental, challenging, and valuable problem of protein property and structure prediction. Specifically, I address the problem from three critical aspects: (1) designing powerful deep learning networks for specific protein structure property prediction tasks; (2) proposing general methods that enhance the protein sequence homologous feature, which is an important input feature for the relevant tasks; and (3) developing a self-supervised pre-training model for learning structure embeddings from protein tertiary structures. To evaluate the effectiveness of the developed methods, I apply them to several protein downstream tasks, including prediction of protein secondary structure, solvent accessibility, and backbone dihedral angles, protein structure quality assessment, and protein-protein interaction site prediction.
I carry out the work step by step. First, I start from the protein secondary structure prediction task and iteratively design different deep learning networks, according to the characteristics of each prediction task, to learn representations of protein data. To learn a powerful representation of protein data and exploit the characteristics of protein secondary structure, I propose EnsembleASP, a protein ensemble learning method with Atrous Spatial Pyramid networks for secondary structure prediction. Moreover, since the homologous information of some proteins is insufficient, I propose a Bagging method that aims to improve prediction performance on low-quality data. In addition, to further address the uneven distribution of homologous information in the data, and to make it easy for scientists and researchers to apply and experiment with existing models, I propose a plug-and-play method, WeightAln, which is based on the attention mechanism. WeightAln learns the weight of the homologous feature of a target protein and applies it in the calculation process to obtain stronger sequence homologous information for the target protein. Finally, to help protein structure-related downstream tasks, I propose a pre-training model for learning structure embeddings from protein tertiary structures. The model is optimized with a self-supervised loss function, which relies only on protein structures and does not require any additional supervision.

Keywords

Deep learning, Protein structure property prediction, Unsupervised learning, Self-supervised pre-training

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
