Graduation Semester and Year
2022
Language
English
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Computer Science
Department
Computer Science and Engineering
First Advisor
Junzhou Huang
Abstract
Drug discovery is the process of discovering new candidate medications. The pharmaceutical industry continually develops new drugs to address increasing medical needs. Drug discovery involves a series of stages: target identification and validation, hit identification, lead generation and optimization, and finally the identification of a candidate for further development. Development further includes optimization of the chemical synthesis and formulation, toxicological studies in animals, clinical trials, and eventually regulatory approval. Both discovery and development are time-consuming and costly. Computer-aided drug discovery relies mainly on modern computers to model drug molecules, which can speed up drug discovery and reduce its cost. In this dissertation, we investigate two representative applications of drug discovery: molecule generation and retrosynthesis prediction. Since molecules can be represented as either sequences or graphs, different machine learning models (sequence models and graph neural networks) can be adapted for molecular modelling. With the rapid development of machine learning, many research works have tried to apply machine learning models to drug discovery. However, these methods are not yet efficient and effective enough for real-world applications. We propose to improve the efficiency of modern machine learning models for drug discovery applications. In particular, we propose new techniques to improve current sequence models for molecule generation and current graph models for retrosynthesis prediction. Extensive experiments demonstrate the efficiency and effectiveness of our methods. We first investigate variational autoencoder models for molecule sequence generation and propose a simple and effective solution to the posterior collapse problem of these models. We then study retrosynthesis prediction and propose both template-free and template-based methods to overcome the disadvantages of existing methods.
Keywords
Graph neural networks, Sequence models, Molecule generation, Retrosynthesis prediction
Disciplines
Computer Sciences | Physical Sciences and Mathematics
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended Citation
Yan, Chaochao, "Effective Sequence Models and Graph Neural Networks for Molecular Data Analysis" (2022). Computer Science and Engineering Dissertations. 346.
https://mavmatrix.uta.edu/cse_dissertations/346
Comments
Degree granted by The University of Texas at Arlington