Graduation Semester and Year

Spring 2026

Language

English

Document Type

Thesis

Degree Name

Master of Science in Earth and Environmental Science

Department

Earth and Environmental Sciences

First Advisor

Yike Shen

Second Advisor

Yunyao Li

Third Advisor

Yue Liao

Abstract

The Texas Department of State Health Services monitors numerous notifiable conditions statewide, including Campylobacter, Salmonella, Shiga toxin-producing Escherichia coli (STEC), rabies, and West Nile virus (WNV). Given the substantial health and economic burden associated with these conditions, improving prediction is an important step toward reducing their overall impact. This study evaluated whether external demographic, social, climate, and environmental data could improve prediction of county-year disease activity across Texas. County-level data were analyzed using supervised machine learning models (linear regression, ridge regression, multilayer perceptron, random forest, and XGBoost), and K-means clustering was used to identify broader spatiotemporal patterns. Tree-based models performed best overall, with XGBoost emerging as the top-performing model for all five retained conditions. Predictive performance at the county-year level was moderate to strong for every condition: Salmonella (R² = 0.822 ± 0.018), rabies (R² = 0.803 ± 0.012), Campylobacteriosis (R² = 0.774 ± 0.018), STEC (R² = 0.690 ± 0.030), and WNV (R² = 0.677 ± 0.064). Population was the strongest predictor across conditions, while environmental and climate-related features provided additional explanatory value, especially for WNV. The same pattern appeared in the unsupervised clustering of these conditions, in which a single cluster consistently captured highly populated counties and the remaining clusters grouped geographically similar regions. These findings support integrating environmental and demographic data into infectious disease surveillance to inform more targeted public health approaches across Texas.

Keywords

Epidemiology, Machine Learning, Public Health, Salmonella, Campylobacter, Rabies, Shiga Toxin-producing E. coli, West Nile Virus

Disciplines

Data Science | Environmental Health | Environmental Sciences

License

Creative Commons Attribution 4.0 International License
