Degree Name

Doctor of Philosophy in Civil Engineering


Department

Civil Engineering

First Advisor

Yu Zhang

Abstract

The objective of this research is to address the limitations inherent in conventional statistical postprocessing schemes for generating probabilistic quantitative precipitation forecasts (PQPFs), and to improve the quality of postprocessed PQPFs by introducing new robust statistical models and machine learning frameworks. This dissertation comprises three main elements. First, a new two-part scheme for creating PQPFs from single-valued quantitative precipitation forecasts is introduced. This scheme, herein referred to as Mixed-type Non-Homogeneous Regression (MNHR), combines logistic regression for estimating rainfall intermittency with non-homogeneous regression for estimating the other parameters of the conditional distribution. The performance of MNHR is evaluated relative to the operational Mixed-type Meta-Gaussian Distribution (MMGD) and the Censored-Shifted Gamma Distribution (CSGD) in postprocessing Global Ensemble Forecast System (GEFS) reforecasts averaged over 25 watersheds in the American River Basin in California. The results point to superior performance of MNHR relative to MMGD and CSGD in terms of the skill of postprocessed PQPFs at 24-h and 96-h accumulation windows. In addition, the performance of CSGD tends to trail behind that of MNHR and MMGD, at least for the 24-h window, though the differences tend to narrow at higher forecast amounts and longer lead times. The analyses suggest that CSGD's underperformance arises partly from its tendency to inflate the shift parameter estimates, a tendency that is pronounced over the study site possibly because rainfall occurs infrequently there. By contrast, MNHR's use of logistic regression helps avoid such bias, and its formulation of the conditional distribution addresses MMGD's lack of skewness at higher forecast amounts.
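A mixed-type predictive distribution of the kind MNHR builds places a point mass at zero, whose complement (the probability of precipitation, p) comes from logistic regression, and a continuous distribution F on positive amounts, so that for any threshold t >= 0 the exceedance probability is P(Y > t) = p * (1 - F(t)). The Python sketch below only illustrates that two-part structure under stated assumptions: the log-normal conditional distribution, the single predictor x, and the coefficient names (a0, a1, b0, b1, c0, c1) are hypothetical stand-ins, not the dissertation's MNHR formulation. The "non-homogeneous" part is represented by letting both the location and the spread of the amount distribution depend on the predictor.

```python
import math

# Hypothetical MNHR-style mixed-type distribution (illustrative only):
# PoP from logistic regression, plus a log-normal distribution for positive
# amounts whose location AND spread depend on the predictor x.

def pop_logistic(x, a0, a1):
    """P(Y > 0) from a logistic regression on predictor x (stable sigmoid)."""
    z = a0 + a1 * x
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def lognormal_cdf(y, mu, sigma):
    """CDF of a log-normal distribution with log-mean mu and log-sd sigma."""
    if y <= 0.0:
        return 0.0
    z = (math.log(y) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exceedance_prob(threshold, x, c):
    """P(Y > threshold), threshold >= 0, under the mixed-type distribution."""
    p = pop_logistic(x, c["a0"], c["a1"])
    mu = c["b0"] + c["b1"] * x               # location depends on the predictor
    sigma = math.exp(c["c0"] + c["c1"] * x)  # spread depends on it too (kept > 0)
    return p * (1.0 - lognormal_cdf(threshold, mu, sigma))

# Hypothetical coefficients and predictor value (e.g., a transformed forecast).
coefs = {"a0": -1.0, "a1": 0.8, "b0": 0.5, "b1": 0.4, "c0": -0.2, "c1": 0.05}
for t in (0.0, 5.0, 25.0):
    print(t, round(exceedance_prob(t, 2.0, coefs), 3))
```

At t = 0 the exceedance probability reduces to the logistic PoP, which is how the logistic component alone governs rainfall intermittency in this kind of scheme.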
MNHR-based PQPFs also exhibit both superior calibration and relatively high sharpness at short lead times and in an unconditional sense, whereas they are less sharp than those from the other two schemes when conditioned on higher forecast amounts. Many present-day statistical postprocessing schemes rely on calibration using prescribed statistical models to relate forecast statistics to distributional parameters. The efficacy of such schemes is often constrained not only by the prescribed predictor-predictand relation, but also by arbitrary choices of the temporal window and lead-time range used for training. To address this limitation, an end-to-end, computationally efficient hybrid scheme is proposed next. This scheme produces full predictive distributions of precipitation accumulation without explicitly stratifying forecast-observation pairs by lead time and season. The framework retains the CSGD as the predictive distribution but uses an artificial neural network (ANN) to estimate the CSGD parameters in a unified way. This approach, referred to as ANN-CSGD, estimates the distributional parameters for multiple lead times and seasons simultaneously in a single model by including lead time and season as predictors to the ANN. The model is tested on postprocessing GEFS ensemble-mean forecasts of 24-h precipitation totals over selected river basins in California at one- to seven-day lead times. Overall, the PQPFs from ANN-CSGD are more skillful than those from the benchmark CSGD and MMGD models. ANN-CSGD substantially improves on CSGD in predicting the probability of precipitation (PoP), and its PQPFs are also much sharper and more reliable at higher precipitation thresholds.
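The ANN-CSGD idea can be sketched as a single network that maps the forecast together with lead time and season to CSGD parameters, which then define exceedance probabilities. The Python sketch below assumes a common CSGD parameterization: Y = max(0, X + delta), where X is gamma-distributed with mean mu and standard deviation sigma (shape k = mu^2/sigma^2, scale theta = sigma^2/mu) and the shift delta is negative, so that P(Y > t) = 1 - G(t - delta) for t >= 0 and the PoP is 1 - G(-delta). The one-hidden-layer network, its random untrained weights, the softplus output links, and the feature encoding (ensemble mean, scaled lead day, sine/cosine of day of year) are hypothetical choices for illustration; fitting the weights is omitted.

```python
import math
import random

def reg_lower_gamma(a, x, eps=1e-14, max_iter=10000):
    """Regularized lower incomplete gamma P(a, x) (series / continued fraction)."""
    if x <= 0.0:
        return 0.0
    log_pref = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series; converges quickly for x < a + 1.
        term = total = 1.0 / a
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if term < total * eps:
                break
        return math.exp(log_pref) * total
    # Lentz continued fraction for Q(a, x) = 1 - P(a, x) when x >= a + 1.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        step = d * c
        h *= step
        if abs(step - 1.0) < eps:
            break
    return 1.0 - math.exp(log_pref) * h

def softplus(z):
    """Numerically stable log(1 + exp(z)); maps any real number to a positive one."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def encode_features(ens_mean_mm, lead_day, day_of_year):
    """Hypothetical predictors: forecast, scaled lead time, cyclic season."""
    angle = 2.0 * math.pi * day_of_year / 365.25
    return [ens_mean_mm, lead_day / 7.0, math.sin(angle), math.cos(angle)]

def init_network(n_in, n_hidden, seed=0):
    """Random (untrained) weights for a one-hidden-layer net with 3 outputs."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(3)]
    return w1, [0.0] * n_hidden, w2, [0.0] * 3

def csgd_params(features, net):
    """Forward pass: features -> (mu, sigma, delta) with mu, sigma > 0, delta <= 0."""
    w1, b1, w2, b2 = net
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    out = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    return softplus(out[0]) + 1e-3, softplus(out[1]) + 1e-3, -softplus(out[2])

def csgd_exceedance(t, mu, sigma, delta):
    """P(Y > t), t >= 0, for Y = max(0, X + delta) with X ~ Gamma(k, theta)."""
    k, theta = (mu / sigma) ** 2, sigma ** 2 / mu
    return 1.0 - reg_lower_gamma(k, (t - delta) / theta)

# One model serves every lead time and season: they are just extra inputs.
net = init_network(n_in=4, n_hidden=8, seed=42)
mu, sigma, delta = csgd_params(encode_features(5.0, lead_day=3, day_of_year=30), net)
pop = csgd_exceedance(0.0, mu, sigma, delta)  # probability of precipitation
print(mu, sigma, delta, pop, csgd_exceedance(10.0, mu, sigma, delta))
```

The design point this illustrates is that lead time and season enter as inputs rather than as stratification keys, so one set of weights is trained on all forecast-observation pairs instead of one CSGD fit per lead time and season window.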
By using all available training data and a modified formulation, the hybrid approach efficiently represents interactions between GEFS forecasts and season and lead time, which leads to enhanced predictive performance. Finally, deep learning (DL)-based, geographically aware, and computationally efficient schemes are proposed. These schemes can create postprocessed PQPFs over the contiguous United States (CONUS) using a short (60-day) training dataset. They aim to address a limitation of conventional techniques, namely their limited ability to improve the skill of probabilistic precipitation guidance, especially when only short training datasets are available for calibration. The efficacy of these schemes is demonstrated through a set of hindcast experiments in which postprocessed 24-h PQPFs are generated from GEFS forecasts at lead times of one to eight days. The quantile mapping stencil algorithm of Hamill et al. (2017), which is used in the US National Blend of Models program, serves as the benchmark. The results show that PQPFs from the DL-based approaches are more skillful than those from the benchmark and from the raw forecasts over CONUS, both in predicting PoP and over a range of higher thresholds, and the advantage is more evident at higher thresholds. The considerable success of the DL schemes is attributed mainly to their ability to efficiently model complex, arbitrary nonlinear predictor-predictand relationships, their ability to produce spatially aware forecasts by incorporating geographic information into the networks, and their use of state-of-the-art training schemes that make it easier to reduce generalization error.
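The comparisons above are stated in terms of skill relative to a benchmark and to the raw forecasts, for PoP and for higher thresholds. The abstract does not name the score, so the Python sketch below assumes one common choice for threshold-exceedance events, the Brier skill score BSS = 1 - BS/BS_ref, where BS is the mean squared difference between forecast probabilities and 0/1 outcomes. The data are made-up toy numbers, and the 0.254 mm (0.01 in) PoP threshold is an assumed convention for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, ref_probs, observed_mm, threshold_mm):
    """BSS of probs against a reference forecast for the event Y > threshold."""
    outcomes = [1.0 if y > threshold_mm else 0.0 for y in observed_mm]
    bs_ref = brier_score(ref_probs, outcomes)
    if bs_ref == 0.0:
        raise ValueError("reference forecast is perfect; BSS is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref

# Toy example (made-up numbers): postprocessed vs. reference PoP forecasts.
observed = [3.0, 0.0, 12.5, 0.0]
postprocessed = [0.9, 0.1, 0.8, 0.2]
reference = [0.5, 0.5, 0.5, 0.5]
print(brier_skill_score(postprocessed, reference, observed, threshold_mm=0.254))
```

Positive values indicate an improvement over the reference and 1 is a perfect forecast; evaluating the same function at several thresholds gives the kind of threshold-dependent comparison described above.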


Keywords

Ensembles, Forecast verification/skill, Forecasting, Probability forecasts/models/distribution, Short-range prediction, Statistical forecasting, Artificial neural networks, Probabilistic quantitative precipitation forecast


Disciplines

Civil and Environmental Engineering | Civil Engineering | Engineering


Degree granted by The University of Texas at Arlington