Shuo Li

Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Computer Science


Computer Science and Engineering

First Advisor

Jean Gao


Quantitative analysis of Raman spectra using surface-enhanced Raman scattering (SERS) nanoparticles has shown promise for in vivo molecular imaging. A key task is to predict the quantities of analytes from the intensities of Raman signals. Direct classical least squares (DCLS) and multivariate calibration (MC) are the commonly used methods. DCLS relies on source Raman signals as references, but the inherent instability of Raman signals makes the DCLS model insufficiently robust. The MC model relies on a batch of training mixture Raman signals, together with their ground-truth mixing concentrations, to build multivariate multiple linear regression models and thereby reduce the bias caused by the instability of the source Raman signals. However, it introduces the problem of having more variables than observations. The latent variable regression (LVR) model avoids that problem by extracting low-dimensional latent variables (LVs), or extracted features, and regressing concentrations on them. Among the LVR methods, partial least squares regression (PLSR) algorithms are more robust because their LVs both represent the original Raman signals and predict concentrations. In this thesis, quantitative analysis models and methods are compared to show why PLSR algorithms are more robust for the quantitative analysis of Raman spectra.

PLSR alone, however, cannot handle the unstable background of Raman signals. Baseline correction methods are commonly used as a preprocessing step: they find a slowly varying baseline under the signal as the estimated background, and Raman peaks are then extracted by subtracting that baseline from the Raman signal. But baseline correction methods are usually time-consuming iterative processes, and they normally cannot deal with the multi-scale property of Raman peaks.
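
To make the DCLS model and its sensitivity to an unstable background concrete, the following is a minimal sketch and not the implementation used in the thesis. It assumes that a mixture signal is a linear combination of known source signals and estimates the concentrations by ordinary least squares. The Gaussian peaks, the concentrations, the sloped background, and the names `dcls_concentrations`, `peak`, `sources`, and `mixture` are all synthetic or hypothetical choices made for illustration.

```python
import numpy as np


def dcls_concentrations(sources, mixture):
    """Least-squares estimate of mixing concentrations.

    sources: array of shape (n_analytes, n_wavenumbers), reference signals.
    mixture: array of shape (n_wavenumbers,), observed mixture signal.
    """
    coeffs, *_ = np.linalg.lstsq(sources.T, mixture, rcond=None)
    return coeffs


# Synthetic stand-ins for source Raman signals (assumption: Gaussian peaks).
axis = np.linspace(0.0, 1.0, 200)


def peak(center, width):
    return np.exp(-0.5 * ((axis - center) / width) ** 2)


sources = np.vstack([peak(0.3, 0.02), peak(0.6, 0.03)])
true_conc = np.array([2.0, 0.5])

# Ideal case: the mixture is exactly a combination of the references.
mixture = true_conc @ sources
estimate = dcls_concentrations(sources, mixture)

# Unstable background: a sloped baseline that the references cannot
# explain is absorbed into the concentration estimates and biases them.
biased = dcls_concentrations(sources, mixture + 0.5 * axis)

print("clean estimate :", estimate)
print("biased estimate:", biased)
```

In the clean case the least-squares fit recovers the synthetic concentrations, while the added background shifts both estimates. This is the kind of bias that motivates calibration on training mixtures and background handling.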
We designed a simple algorithm, called continuous wavelet transform (CWT) based partial least squares regression (CWT-PLSR), that uses the average CWT coefficients of mixture Raman signals to perform PLSR against the mixing concentrations. It extracts the multi-scale information of Raman peaks and is therefore more robust than traditional baseline correction methods.

The features extracted by PLSR give more weight to those Raman peaks that both represent the Raman signals and predict the concentrations well. But the relative weight of each of these two goals is fixed in the objective function of PLSR. To improve the flexibility of PLSR, we designed a new continuum regression (CR) method that uses a tuning parameter to control the weight of each goal in the objective function, and it gives more reasonable weights to Raman peaks. It improves on two other CR methods by embracing PCR, RRR, and PLS as three special cases, and it is implemented simply with the NIPALS algorithm.

Tuning parameters of PLSR and CR methods are normally chosen by time-consuming cross-validation. Moreover, some parameters take infinitely many possible values in continuous ranges, so cross-validation cannot test every value. Nonparametric Bayesian models of these methods are needed to determine the parameters automatically from the training data. As foundational work, we design a probabilistic PLS regression model that gives a probabilistic view of PLSR methods. Future Bayesian models can be obtained by adding reasonable priors on the parameters.
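
The CWT-PLSR idea described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis implementation: it assumes a Ricker (Mexican hat) wavelet, a hand-picked set of scales, odd-reflection padding at the signal edges, and one single-response NIPALS PLS fit per analyte. The synthetic peaks and backgrounds, and the names `ricker`, `mean_cwt`, `pls1_fit`, and `make_mixtures`, are hypothetical.

```python
import numpy as np


def ricker(scale):
    """Unit-energy Ricker wavelet sampled on [-5*scale, 5*scale]."""
    half = 5 * scale
    t = np.arange(-half, half + 1) / scale
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def mean_cwt(signal, scales):
    """Average the CWT coefficients of one signal over the given scales."""
    rows = []
    for scale in scales:
        wavelet = ricker(scale)
        half = len(wavelet) // 2
        # Odd reflection extends a linear trend, limiting edge artifacts.
        padded = np.pad(signal, half, mode="reflect", reflect_type="odd")
        rows.append(np.convolve(padded, wavelet, mode="valid"))
    return np.mean(rows, axis=0)


def pls1_fit(X, y, n_components):
    """Single-response PLS by NIPALS; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)       # weight vector
        t = Xr @ w                   # scores
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        c = yr @ t / tt              # y loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - c * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta


# Synthetic data (assumption): two analytes with peaks of different widths,
# plus a random offset and slope standing in for an unstable background.
rng = np.random.default_rng(0)
axis = np.arange(300.0)
sources = np.vstack([
    np.exp(-0.5 * ((axis - 100.0) / 4.0) ** 2),
    np.exp(-0.5 * ((axis - 200.0) / 8.0) ** 2),
])


def make_mixtures(n):
    conc = rng.uniform(0.0, 1.0, size=(n, 2))
    offset = rng.uniform(0.0, 2.0, size=(n, 1))
    slope = rng.uniform(0.0, 2.0, size=(n, 1))
    noise = rng.normal(0.0, 0.01, size=(n, axis.size))
    return conc, conc @ sources + offset + slope * axis / axis.size + noise


scales = [2, 4, 8]
train_conc, train_sig = make_mixtures(30)
test_conc, test_sig = make_mixtures(10)
train_X = np.array([mean_cwt(s, scales) for s in train_sig])
test_X = np.array([mean_cwt(s, scales) for s in test_sig])

columns = []
for k in range(train_conc.shape[1]):
    beta, intercept = pls1_fit(train_X, train_conc[:, k], n_components=2)
    columns.append(test_X @ beta + intercept)
predictions = np.column_stack(columns)
rmse = np.sqrt(np.mean((predictions - test_conc) ** 2))
print("test RMSE:", rmse)
```

Because the Ricker wavelet is symmetric with nearly zero mean, offset and linear background components are largely cancelled in the CWT features, so no iterative baseline fit is needed before the regression. Averaging over several scales lets narrow and broad peaks both contribute.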


Computer Sciences | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington