Daoying Lin

Graduation Semester and Year




Document Type


Degree Name

Doctor of Philosophy in Mathematics



First Advisor

Yan Li


With the advances in human genome research, it is now believed that the risk of many complex diseases is driven by the interplay of genetic susceptibilities and environmental exposures. The population-based case-control study (PBCCS) is widely used to investigate the role of genetic variants and environmental exposures in the etiology of complex diseases. There are numerous ways to implement the selection of cases and controls. In its simplest form, a simple random sampling (SRS) design is used to choose cases and controls from the diseased and disease-free populations, respectively. Though SRS is easy to conduct and the relevant statistical methodologies are well developed, more sophisticated complex sampling designs (such as stratified, clustered, and multistage sampling) are needed for the selection of cases and/or controls for a number of reasons. First, complex sampling is more time- and cost-efficient than SRS. Second, a representative sample can be chosen by conducting complex sampling, and thus biased selection of cases and/or controls can be avoided. As a result, complex sampling is now being used increasingly in large-scale population-based case-control or cross-sectional genetic association studies.
The analysis of complex sampling data, however, requires special attention for the following reasons. First, varying selection probabilities, as well as adjustments for nonresponse and incomplete coverage of the population at risk, result in differential population weights across individuals. Second, a multistage clustered sampling design induces non-negligible intra-cluster correlation. It has been well recognized that invalid inferences can be drawn if these two complications are ignored. The literature on PBCCS with complex sampling is very limited. Therefore, there is a need to develop statistical methods that properly address the complications induced by complex sampling in genetic association studies.
In this dissertation, we propose a series of innovative statistical methods for genetic association studies that account for various sampling designs. Robust variance estimators are developed using the Taylor linearization technique to incorporate differential weighting and clustering effects. Monte Carlo simulation studies are used to study the properties of the proposed estimators under various sampling designs. The application of the proposed methods is also illustrated using the U.S. Kidney Cancer Study (USKCS), which is one of the largest PBCCS with genomic data available so far.


Mathematics | Physical Sciences and Mathematics


Degree granted by The University of Texas at Arlington

Included in

Mathematics Commons