Machine Learning Framework for Nonlinear and Interaction Relationships Involving Categorical and Numerical Features
Degree granted by The University of Texas at Arlington
Abstract
Traditionally, physical experiments have been used extensively to study and understand the behavior of a process or system. With advances in computing technology in recent years, computer codes and algorithms now serve as simulators that replicate the behavior of a complex system. Using computers to study a system in this way is termed a ‘computer experiment.’ The process involves selecting specific points, or runs, in the design space so as to maximize the information gained about the system in as few runs as possible. These computer models are high-dimensional and can take a long time to simulate. Metamodels (or surrogate models) built from the data collected in computer experiments are therefore used to approximate the functional relationship between inputs and outputs. The contributions of this dissertation fall in the design-point selection and modeling stages of this process. First, we review existing computer experiment designs with mixed (categorical and numerical) factors and perform a comprehensive study of these designs to understand their performance under various settings. In the latter part of the dissertation, we propose a data-mining framework to learn and model interactions and nonlinearity involving categorical and numerical features.
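As a rough illustration of the workflow the abstract describes (select design points, run the simulator, fit a metamodel), the Python sketch below builds a small mixed-factor design and a crude surrogate. It is not the dissertation's method. The Latin hypercube construction, the balanced assignment of categorical levels, the toy simulator, and the nearest-neighbour surrogate are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import math
import random


def latin_hypercube(n_runs, n_numeric, rng):
    """Return n_runs points in [0, 1)^n_numeric with one point per stratum per dimension."""
    columns = []
    for _ in range(n_numeric):
        # Draw one value inside each of n_runs equal-width strata, then shuffle their order.
        column = [(i + rng.random()) / n_runs for i in range(n_runs)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n_runs)]


def mixed_design(n_runs, n_numeric, levels, rng):
    """Pair each numeric point with a categorical level, keeping level counts balanced."""
    numeric = latin_hypercube(n_runs, n_numeric, rng)
    cats = [levels[i % len(levels)] for i in range(n_runs)]
    rng.shuffle(cats)
    return list(zip(numeric, cats))


def toy_simulator(x, level):
    """Hypothetical stand-in for an expensive computer model (assumption, not from the source)."""
    slope = {"a": 1.0, "b": -2.0}[level]  # the category changes the effect of x[0]
    return slope * x[0] + x[1] ** 2       # nonlinear in x[1]


def fit_surrogate(design, responses):
    """Crude metamodel: return the response of the nearest design point with the same level."""
    data = list(zip(design, responses))

    def predict(x, level):
        candidates = [(math.dist(x, xi), yi) for (xi, li), yi in data if li == level]
        return min(candidates, key=lambda pair: pair[0])[1]

    return predict


rng = random.Random(0)
design = mixed_design(n_runs=10, n_numeric=2, levels=["a", "b"], rng=rng)
responses = [toy_simulator(x, level) for x, level in design]
surrogate = fit_surrogate(design, responses)
print(surrogate((0.5, 0.5), "a"))  # surrogate prediction at an untried input
```

The nearest-neighbour surrogate is used only because it needs no external libraries; in practice a regression-based or Gaussian-process metamodel would typically take its place.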