MARS Knot Positioning and Global Optimization

Xinglong Ju

Degree granted by The University of Texas at Arlington

Abstract

Multivariate adaptive regression splines (MARS) is a statistical modeling approach with wide real-world applications. In the MARS model building process, knot positioning is a critical step that potentially affects the accuracy of the final MARS model. Identifying well-positioned knots entails assessing the quality of many knots in each model building iteration, which requires much computation efforts. By exploring the change in the residual sum of squares (RSS) within MARS, we find that local optima from previous iterations can be very close to those of the current iteration. In our approach, the prior change in RSS information is used to “warm start” an optimal knot positioning. We propose two methods for MARS knot positioning. The first method is a hill climbing method (HCM), which ignores prior change in RSS information. The second method is a hill climbing method using prior change in RSS information (PHCM). Numerical experiments are conducted on data with up to 30 dimensions. Our results show that both versions of hill climbing methods outperform Chen's MARS knot selection method on datasets with different noise levels. Further, PHCM using prior change in RSS information performs best in both accuracy and computational speed. Multivariate adaptive regression splines (MARS) is a flexible statistical modeling method that has been popular for data mining applications. MARS has also been employed to approximate unknown relationships in optimization for complex systems, including surrogate optimization, dynamic programming, and two-stage stochastic programming. Given the increasing desire to optimize real world systems, this paper presents an approach to globally optimize a MARS model that allows up to two-way interaction terms that are products of truncated linear univariate functions (TITL-MARS). Specifically, such a MARS model consists of linear and quadratic structure. This structure is exploited to formulate a mixed integer quadratic programming problem (TITL-MARS-OPT). To appreciate the contribution of TITL-MARS-OPT, one must recognize that popular heuristic optimization approaches, such as evolutionary algorithms, do not guarantee global optimality and can be computationally slow. The use of MARS maintains the flexibility of modeling within TITL-MARS-OPT while also taking advantage of the linear modeling structure of MARS to enable global optimality. Computational results compare TITL-MARS-OPT with a genetic algorithm for two types of cases. First, a wind farm power distribution case study is described and then other TITL-MARS forms are tested. The results show the superiority of TITL-MARS-OPT over the genetic algorithm in both accuracy and computational time.