Author

Rohit Rawat

ORCID Identifier(s)

0000-0002-2039-2713

Graduation Semester and Year

2016

Language

English

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Electrical Engineering

Department

Electrical Engineering

First Advisor

Michael T Manry

Second Advisor

Ioannis D Schizas

Abstract

The piecewise linear orthonormal floating search (PLOFS) is a wrapper method for feature selection that uses a piecewise linear network (PLN) to evaluate candidate subsets. PLOFS has difficulty with high-dimensional data because of overfitting and poor clustering in the PLN subset evaluation function (SEF) and because of its high computational complexity. The presence of noise features aggravates these problems. To improve the SEF used by PLOFS, we map the PLN to an SPLN. A second-order embedded feature selection method is then used to generate improved distance measure weights, and a second-order method for positioning center vectors is developed. The distance measure weights and improved center vectors are mapped back to the PLN, resulting in improved performance. We analyze the behavior of noise and dependent features in OLS and use the results to develop a reliable method of eliminating these useless features, thereby extending PLOFS to problems with larger numbers of features. We augment the data with artificial random features as probes and use piecewise linear sequential forward search to identify the useless features and remove them from the data. Building on the basic PLOFS algorithm, we develop a two-stage feature selection method that first removes useless features and then uses floating search to generate subsets of different sizes from the remaining features. The resulting Extended PLOFS (EPLOFS) algorithm helps eliminate the ill effects of too many useless features in the final piecewise linear model, making it applicable to larger datasets. We evaluate EPLOFS and compare its performance with that of several other feature selection methods. In the presence of a large number of noise features, EPLOFS consistently produced the optimal subset containing only the useful features and no noise features. Subsets of various sizes produced by EPLOFS often have smaller testing errors than subsets of the same size produced by other methods. The presence of dependent features further degraded the performance of filter methods, while the performance of EPLOFS remained largely unaffected.
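
The probe idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it swaps in ordinary linear least squares for the piecewise linear network, and the stopping rule (stop as soon as a random probe is the best candidate) is an assumption, since the abstract does not state how the probes set the cutoff. The function name `forward_select_with_probes` and its parameters are hypothetical.

```python
import numpy as np


def forward_select_with_probes(X, y, n_probes=10, seed=0):
    """Sequential forward search that stops when a random probe wins.

    Assumption: a plain linear least-squares fit stands in for the
    piecewise linear subset evaluation used in the dissertation.
    Returns the indices of the original features selected before
    the first probe would have been chosen.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape

    # Augment the data with artificial random features (the probes).
    probes = rng.standard_normal((n_samples, n_probes))
    Xa = np.hstack([X, probes])

    # Center inputs and target so no bias term is needed.
    Xa = Xa - Xa.mean(axis=0)
    yc = y - y.mean()

    selected = []
    remaining = list(range(Xa.shape[1]))
    while remaining:
        best_j, best_err = None, np.inf
        for j in remaining:
            cols = selected + [j]
            coef, *_ = np.linalg.lstsq(Xa[:, cols], yc, rcond=None)
            err = np.mean((yc - Xa[:, cols] @ coef) ** 2)
            if err < best_err:
                best_j, best_err = j, err
        # Indices >= n_features are probes: once a probe is the best
        # candidate, the remaining real features are treated as useless.
        if best_j >= n_features:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


# Toy data (hypothetical): only features 0 and 3 carry signal.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((500, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * data_rng.standard_normal(500)
selected = forward_select_with_probes(X, y)
print(selected)
```

In this toy setting the two informative features are picked first. A real noise feature can still be selected before a probe by chance, so the stopping rule reduces the number of useless features kept rather than guaranteeing that none remain.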

Keywords

Feature selection, Floating search, Piecewise linear network, Useless features

Disciplines

Electrical and Computer Engineering | Engineering

Comments

Degree granted by The University of Texas at Arlington
