
High-dimensional, low-sample-size (HDLSS) data analysis has become popular in statistical machine learning. Such applications involve a huge number of features or variables, but the sample size is limited for reasons such as cost and ethnicity. It is therefore important to find approaches that learn the underlying relationships from a small amount of data. In this dissertation, we study the statistical properties of several non-parametric machine learning models that address these problems and apply the models to various fields for validation. In Chapter 2, we study the generalized additive model in the high-dimensional setup with a general link function that belongs to the exponential family. We apply a two-step approach to variable selection and estimation: a group lasso step that serves as an initial estimator, followed by an adaptive group lasso step that yields the final selected variables and estimates. We show that, under certain conditions, the two-step approach consistently selects the truly nonzero variables, and we derive the rate of convergence of the estimator. Moreover, we show that the tuning parameter that minimizes the generalized information criterion (GIC) has asymptotically minimum risk. Simulation results for variable selection and estimation are given. Real data examples, including spam email and prostate cancer genetic data, are also used to support the theory. We also discuss the possibility of using an l0 norm penalty. In Chapter 3, we study a shallow neural network model in the high-dimensional classification setup. The sparse group lasso, also known as the lp,1 + l1 norm penalty, is applied to obtain feature sparsity and a sparse network structure. By the universal approximation theorem, neural networks can approximate any continuous function with arbitrarily small approximation error, provided that the number of hidden nodes is large enough. Therefore, neural networks are used to model complicated relationships between the response and
Page Count: 245
Publication Date: 2020-01-01
Publisher: Michigan State University. Statistics
ISBN-13: 9798635297742