Title: A SURVEY OF SOME “SPARSE” METHODS FOR HIGH DIMENSIONAL DATA
High dimensional data means that the number of variables p is far larger than the number of observations n. This occurs in several fields, such as genomics and chemometrics.
When p > n, the OLS estimator is not uniquely defined for linear regression. Since this is a case of forced multicollinearity, one may use regularized techniques such as ridge regression, principal component regression or PLS regression: these methods provide rather robust estimates, either through dimension reduction or through constraints (explicit or implicit) on the regression coefficients. The fact that all the predictors are kept is often considered an advantage.
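As a small illustration of why regularization helps when p > n, the sketch below solves the normal equations (X'X + lam I) b = X'y in plain Python. With lam = 0 (OLS) the system is singular, while any lam > 0 (ridge) gives a unique solution in which every coefficient is kept. The toy data, the function names and the value lam = 1 are assumptions made for this example only.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(a, b, tol=1e-9):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ValueError when a pivot is (numerically) zero, i.e. a is singular.
    """
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        if abs(m[piv][k]) < tol:
            raise ValueError("singular system: no unique solution")
        m[k], m[piv] = m[piv], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        s = sum(m[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (m[k][n] - s) / m[k][k]
    return x


def ridge(X, y, lam):
    """Ridge estimate: solve (X'X + lam * I) b = X'y. lam = 0 is plain OLS."""
    cols = transpose(X)
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    for j in range(len(gram)):
        gram[j][j] += lam
    xty = [sum(u * v for u, v in zip(col, y)) for col in cols]
    return solve(gram, xty)


# Hypothetical toy data: n = 2 observations, p = 3 variables (p > n).
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

try:
    ridge(X, y, lam=0.0)
except ValueError as err:
    print("OLS (lam = 0):", err)

print("ridge (lam = 1):", ridge(X, y, lam=1.0))  # [0.125, 0.625, 0.75]
```

Note that the ridge solution is unique but none of its coefficients is zero: all three predictors stay in the model.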
However, when p >> n, keeping all the predictors becomes a drawback, since a combination of all the variables cannot be interpreted. Sparse combinations, i.e. with a large number of zero coefficients, are then preferred. The Lasso produces such an estimate: thanks to an L1 penalty, it performs regularization and variable selection simultaneously. We will present variants such as sparse PLS, and the group-lasso for the case where the variables are structured in blocks.
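To make the selection effect of the L1 penalty concrete, here is a minimal sketch of the Lasso fitted by cyclic coordinate descent with soft-thresholding, in plain Python. It assumes the objective (1/2) * ||y - X b||^2 + lam * sum_j |b_j|; this scaling, the function names, the toy data and the values of lam are choices made for the example, not something the abstract specifies. Coordinate descent is one common way to compute the Lasso, not necessarily the algorithm used in the talk.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values with |z| <= lam become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1/2) * ||y - X b||^2 + lam * sum_j |b_j| (assumed scaling).
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta (beta starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            norm2 = sum(v * v for v in xj)
            if norm2 == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of x_j with the residual that ignores x_j itself.
            rho = sum(v * r for v, r in zip(xj, resid)) + norm2 * beta[j]
            new = soft_threshold(rho, lam) / norm2
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - v * delta for r, v in zip(resid, xj)]
                beta[j] = new
    return beta


# Hypothetical toy data: n = 2 observations, p = 3 variables (p > n).
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

print(lasso_cd(X, y, lam=0.5))  # first coefficient is exactly 0.0
print(lasso_cd(X, y, lam=5.0))  # penalty large enough to zero everything
```

On this toy example the first variable is dropped (its coefficient is exactly zero, not merely small), while the other two are kept but shrunk. Increasing lam removes more variables, until all coefficients are zero.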