Diagnostic statistics and visualization for quantile regression
Quantile regression models provide fitted values for any quantile of the response, rather than only its mean. They are widely used in population studies and process control. The approach was proposed in a seminal paper by Koenker and Bassett (1978). Many new approaches have been developed in recent years, and several R packages are available: quantreg, gbm, quantregForest, qrnn, ALDqr and bayesQR.
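As a minimal sketch of what "fitting a quantile" means, the Python below (used only to keep the example self-contained; it is not the API of any package named above) assumes the standard Koenker and Bassett check loss, rho_tau(u) = u * (tau - I(u < 0)). Minimising the summed check loss over a constant recovers the sample tau-quantile, in the same way that minimising squared error recovers the mean. The names `check_loss` and `sample_quantile_by_loss` are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile_by_loss(ys, tau):
    """Return a data value q minimising sum_i rho_tau(y_i - q), for 0 < tau < 1.

    The objective is convex and piecewise linear with kinks at the observations,
    so a minimiser can always be found among the y_i themselves. Ties are
    resolved to the smallest such value.
    """
    return min(sorted(ys), key=lambda q: sum(check_loss(y - q, tau) for y in ys))
```

For example, `sample_quantile_by_loss([1, 2, 3, 4, 100], 0.5)` returns 3 (the median), and `sample_quantile_by_loss([1, 2, 3, 4, 5], 0.25)` returns 2.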
A review paper by Koenker (2017) recommends that more diagnostics be provided for modeling. The purpose of this project is to provide more diagnostic methods for quantile models, focusing particularly on models available in current R packages.
One of the benefits of quantile models is that they tend to be more robust to outliers than traditional methods such as least squares regression. This is because the estimate does not change when an outlying response is moved further away, as long as it stays on the same side of the fitted quantile. However, there are no diagnostics to assess the effect of outliers on the model fit, or how they might affect some quantiles more than others.
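The robustness argument can be checked on the simplest quantile model, a constant (location) fit. In this sketch, with made-up data and only the Python standard library, one observation is pushed further and further above the rest: the mean, which minimises squared error, follows it, while the median (the tau = 0.5 quantile) stays put because the outlier remains on the same side. The helper name `mean_and_median` is hypothetical.

```python
from statistics import mean, median

# Five well-behaved responses; a sixth observation is pushed ever further upward.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]


def mean_and_median(outlier):
    """Return (mean, median) of the clean data plus one outlying response."""
    data = clean + [outlier]
    return mean(data), median(data)


for outlier in (6.0, 60.0, 6000.0):
    m, med = mean_and_median(outlier)
    # The mean goes 3.5 -> 12.5 -> 1002.5; the median stays at 3.5.
    print(f"outlier={outlier:>7}: mean={m:8.2f}  median={med:.2f}")
```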
The R package quokar, available on CRAN, makes a start on this. It provides diagnostics based on L1 estimation. L1 quantile estimation tends to resist vertical outliers (outliers in y-space) well, but it remains sensitive to points outlying in x-space: its breakdown point is 0. A more robust method, based on the concept of regression depth, was proposed by Rousseeuw and Hubert (1999).
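Both halves of that claim can be illustrated with simple linear median regression. Linear quantile regression is a linear program, and when the design matrix has full column rank some optimal fit interpolates as many observations as there are coefficients, so for a straight line it is enough to try every line through two observations. This O(n^3) brute force is only a sketch under that assumption (quantreg's `rq()`, for instance, uses simplex or interior-point methods instead); the function names and data are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_quantile_line(xs, ys, tau=0.5):
    """Exact simple linear quantile regression by enumerating lines through pairs."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # the line through this pair would be vertical
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(check_loss(y - (intercept + slope * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Eight points exactly on y = x.
xs = [float(k) for k in range(1, 9)]
ys = list(xs)

# Vertical outlier: the response at x = 4 is moved from 4 to 40.
ys_vertical = ys.copy()
ys_vertical[3] = 40.0

# Leverage outlier: one extra point far out in x-space.
xs_leverage = xs + [30.0]
ys_leverage = ys + [0.0]

print("clean    :", fit_quantile_line(xs, ys))
print("vertical :", fit_quantile_line(xs, ys_vertical))
print("leverage :", fit_quantile_line(xs_leverage, ys_leverage))
```

The vertical outlier leaves the fitted slope at 1, while the single high-leverage point (30, 0) turns the slope negative (to -2/13), which is the breakdown behaviour described above.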
Related literature includes:
Koenker, Roger. "Quantile regression: 40 years on." Annual Review of Economics 9 (2017): 155-176.
Rousseeuw, Peter J., and Mia Hubert. "Regression depth." Journal of the American Statistical Association 94.446 (1999): 388-402.
Hallin, Marc, et al. "Multivariate quantiles and multiple-output regression quantiles: from L1 optimization to halfspace depth." The Annals of Statistics (2010): 635-703.
Mizera, Ivan, and Milos Volauf. "Continuity of halfspace depth contours and maximum depth estimators: diagnostics of depth-related methods." Journal of Multivariate Analysis 83.2 (2002): 365-388.
Van Aelst, Stefan, and Peter J. Rousseeuw. "Robustness of deepest regression." Journal of Multivariate Analysis 73.1 (2000): 82-106.
This project will extend the quokar package in the following ways:
- calculate robust estimators for quantile regression based on regression depth
- develop visualization methods to display outlier diagnostics
- write vignettes to document usage.
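As a starting point for the first item, the sketch below computes the regression depth of a candidate line in simple regression, following the definition in Rousseeuw and Hubert (1999): the smallest number of observations whose removal makes the line a "nonfit", that is, a line whose residuals are all negative on one side of some x value and all positive on the other. The function names are hypothetical, and `approx_deepest_line` only searches lines through pairs of observations, a simplification for illustration rather than an exact deepest-regression algorithm.

```python
from itertools import combinations


def regression_depth(intercept, slope, xs, ys):
    """Regression depth of the line y = intercept + slope * x (simple regression)."""
    res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    depth = len(xs)
    for t in xs:
        # Observations to remove so that residuals are < 0 for x <= t and > 0 for x > t,
        # or the reverse orientation; the depth is the smallest such count.
        left_pos = sum(1 for x, r in zip(xs, res) if x <= t and r >= 0)
        left_neg = sum(1 for x, r in zip(xs, res) if x <= t and r <= 0)
        right_pos = sum(1 for x, r in zip(xs, res) if x > t and r >= 0)
        right_neg = sum(1 for x, r in zip(xs, res) if x > t and r <= 0)
        depth = min(depth, left_pos + right_neg, left_neg + right_pos)
    return depth


def approx_deepest_line(xs, ys):
    """Deepest line among those passing through two observations (illustrative only)."""
    candidates = []
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # skip vertical lines
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        candidates.append((ys[i] - slope * xs[i], slope))
    return max(candidates, key=lambda line: regression_depth(line[0], line[1], xs, ys))


# Eight points on y = x plus one leverage point at (30, 0).
xs = [float(k) for k in range(1, 9)] + [30.0]
ys = [float(k) for k in range(1, 9)] + [0.0]
print(approx_deepest_line(xs, ys), regression_depth(0.0, 1.0, xs, ys))
```

On this data the line y = x has depth 8 out of 9 and is returned as the deepest candidate, so the leverage point does not pull the depth-based fit away from the bulk of the data.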
The extended quokar package will provide diagnostics that work with each of the existing modeling packages, as well as new ways to do sensitivity analysis for quantile regression.
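One simple form of sensitivity analysis is case deletion: refit the model with each observation left out in turn and record how far the coefficients move. The sketch below is a hypothetical illustration of that idea, not an existing quokar function. It reuses a brute-force median-regression fitter (every line through two observations) so that it stays self-contained; repeating it for several values of `tau` would show whether an observation affects some quantiles more than others.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_quantile_line(xs, ys, tau=0.5):
    """Brute-force simple linear quantile regression over lines through pairs."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, not a valid fit
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(check_loss(y - intercept - slope * x, tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def case_deletion_influence(xs, ys, tau=0.5):
    """Largest absolute coefficient change when each observation is left out."""
    a0, b0 = fit_quantile_line(xs, ys, tau)
    influence = []
    for k in range(len(xs)):
        a, b = fit_quantile_line(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:], tau)
        influence.append(max(abs(a - a0), abs(b - b0)))
    return influence


# Eight points on y = x plus one leverage point at (30, 0), which is index 8.
xs = [float(k) for k in range(1, 9)] + [30.0]
ys = [float(k) for k in range(1, 9)] + [0.0]
infl = case_deletion_influence(xs, ys)
most_influential = max(range(len(infl)), key=infl.__getitem__)
print(most_influential, [round(v, 3) for v in infl])
```

Here the leverage point (index 8) has by far the largest influence on the median fit.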
Students, please contact the mentors below after completing at least one of the tests.
- Dianne Cook, author of GGally, nullabor, tourr
- Kris Boudt
Students, please do one or more of the following tests before contacting the mentors above.
- (Easy) Write code to plot the points flagged as outlying by the current diagnostic methods in quokar, using the ah or his data provided with the package.
- (Medium) Make plots illustrating the simplex algorithm or interior point algorithm for estimating linear quantile regression parameters.
- (Hard) Make plots illustrating the interior point algorithm for estimating non-linear quantile regression parameters.
- Wenjing Wang