OUTLYR – Detect and treat outliers in R 🚀

What is it

This is an R function heavily rooted in dplyr grammar. It is intended for checking how much outliers influence your analysis. It returns a copy of your dataset with outliers treated (i.e., trimmed, replaced, or winsorized). Re-run your test on this new dataset to assess how influential the outliers were. It handles multiple variables at a time (as many as you list in the character vector of variable names). Operations are done column-wise. If the 'group' argument is defined, they are done on each group separately.

*PLEASE NOTE* Neither removing nor changing outliers should be your first option. This is not a front-end step to clean your data. There are reasons to believe that doing so is actually counterproductive. See the thread by Dr. Nick Holmes (@TheHandLab) to learn more: https://twitter.com/TheHandLab/status/1434097550840246279

Robust (robustbase) or non-parametric tests, or even data transformations, should be prioritized.

Check LambertW and bestNormalize for more aggressive data transformation approaches if classical methods fail.

Installation

devtools::install_github('https://github.com/jottinog/outlyr')

Usage

outlyr(x, y, group, outlier, treat)

x – your full dataset.

y – character vector with names of variables from the dataset to examine.

group – (Optional) string giving the name of the grouping variable. If supplied, all further steps are done within each group.

outlier – (Optional) how outliers are flagged. 'z' flags values more than 3 SD away from the mean. 'iqr' flags values more than 1.5 * IQR beyond the quartiles.

threshold – (Optional) set the cutoff manually if the defaults (3 SD for 'z', 1.5 * IQR for 'iqr') do not suit your data.

treat – how to treat outliers. 'trim' sets them to NA. 'win' replaces them with the max/min. 'replace' does mean-replacement.
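
The flagging and treatment options above can be sketched in base R for a single numeric vector. This is only an illustration, not outlyr's source code: flag_outliers and treat_outliers are hypothetical names. It assumes that 'win' clamps a flagged value to the most extreme non-flagged value and that 'replace' uses the mean of the non-flagged values. The package may define either of these differently.

```r
# Hypothetical helper (not part of outlyr): flag outliers in a numeric vector.
flag_outliers <- function(v, outlier = c("z", "iqr"), threshold = NULL) {
  outlier <- match.arg(outlier)
  if (outlier == "z") {
    k <- if (is.null(threshold)) 3 else threshold
    z <- (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
    flagged <- abs(z) > k
  } else {
    k <- if (is.null(threshold)) 1.5 else threshold
    q <- quantile(v, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    fence <- k * (q[2] - q[1])
    flagged <- v < q[1] - fence | v > q[2] + fence
  }
  flagged & !is.na(flagged)  # missing values are never flagged
}

# Hypothetical helper (not part of outlyr): apply one treatment to flagged values.
treat_outliers <- function(v, flagged, treat = c("trim", "win", "replace")) {
  treat <- match.arg(treat)
  kept <- v[!flagged & !is.na(v)]
  if (treat == "trim") {
    v[flagged] <- NA
  } else if (treat == "win") {
    # Assumption: clamp to the most extreme non-flagged values.
    v[flagged] <- pmin(pmax(v[flagged], min(kept)), max(kept))
  } else {
    # Assumption: mean of the non-flagged values.
    v[flagged] <- mean(kept)
  }
  v
}

x <- c(10, 11, 12, 13, 14, 100)
flagged <- flag_outliers(x, outlier = "iqr")
treat_outliers(x, flagged, "trim")     # 100 becomes NA
treat_outliers(x, flagged, "win")      # 100 becomes 14
treat_outliers(x, flagged, "replace")  # 100 becomes 12
```

In this sketch, threshold overrides the default multiplier for either rule, e.g. flag_outliers(x, 'iqr', threshold = 3).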

Example:

library(tidyverse)

data(iris)

vars <- c('Petal.Width', 'Petal.Length')  # List the variables you want to check for outliers.

new_iris <- outlyr(iris, vars, group = 'Species', outlier = 'iqr', treat = 'win')  # Within-group ('Species' defined in group argument). Outliers defined by IQR method and winsorized.
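
To assess how influential the outliers were, fit the same model to the original and the treated data and compare the results. The sketch below shows that workflow in base R only, so it runs without outlyr installed. winsorize_iqr is a hypothetical stand-in for the within-group 'iqr' / 'win' call above, and its clamping rule (nearest non-flagged value) is an assumption rather than the package's documented behaviour.

```r
data(iris)

# Hypothetical stand-in for outlyr(..., outlier = 'iqr', treat = 'win'):
# values beyond the 1.5 * IQR fences are clamped to the nearest non-flagged value.
winsorize_iqr <- function(v, k = 1.5) {
  q <- quantile(v, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  lower <- q[1] - k * (q[2] - q[1])
  upper <- q[2] + k * (q[2] - q[1])
  kept <- v[!is.na(v) & v >= lower & v <= upper]
  pmin(pmax(v, min(kept)), max(kept))
}

treated <- iris
# ave() applies the function separately within each level of Species.
treated$Petal.Width <- ave(iris$Petal.Width, iris$Species, FUN = winsorize_iqr)

# How many observations did the treatment change?
n_changed <- sum(treated$Petal.Width != iris$Petal.Width)

# Fit the same model to both datasets and compare the summaries.
fit_original <- aov(Petal.Width ~ Species, data = iris)
fit_treated  <- aov(Petal.Width ~ Species, data = treated)
summary(fit_original)
summary(fit_treated)
```

If the two summaries lead to the same conclusion, the outliers had little influence on this particular test.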