render .md
rempsyc committed Oct 4, 2023
1 parent 3558a62 commit 46e8575
Showing 4 changed files with 36 additions and 25 deletions.
40 changes: 20 additions & 20 deletions papers/JOSE/paper.log
@@ -1,4 +1,4 @@
This is XeTeX, Version 3.141592653-2.6-0.999995 (TeX Live 2023) (preloaded format=xelatex 2023.9.14) 3 OCT 2023 12:56
This is XeTeX, Version 3.141592653-2.6-0.999995 (TeX Live 2023) (preloaded format=xelatex 2023.9.14) 4 OCT 2023 11:22
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
@@ -1066,14 +1066,14 @@ Package fancyhdr Warning: \headheight is too small (62.59596pt):

[2]
LaTeX Font Info: Font shape `TU/lmtt/bx/n' in size <10> not available
(Font) Font shape `TU/lmtt/b/n' tried instead on input line 458.
(Font) Font shape `TU/lmtt/b/n' tried instead on input line 477.

Overfull \hbox (32.66139pt too wide) in paragraph at lines 474--474
Overfull \hbox (32.66139pt too wide) in paragraph at lines 493--493
[]\TU/lmtt/m/n/10 #> -----------------------------------------------------------------------------[]
[]


Overfull \hbox (32.66139pt too wide) in paragraph at lines 483--483
Overfull \hbox (32.66139pt too wide) in paragraph at lines 502--502
[]\TU/lmtt/m/n/10 #> -----------------------------------------------------------------------------[]
[]

@@ -1097,8 +1097,8 @@ Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) \addtolength{\topmargin}{-1.71957pt}.

[4]
File: paper_files/figure-latex/model-1.pdf Graphic file (type pdf)
<use paper_files/figure-latex/model-1.pdf>
File: paper_files/figure-latex/model_fig-1.pdf Graphic file (type pdf)
<use paper_files/figure-latex/model_fig-1.pdf>
File: D:/Rpackages/rticles/rmarkdown/templates/joss/resources/JOSE-logo.png Graphic file (type bmp)
<D:/Rpackages/rticles/rmarkdown/templates/joss/resources/JOSE-logo.png>

@@ -1129,17 +1129,27 @@ Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) \addtolength{\topmargin}{-1.71957pt}.

[7]
Underfull \hbox (badness 1584) in paragraph at lines 926--932
File: D:/Rpackages/rticles/rmarkdown/templates/joss/resources/JOSE-logo.png Graphic file (type bmp)
<D:/Rpackages/rticles/rmarkdown/templates/joss/resources/JOSE-logo.png>

Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) Make it at least 64.31554pt, for example:
(fancyhdr) \setlength{\headheight}{64.31554pt}.
(fancyhdr) You might also make \topmargin smaller to compensate:
(fancyhdr) \addtolength{\topmargin}{-1.71957pt}.

[8]
Underfull \hbox (badness 1584) in paragraph at lines 959--965
[]\TU/lmr/m/n/10 Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psy-
[]


Underfull \hbox (badness 3049) in paragraph at lines 926--932
Underfull \hbox (badness 3049) in paragraph at lines 959--965
\TU/lmr/m/n/10 chology: Undisclosed flexibility in data collection and analysis allows pre-
[]


Underfull \hbox (badness 3735) in paragraph at lines 926--932
Underfull \hbox (badness 3735) in paragraph at lines 959--965
\TU/lmr/m/n/10 senting anything as significant. \TU/lmr/m/it/10 Psychological Science\TU/lmr/m/n/10 , \TU/lmr/m/it/10 22\TU/lmr/m/n/10 (11), 1359–1366.
[]

@@ -1152,16 +1162,6 @@ Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) You might also make \topmargin smaller to compensate:
(fancyhdr) \addtolength{\topmargin}{-1.71957pt}.

[8]
File: D:/Rpackages/rticles/rmarkdown/templates/joss/resources/JOSE-logo.png Graphic file (type bmp)
<D:/Rpackages/rticles/rmarkdown/templates/joss/resources/JOSE-logo.png>

Package fancyhdr Warning: \headheight is too small (62.59596pt):
(fancyhdr) Make it at least 64.31554pt, for example:
(fancyhdr) \setlength{\headheight}{64.31554pt}.
(fancyhdr) You might also make \topmargin smaller to compensate:
(fancyhdr) \addtolength{\topmargin}{-1.71957pt}.

[9] (./paper.aux)
***********
LaTeX2e <2023-06-01> patch level 1
@@ -1175,7 +1175,7 @@ Package logreq Info: Writing requests to 'paper.run.xml'.
)
Here is how much of TeX's memory you used:
36963 strings out of 477589
757118 string characters out of 5817003
757166 string characters out of 5817003
1940432 words of memory out of 5000000
57602 multiletter control sequences out of 15000+600000
564981 words of font info for 89 fonts, out of 8000000 for 9000
21 changes: 16 additions & 5 deletions papers/JOSE/paper.md
@@ -121,7 +121,11 @@ Importantly, whatever approach researchers choose remains a subjective decision,

Researchers frequently attempt to identify outliers using measures of deviation from the center of a variable's distribution. One of the most popular such procedures is the _z_ score transformation, which computes the distance in standard deviations (SD) from the mean. However, as mentioned earlier, this popular method is not robust. Therefore, for univariate outliers, it is recommended to use the median along with the Median Absolute Deviation (MAD), which are more robust than the interquartile range or the mean and its standard deviation [@leys2019outliers; @leys2013outliers].

Researchers can identify outliers based on robust (i.e., MAD-based) _z_ scores using the `check_outliers()` function of the *{performance}* package, by specifying `method = "zscore_robust"`.^[Note that `check_outliers()` only checks numeric variables.] Although @leys2013outliers suggest a default threshold of 2.5 and @leys2019outliers a threshold of 3, *{performance}* uses by default a less conservative threshold of ~3.29.^[3.29 is an approximation of the two-tailed critical value for _p_ < .001, obtained through `qnorm(p = 1 - 0.001 / 2)`. We chose this threshold for consistency with the thresholds of all our other methods.] That is, data points will be flagged as outliers if they go beyond +/- ~3.29 MAD. Users can adjust this threshold using the `threshold` argument, as demonstrated below.
Researchers can identify outliers based on robust (i.e., MAD-based) _z_ scores using the `check_outliers()` function of the *{performance}* package, by specifying `method = "zscore_robust"`.^[Note that `check_outliers()` only checks numeric variables.] Although @leys2013outliers suggest a default threshold of 2.5 and @leys2019outliers a threshold of 3, *{performance}* uses by default a less conservative threshold of ~3.29.^[3.29 is an approximation of the two-tailed critical value for _p_ < .001, obtained through `qnorm(p = 1 - 0.001 / 2)`. We chose this threshold for consistency with the thresholds of all our other methods.] That is, data points will be flagged as outliers if they go beyond +/- ~3.29 MAD. Users can adjust this threshold using the `threshold` argument.
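
To make the threshold concrete, the following is a minimal base-R sketch of a MAD-based _z_ score, not the internal code of `check_outliers()`. The helper `robust_z()` and the toy vector `x` are hypothetical and introduced only for illustration; `mad()` is used with its default scaling constant, which is an assumption about how the robust score is standardized.

```r
# Default threshold described above: two-tailed critical value for p < .001.
threshold <- qnorm(p = 1 - 0.001 / 2)
round(threshold, 2)  # 3.29

# Hypothetical helper (not part of {performance}): distance from the median
# expressed in MAD units, using stats::mad() with its default constant.
robust_z <- function(x) {
  (x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE)
}

# Toy data: nine unremarkable values and one artificial extreme value.
x <- c(2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 2.5, 1.8, 2.2, 9.0)

# Flag values whose absolute robust z score exceeds the threshold.
flagged <- abs(robust_z(x)) > threshold
which(flagged)  # only the artificial value at position 10 is flagged
```

Because the median and the MAD are barely moved by the extreme value, the remaining observations stay well inside the threshold, which is the robustness property the paragraph above relies on.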

Below we provide example code using the `mtcars` dataset, which was extracted from the 1974 *Motor Trend* US magazine. The dataset contains fuel consumption and 10 characteristics of automobile design and performance for 32 different car models (see `?mtcars` for details). We chose this dataset because it is accessible from base R and familiar to many R users. We might want to conduct specific statistical analyses on this dataset, say, _t_ tests or structural equation modelling, but first, we want to check for outliers that may influence those test results.

Because the automobile names are stored as row names in `mtcars`, we first have to convert them to an ID column to benefit from the `check_outliers()` ID argument. Furthermore, we only really need a few columns for this demonstration, so we choose the first four (`mpg` = Miles/(US) gallon; `cyl` = Number of cylinders; `disp` = Displacement; `hp` = Gross horsepower). Finally, because there are no outliers in this dataset, we add two artificial outliers before running our function.


```r
@@ -163,7 +167,9 @@ outliers
#> 34 34 34 16.52502
```

The row numbers of the detected outliers can be obtained by using `which()` on the output object, which can be used for exclusions for example:
What we see is that `check_outliers()` with the robust _z_ score method detected two outliers: cases 33 and 34, which were the observations we added ourselves. They were flagged for two variables specifically: `mpg` (Miles/(US) gallon) and `cyl` (Number of cylinders), and the output provides their exact _z_ score for those variables.

We describe how to deal with those cases in more detail later in the paper, but should we want to exclude these detected outliers from the main dataset, we can extract row numbers using `which()` on the output object, which can then be used for indexing:


```r
@@ -178,8 +184,6 @@ which(outliers)
data_clean <- data[-which(outliers), ]
```
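
One caveat with negative indexing is worth a short sketch: when no case is flagged, `which()` returns an empty integer vector, and indexing with `-integer(0)` drops every row instead of none. The example below uses a plain logical vector, `flagged`, as a stand-in for the outlier flags; that stand-in and the toy data frame `df` are assumptions made for illustration.

```r
# Toy data and a stand-in for the outlier flags (here: nothing flagged).
df <- data.frame(id = 1:5, value = c(10, 11, 9, 10, 12))
flagged <- rep(FALSE, nrow(df))

# Negative indexing with an empty index selects zero rows.
nrow(df[-which(flagged), ])  # 0

# Logical negation keeps every row when nothing is flagged.
df_clean <- df[!flagged, ]
nrow(df_clean)  # 5
```

Logical indexing gives the same result as `-which()` whenever at least one case is flagged, so it is a safe default in scripts where the number of detected outliers is not known in advance.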

All `check_outliers()` output objects possess a `plot()` method, meaning it is also possible to visualize the outliers using the generic `plot()` function on the resulting outlier object after loading the {see} package.

Other univariate methods are available, such as using the interquartile range (IQR), or based on different intervals, such as the Highest Density Interval (HDI) or the Bias Corrected and Accelerated Interval (BCI). These methods are documented and described in the function's [help page](<https://easystats.github.io/performance/reference/check_outliers.html>).

## Multivariate Outliers
@@ -204,6 +208,8 @@ outliers
#> - For variables: mpg, cyl, disp, hp.
```

Here, we detected 9 multivariate outliers (i.e., when looking at all variables of our dataset together).

Other multivariate methods are available, such as another type of robust Mahalanobis distance that in this case relies on an orthogonalized Gnanadesikan-Kettenring pairwise estimator [@gnanadesikan1972robust]. These methods are documented and described in the function's [help page](https://easystats.github.io/performance/reference/check_outliers.html).
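
For intuition about distance-based multivariate detection, here is a minimal base-R sketch of the classical (non-robust) Mahalanobis distance. It is a simpler technique than the robust estimators discussed in this section and is not the computation behind the output above. The chi-square cutoff at _p_ < .001 is an assumption chosen to mirror the thresholds described earlier, and the unmodified `mtcars` columns are used, so the result is not comparable to the count reported above.

```r
# Four numeric variables from the unmodified mtcars dataset.
x <- mtcars[, c("mpg", "cyl", "disp", "hp")]

# Squared Mahalanobis distance of each row from the multivariate mean,
# using the ordinary sample covariance matrix (not a robust estimate).
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Assumed cutoff: chi-square critical value with one degree of freedom
# per variable, at p < .001.
cutoff <- qchisq(p = 1 - 0.001, df = ncol(x))

# Rows whose squared distance exceeds the cutoff would be flagged.
flagged <- d2 > cutoff
sum(flagged)
```

Because the mean and covariance are themselves pulled toward extreme cases, this classical version can mask outliers, which is the motivation for the robust variants.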

## Model-Based Outliers
@@ -225,12 +231,17 @@ outliers
#> - For variable: (Whole model).
```

Using the model-based outlier detection method, we identified a single outlier.
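
As a rough illustration of what a model-based check involves, the sketch below computes Cook's distance directly with base R. The model formula is hypothetical and is not the model fitted in this paper, and the 4/n cutoff is a common rule of thumb assumed here for illustration; it is not necessarily the threshold that `check_outliers()` applies.

```r
# Hypothetical linear model on the unmodified mtcars data.
model <- lm(disp ~ mpg * hp, data = mtcars)

# Cook's distance: influence of each observation on the fitted values.
cd <- cooks.distance(model)

# Assumed rule-of-thumb cutoff (4 / n); other cutoffs are in common use.
cutoff <- 4 / nrow(mtcars)

# Names of the observations that exceed the assumed cutoff.
names(cd)[cd > cutoff]
```

Unlike the univariate and multivariate methods, this approach flags cases by how much they change the fitted model, so the result depends on the model specification as well as on the data.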

All `check_outliers()` output objects possess a `plot()` method, meaning it is also possible to visualize the outliers using the generic `plot()` function on the resulting outlier object after loading the {see} package.


```r
plot(outliers)
```

\begin{figure}
\includegraphics[width=1\linewidth]{paper_files/figure-latex/model-1} \caption{Visual depiction of outliers based on Cook's distance (leverage and standardized residuals), based on the fitted model.}\label{fig:model}
\includegraphics[width=1\linewidth]{paper_files/figure-latex/model_fig-1} \caption{Visual depiction of outliers based on Cook's distance (leverage and standardized residuals), based on the fitted model.}\label{fig:model_fig}
\end{figure}

Table 1 below summarizes which methods to use in which cases, and with what threshold.
Binary file modified papers/JOSE/paper.pdf
Binary file not shown.
Binary file not shown.