
Adds data argument to effectsize.htest/cohens_d #522

Merged
35 commits merged from htest_data into easystats:main on May 11, 2024

Conversation

@rempsyc (Member) commented Oct 16, 2022

Fixes easystats/report#245


# Load effectsize
devtools::load_all(".../forks/effectsize")

# "Right" method
y <- t.test(mtcars$mpg ~ mtcars$vs)
cohens_d(y)
#> Cohen's d |         95% CI
#> --------------------------
#> -1.70     | [-2.55, -0.82]
#> 
#> - Estimated using un-pooled SD.

# Approximation method
x <- t.test(mpg ~ vs, data = mtcars)
cohens_d(x)
#> Warning: Unable to retrieve data from htest object. Returning an approximate
#>   effect size using t_to_d().
#> d     |         95% CI
#> ----------------------
#> -1.96 | [-2.94, -0.95]

# New method with data specified
cohens_d(x, data = mtcars)
#> Cohen's d |         95% CI
#> --------------------------
#> -1.70     | [-2.55, -0.82]
#> 
#> - Estimated using un-pooled SD.

# Comparison
cohens_d(y) == cohens_d(x, data = mtcars)
#>                                                 Cohens_d   CI CI_low CI_high
#> difference in means between group 0 and group 1     TRUE TRUE   TRUE    TRUE

# Approximation method
report::report_table(x)
#> Warning: Unable to retrieve data from htest object. Returning an approximate
#>   effect size using t_to_d().
#> Welch Two Sample t-test
#> 
#> Parameter | Group | Mean_Group1 | Mean_Group2 | Difference |          95% CI | t(22.72) |      p |     d |          d  CI
#> -------------------------------------------------------------------------------------------------------------------------
#> mpg       |    vs |       16.62 |       24.56 |      -7.94 | [-11.46, -4.42] |    -4.67 | < .001 | -1.96 | [-2.94, -0.95]
#> 
#> Alternative hypothesis: two.sided

# New method with data specified
report::report_table(x, data = mtcars)
#> Welch Two Sample t-test
#> 
#> Parameter | Group | Mean_Group1 | Mean_Group2 | Difference |          95% CI | t(22.72) |      p | Cohen's d |   Cohens_d  CI
#> -----------------------------------------------------------------------------------------------------------------------------
#> mpg       |    vs |       16.62 |       24.56 |      -7.94 | [-11.46, -4.42] |    -4.67 | < .001 |     -1.70 | [-2.55, -0.82]
#> 
#> Alternative hypothesis: two.sided

Created on 2022-10-16 with reprex v2.0.2

@codecov-commenter commented Oct 16, 2022

Codecov Report

Attention: Patch coverage is 90.78947%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 90.75%. Comparing base (16d832b) to head (1dc280f).
Report is 1 commit behind head on main.

❗ Current head 1dc280f differs from pull request most recent head a9c77ec. Consider uploading reports for the commit a9c77ec to get more accurate results

Files                 | Patch % | Lines
R/effectsize.htest.R  |  90.14% | 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #522      +/-   ##
==========================================
+ Coverage   90.71%   90.75%   +0.04%     
==========================================
  Files          57       56       -1     
  Lines        3563     3440     -123     
==========================================
- Hits         3232     3122     -110     
+ Misses        331      318      -13     


@rempsyc rempsyc requested review from DominiqueMakowski and removed request for strengejacke October 16, 2022 23:20
@mattansb (Member)
I haven't given a close look yet, but this seems like it does not cover the full scope of possible garbage htest data.name patterns...

See @strengejacke code here: https://github.com/easystats/insight/blob/main/R/utils_get_data.R
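For context, a minimal illustration (added here, not part of the original comment) of why these patterns are hard to handle: an htest object stores only a deparsed string describing its inputs, and the shape of that string depends entirely on how the test was called:

```r
# data.name is just a deparsed string describing the call, not the data itself
t.test(mpg ~ vs, data = mtcars)$data.name
#> [1] "mpg by vs"

dat <- mtcars
t.test(dat$mpg[dat$am == 1], dat$mpg[dat$am == 0])$data.name
#> [1] "dat$mpg[dat$am == 1] and dat$mpg[dat$am == 0]"
```

Recovering the original variable names (let alone the data frame) from arbitrary strings like the second one requires fragile string parsing, which is the concern raised above.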

@mattansb mattansb added the WIP 👷‍♂️ work in progress label Oct 17, 2022
@rempsyc (Member Author) commented Oct 17, 2022

Ok. (When you have time) would it be possible to give me an example of what you mean by "htest data.name patterns"? Do you mean supporting the different htest object types (it wasn't clear to me from the link you provided)? I have only added support for Cohen's d and t-tests for now, but am happy to add support for more if useful.

@rempsyc (Member Author) commented Dec 16, 2022

  • chisq.test (no data argument so formula interface not possible)
  • fisher.test (no data argument so formula interface not possible)
  • mcnemar.test (no data argument so formula interface not possible)
  • cor.test (already works)
  • oneway.test (already works)
  • t.test (done)
  • wilcox.test (done)
  • friedman.test (done)
  • kruskal.test (done)
  • chisq.test_gof (not sure what function to use to test this object type)

@rempsyc (Member Author) commented Dec 16, 2022

devtools::load_all("D:/github/forks/effectsize")
#> ℹ Loading effectsize

# t.test
x <- t.test(mpg ~ vs, data = mtcars)
effectsize(x)
#> Warning: Unable to retrieve data from htest object. Returning an approximate
#>   effect size using t_to_d().
#> d     |         95% CI
#> ----------------------
#> -1.96 | [-2.94, -0.95]
effectsize(x, data = mtcars)
#> Cohen's d |         95% CI
#> --------------------------
#> -1.70     | [-2.55, -0.82]
#> 
#> - Estimated using un-pooled SD.

# cor.test
# no need to specify the data argument
x <- cor.test(~ qsec + drat, data = mtcars)
effectsize(x)
#> Warning: This 'htest' method is not (yet?) supported.
#>   Returning 'parameters::model_parameters(model)'.
#> Pearson's product-moment correlation
#> 
#> Parameter1 | Parameter2 |    r |        95% CI | t(30) |     p
#> --------------------------------------------------------------
#> qsec       |       drat | 0.09 | [-0.27, 0.43] |  0.50 | 0.620
#> 
#> Alternative hypothesis: true correlation is not equal to 0

# wilcox.test
x <- wilcox.test(mpg ~ vs, data = mtcars)
#> Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
#> compute exact p-value with ties
effectsize(x)
#> Error: Unable to retrieve data from htest object.
#>   Try using 'rb()' directly.
effectsize(x, data = mtcars)
#> Warning: 'y' is numeric but has only 2 unique values.
#>   If this is a grouping variable, convert it to a factor.
#> r (rank biserial) |       95% CI
#> --------------------------------
#> 1.00              | [1.00, 1.00]

# chisq.test
# No data argument, so it is not possible to use the formula interface.

# fisher.test
# No data argument, so it is not possible to use the formula interface.

# chisq.test_gof
# ???

# mcnemar.test
# No data argument, so it is not possible to use the formula interface.

# friedman.test
wb <- aggregate(warpbreaks$breaks, by = list(
  w = warpbreaks$wool, t = warpbreaks$tension), FUN = mean)
wb
#>   w t        x
#> 1 A L 44.55556
#> 2 B L 28.22222
#> 3 A M 24.00000
#> 4 B M 28.77778
#> 5 A H 24.55556
#> 6 B H 18.77778
x <- friedman.test(x ~ w | t, data = wb)
effectsize(x)
#> Error: Unable to retrieve data from htest object.
#>   Try using 'kendalls_w()' directly.
effectsize(x, data = wb)
#> Kendall's W |       95% CI
#> --------------------------
#> 0.11        | [0.11, 1.00]
#> 
#> - One-sided CIs: upper bound fixed at [1.00].

# chisq.test_gof
# no need to specify the data argument

# oneway.test
# no need to specify the data argument
x <- oneway.test(extra ~ group, data = sleep, var.equal = TRUE)
effectsize(x)
#> Eta2 |       95% CI
#> -------------------
#> 0.16 | [0.00, 1.00]
#> 
#> - One-sided CIs: upper bound fixed at [1.00].

# kruskal.test
# Cannot get it to work
airquality2 <- airquality
airquality2$Month <- as.factor(airquality2$Month)
x <- kruskal.test(Ozone ~ Month, data = airquality2)
effectsize(x)
#> Warning in grepl("Kruskal-Wallis", x$method, fixed = TRUE) &&
#> grepl("^list\\(", : 'length(x) = 2 > 1' in coercion to 'logical(1)'
#> Error: Unable to retrieve data from htest object.
#>   Try using 'rank_epsilon_squared()' directly.
effectsize(x, data = airquality2)
#> Warning in grepl("Kruskal-Wallis", x$method, fixed = TRUE) &&
#> grepl("^list\\(", : 'length(x) = 2 > 1' in coercion to 'logical(1)'
#> Warning: Missing values detected. NAs dropped.
#> Epsilon2 (rank) |       95% CI
#> ------------------------------
#> 0.25            | [0.15, 1.00]
#> 
#> - One-sided CIs: upper bound fixed at [1.00].

Created on 2022-12-16 with reprex v2.0.2

@Larissa-Cury: this comment was marked as outdated.

@mattansb: this comment was marked as outdated.

@mattansb (Member) commented Jul 3, 2023

@rempsyc chisq.test doesn't need any data - it is the only htest that actually contains the data 🥳
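A quick sketch of that point (added for illustration): the observed counts are stored on the chisq.test result itself, so no external data is needed:

```r
# chisq.test keeps the observed (and expected) counts inside the htest object
tab <- table(mtcars$am, mtcars$vs)
x <- chisq.test(tab)
x$observed  # the original contingency table, recoverable from the object alone
```

Because the counts travel with the object, an effect size (e.g. Cramer's V) can be computed from `x` without any data argument.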

To merge this, we would need a lot of unit tests that show that this feature can be used as intended.
Here are three examples that fail because the data is unrecoverable; can they be made to work? (Given here for t.test, but applicable to anything, really.)

library(effectsize)
library(testthat)

tt1 <- t.test(mpg ~ I(am + cyl == 4), data = mtcars)
dd1 <- cohens_d(mpg ~ I(am + cyl == 4), data = mtcars, pooled_sd = FALSE)

dat <- mtcars
tt2 <- t.test(dat$mpg[dat$am==1], dat$mpg[dat$am==0])
dd2 <- cohens_d(dat$mpg[dat$am==1], dat$mpg[dat$am==0], pooled_sd = FALSE)
rm("dat")

col.y <- "mpg"
tt3 <- t.test(mtcars[[col.y]])
dd3 <- cohens_d(mtcars[[col.y]])
rm("col.y")

expect_equal(effectsize(tt1, data = mtcars)[[1]], dd1[[1]])
#> Error in `[.data.frame`(dots$data, , vars) : undefined columns selected

expect_equal(effectsize(tt2, data = mtcars)[[1]], dd2[[1]])
#> Error in `[.data.frame`(dots$data, , vars) : undefined columns selected

expect_equal(effectsize(tt3, data = mtcars)[[1]], dd3[[1]])
#> Error in `[.data.frame`(dots$data, , vars) : undefined columns selected
  • Use internal functions to not repeat yourself in code
  • If some examples cannot be made to work:
    • Is this feature really worth the trouble just for a formula interface (that sucks)? If so...
    • Add informative errors to catch these "extreme" cases.

@rempsyc (Member Author) commented Jul 3, 2023

Thanks, that's very useful and specific! I'll look into implementing these suggestions :)

@rempsyc (Member Author) commented Jul 3, 2023

  • unit tests
  • internal functions to avoid repeating code
  • informative messages

It doesn't work well with advanced formula interfaces for now, but I return NULL for data when it can't be found, so it defaults to the same behaviour as not providing the argument (plus a message about it). I'm working on fixing the formula parsing with regex right now, but I'm not sure that's the right approach…

Also, to fix the lints, I was going to rename all the .effectsize_t.test functions to .effectsize_t_test, etc. (name style should match snake_case). But do you even want that?

@rempsyc (Member Author) commented Aug 20, 2023

> For this to work you will literally need to re-write about 60% of the effectsize.htest.R file

Are you sure? Below I reproduce your example. Do you think it needs something more?

library(effectsize)

some_data <- mtcars
some_data$mpg[1] <- NA

args <- alist(
  mpg ~ am,
  data = some_data,
  alternative = "less", 
  mu = 1,
  pooled_sd = TRUE,
  var.equal = TRUE,
  subset = cyl == 4, 
  na.action = na.omit
)

tt <- do.call(args, what = t.test)

d1 <- do.call(args, what = cohens_d)

d2 <- do.call(c(alist(tt), args[-1]), what = effectsize)

all.equal(d1$Cohens_d, d2$Cohens_d)
#> [1] TRUE

Created on 2023-08-19 with reprex v2.0.2

> we would have to bring the coverage there to 100% first to make sure we don't break anything

I'm happy to start on that first, with your approval. Worst case scenario it's not wasted even if we don't end up merging this PR.

> For this to work - even if just for very standard formula interface - the user will need to supply data, subset, na.action

True, but in my experience these arguments are rarely used; even so, we support them if the user really wants them, so I think that's OK.

> is this worth it?

I think so: with the above example and all the new tests I included, it adds value for the user. But maybe there is more that I am missing, and hopefully you can point it out to me :P

@mattansb (Member)
Without looking at the code, but judging from the tests, this looks pretty good.
Unfortunately I won't really have time to delve into this over the summer, but I think maybe we can have it done for a fall release?

@rempsyc rempsyc removed the WIP 👷‍♂️ work in progress label Apr 1, 2024
@rempsyc rempsyc marked this pull request as ready for review April 1, 2024 10:06
@rempsyc rempsyc added the enhancement 🔥 New feature or request label Apr 1, 2024
@rempsyc rempsyc added the WIP 👷‍♂️ work in progress label Apr 18, 2024
@rempsyc rempsyc marked this pull request as draft April 18, 2024 10:16
@rempsyc rempsyc marked this pull request as ready for review April 28, 2024 12:49
@rempsyc (Member Author) commented Apr 28, 2024

Mattan, there are still some (new) lints, but I can address them with your approval (since it involves changing a lot of variable names that collide with base R functions)

@mattansb (Member)
I still just don't like this as a solution - it requires the user to do so much work that they might as well just re-run the code to not use a formula (all of these tests run in a fraction of a second), no?

Does this fix anything upstream (e.g. in report)?

If you really think this is advantageous, go ahead and make the fixes and merge (:
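The alternative suggested above can be sketched as follows (assuming the standard cohens_d formula interface; per the comparison earlier in this thread, both routes give the same estimate):

```r
library(effectsize)

# Route 1: recover the data from the htest object (this PR's feature)
x <- t.test(mpg ~ vs, data = mtcars)
cohens_d(x, data = mtcars)

# Route 2: skip the htest object and compute the effect size directly
cohens_d(mpg ~ vs, data = mtcars, pooled_sd = FALSE)  # un-pooled SD matches Welch's t-test
```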

@rempsyc (Member Author) commented May 4, 2024

Yes, I understand that, but the way I see it, this adds functionality without removing any, and it does fix some issues upstream with report :) So I will go ahead with the fixes and merge, thanks!

@rempsyc (Member Author) commented May 4, 2024

Failing tests are unrelated to this PR

@rempsyc rempsyc requested a review from mattansb May 9, 2024 20:10
@rempsyc rempsyc removed the WIP 👷‍♂️ work in progress label May 11, 2024
@rempsyc (Member Author) commented May 11, 2024

I think I need the approval of a reviewer to merge this, if I want to avoid bypassing the branch protections

@mattansb mattansb merged commit 7cbed4f into easystats:main May 11, 2024
21 of 27 checks passed
@rempsyc rempsyc deleted the htest_data branch May 12, 2024 07:33

Successfully merging this pull request may close these issues.

Warning: Unable to retrieve data from htest object. Using t_to_d() approximation.