Is report able to report lists of htests? #295

Open
Larissa-Cury opened this issue Nov 4, 2022 · 10 comments
Labels
feature idea 🔥 New feature or request reprex 📊 We need a reproducible example for further investigation waiting for response 💌 Need more information from people who submitted the issue

Comments

@Larissa-Cury

Larissa-Cury commented Nov 4, 2022

Describe the solution you'd like

I have a list of htests from a for-loop. It would be wonderful if I could run report_text(mylist). I found a way to unlist the results, but it's not optimal since I always get the warning message...

How could we do it?

I'm sorry, but I don't have a pathway to recommend. I've tried some ways, but none of them seem to work:

report_text(unlist(tests))
report_text(tests[[1]])
report_text(bind_rows(tests))
tests <- tests %>% discard(is.null)
report_text(tests)

I really hope that's a possibility since I have a lot of loops that store htests on a list 😊

@rempsyc
Member

rempsyc commented Dec 16, 2022

Could you provide a reprex (a minimal reproducible example)? What kind of htests are you talking about: t-tests or correlation tests, for example?

@rempsyc rempsyc added feature idea 🔥 New feature or request reprex 📊 We need a reproducible example for further investigation waiting for response 💌 Need more information from people who submitted the issue labels Jan 18, 2023
@rempsyc
Member

rempsyc commented Jul 2, 2023

Here is a reprex for your issue based on your stackoverflow question:

# Original example
# Note: You do not need to provide the data argument if you provide x and y directly:
library(report)
a <- t.test(iris$Sepal.Width, iris$Sepal.Length, paired = TRUE)
report_text(a)
#> Effect sizes were labelled following Cohen's (1988) recommendations.
#> 
#> The Paired t-test testing the difference between iris$Sepal.Width and
#> iris$Sepal.Length (mean difference = -2.79) suggests that the effect is
#> negative, statistically significant, and large (difference = -2.79, 95% CI
#> [-2.94, -2.63], t(149) = -34.82, p < .001; Cohen's d = -2.84, 95% CI [-3.65,
#> -2.48])

Next we try your loop, but rely on lapply instead.

col.list <- names(iris)[2:4]

tests <- lapply(iris[col.list], t.test, iris$Sepal.Length, paired = TRUE)

lapply(tests, report)
#> Warning: Unable to retrieve data from htest object.
#>   Returning an approximate effect size using t_to_d().

#> Warning: Unable to retrieve data from htest object.
#>   Returning an approximate effect size using t_to_d().

#> Warning: Unable to retrieve data from htest object.
#>   Returning an approximate effect size using t_to_d().
#> $Sepal.Width
#> Effect sizes were labelled following Cohen's (1988) recommendations.
#> 
#> The Paired t-test testing the difference between X[[i]] and iris$Sepal.Length
#> (mean difference = -2.79) suggests that the effect is negative, statistically
#> significant, and large (difference = -2.79, 95% CI [-2.94, -2.63], t(149) =
#> -34.82, p < .001; Cohen's d = -2.85, 95% CI [-3.66, -2.49])
#> 
#> $Petal.Length
#> Effect sizes were labelled following Cohen's (1988) recommendations.
#> 
#> The Paired t-test testing the difference between X[[i]] and iris$Sepal.Length
#> (mean difference = -2.09) suggests that the effect is negative, statistically
#> significant, and large (difference = -2.09, 95% CI [-2.27, -1.90], t(149) =
#> -22.81, p < .001; Cohen's d = -1.87, 95% CI [-2.13, -1.60])
#> 
#> $Petal.Width
#> Effect sizes were labelled following Cohen's (1988) recommendations.
#> 
#> The Paired t-test testing the difference between X[[i]] and iris$Sepal.Length
#> (mean difference = -4.64) suggests that the effect is negative, statistically
#> significant, and large (difference = -4.64, 95% CI [-4.72, -4.57], t(149) =
#> -117.54, p < .001; Cohen's d = -9.63, 95% CI [-10.72, -8.51])

So, @mattansb, although the data are provided directly rather than through the formula interface, they still cannot be retrieved. That's why we need easystats/effectsize#522 merged: the data argument could then be provided inside report and found within lapply. Do you need anything more from me so we can merge it?

Created on 2023-07-02 with reprex v2.0.2

@mattansb
Member

mattansb commented Jul 3, 2023

There is no problem with lists of htests:

L <- list(a = t.test(iris$Sepal.Length, iris$Sepal.Width),
          b = t.test(iris$Sepal.Length, iris$Petal.Width))


lapply(L, effectsize::effectsize)
#> $a
#> Cohen's d |       95% CI
#> ------------------------
#> 4.21      | [3.76, 4.66]
#> 
#> - Estimated using un-pooled SD.
#> $b
#> Cohen's d |       95% CI
#> ------------------------
#> 5.84      | [5.31, 6.35]
#> 
#> - Estimated using un-pooled SD.

Any limitation comes from R's htest class and from what insight::get_data() can do with such limited information.

How will passing a data argument help here? Continuing the discussion over at easystats/effectsize#522.

@rempsyc
Member

rempsyc commented Jul 3, 2023

Thanks for this, Mattan. What your reprex shows is that the list of htests is clearly not the problem; the problem is the way in which the list is created. In your case, the list is made by hand, so the name of variable X is stored properly and can be retrieved (but lists are rarely made by hand in real workflows). When the list is made through lapply, however, the name of variable X is stored as X[[i]], and therefore cannot be retrieved properly by effectsize.

L <- list(t.test(iris$Sepal.Length, iris$Sepal.Width),
          t.test(iris$Sepal.Length, iris$Petal.Width))

L[[1]]$data.name
#> [1] "iris$Sepal.Length and iris$Sepal.Width"

effectsize::effectsize(L[[1]])
#> Cohen's d |       95% CI
#> ------------------------
#> 4.21      | [3.76, 4.66]
#> 
#> - Estimated using un-pooled SD.

L <- lapply(iris[c(2, 4)], t.test, iris$Sepal.Length)

L[[1]]$data.name
#> [1] "X[[i]] and iris$Sepal.Length"

effectsize::effectsize(L[[1]])
#> Warning: Unable to retrieve data from htest object.
#>   Returning an approximate effect size using t_to_d().
#> d     |         95% CI
#> ----------------------
#> -4.85 | [-5.37, -4.33]

Created on 2023-07-03 with reprex v2.0.2

So it seems this is a basic problem of lapply in combination with htests, and not a problem we can fix from the report or effectsize side, I think?
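The X[[i]] label can be reproduced in base R alone. A minimal sketch, assuming that t.test() builds data.name by deparsing the expression it was called with; show_expr is a hypothetical helper introduced only to show the mechanism and is not part of report or effectsize.

```r
# Hypothetical helper: returns the *expression* the caller supplied,
# which is the same kind of information t.test() keeps in data.name.
show_expr <- function(x) deparse(substitute(x))

# Called directly, the expression names the column.
direct <- show_expr(iris$Sepal.Width)

# Called through lapply(), the expression is lapply's internal placeholder.
via_lapply <- lapply(iris[c(2, 4)], show_expr)

# The same placeholder ends up in the data.name element of each htest.
tests <- lapply(iris[c(2, 4)], t.test, iris$Sepal.Length)
data_names <- vapply(tests, function(tt) tt$data.name, character(1))

print(direct)
print(via_lapply)
print(data_names)
```

Since the placeholder is created before any report or effectsize code runs, the column name is already lost by the time the htest reaches those packages.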

@mattansb
Member

mattansb commented Jul 4, 2023

Since we don't have a reprex for @Larissa-Cury's original error, I can't offer a solution.

But for your example, Remi, we can move the report call inside the function passed to lapply:

library(report)

foo <- function(col.y) {
  test <- t.test(iris[[col.y]], iris$Sepal.Length)
  report_text(test)
}

col.list <- names(iris)[2:4]

lapply(col.list, foo)
[[1]]
Effect sizes were labelled following Cohen's (1988) recommendations.

The Welch Two Sample t-test testing the difference between iris[[col.y]] and
iris$Sepal.Length (mean of x = 3.06, mean of y = 5.84) suggests that the effect is
negative, statistically significant, and large (difference = -2.79, 95% CI [-2.94,
-2.64], t(225.68) = -36.46, p < .001; Cohen's d = -5.84, 95% CI [-6.35, -5.31])

[[2]]
Effect sizes were labelled following Cohen's (1988) recommendations.

The Welch Two Sample t-test testing the difference between iris[[col.y]] and
iris$Sepal.Length (mean of x = 3.76, mean of y = 5.84) suggests that the effect is
negative, statistically significant, and large (difference = -2.09, 95% CI [-2.40,
-1.77], t(211.54) = -13.10, p < .001; Cohen's d = -5.84, 95% CI [-6.35, -5.31])

[[3]]
Effect sizes were labelled following Cohen's (1988) recommendations.

The Welch Two Sample t-test testing the difference between iris[[col.y]] and
iris$Sepal.Length (mean of x = 1.20, mean of y = 5.84) suggests that the effect is
negative, statistically significant, and large (difference = -4.64, 95% CI [-4.82,
-4.46], t(295.98) = -50.54, p < .001; Cohen's d = -5.84, 95% CI [-6.35, -5.31])

@rempsyc
Member

rempsyc commented Jul 7, 2023

That's wonderful, Mattan. Of course, an elegantly simple solution! I think that solves @Larissa-Cury's problem, so we can close this issue now.

@rempsyc rempsyc closed this as completed Jul 7, 2023
@rempsyc
Member

rempsyc commented Jul 7, 2023

Actually @mattansb I might have spoken too fast. When I tried this within a reprex, I still get the warning "Unable to retrieve data from htest object". I think you should have gotten it too because the text output of report uses iris[[col.y]] instead of the actual column name.

library(effectsize)

foo <- function(col.y) {
  test <- t.test(iris[[col.y]], iris$Sepal.Length)
  effectsize(test)
}

col.list <- names(iris)[2:4]

x <- lapply(col.list, foo)
#> Warning: Unable to retrieve data from htest object.
#>   Returning an approximate effect size using t_to_d().

#> Warning: Unable to retrieve data from htest object.
#>   Returning an approximate effect size using t_to_d().

#> Warning: Unable to retrieve data from htest object.
#>   Returning an approximate effect size using t_to_d().

Created on 2023-07-06 with reprex v2.0.2

Actually, I don't think even easystats/effectsize#522 will solve this (I was wrong about that), because the problem isn't access to the dataset but access to the variable name, which cannot be retrieved.

library(effectsize)
packageVersion("effectsize")
#> [1] '0.8.3.11'
L <- lapply(iris[c(2, 4)], t.test, iris$Sepal.Length)
L[[1]]$data.name
#> [1] "X[[i]] and iris$Sepal.Length"
lapply(L, effectsize, data = iris)
#> Error in eval(parse(text = columns[1])): object 'X' not found

Created on 2023-07-06 with reprex v2.0.2

library(effectsize)
packageVersion("effectsize")
#> [1] '0.8.3.11'
foo <- function(col.y) {
  test <- t.test(iris[[col.y]], iris$Sepal.Length)
  effectsize(test, data = iris)
}
col.list <- names(iris)[2:4]
x <- lapply(col.list, foo)
#> Error in eval(parse(text = columns[1])): object 'col.y' not found

Created on 2023-07-06 with reprex v2.0.2
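One possible way to keep a usable label when looping is to splice the column name into the call before it is evaluated, so that data.name holds an expression that does not depend on any local variable. This is a base R sketch only: make_test is a hypothetical helper, and whether report or effectsize then recover the data without a warning has not been verified here.

```r
# Hypothetical helper: bquote() substitutes the value of col.y into the call,
# so t.test() sees iris[["Sepal.Width"]] rather than iris[[col.y]] or X[[i]].
make_test <- function(col.y) {
  eval(bquote(t.test(iris[[.(col.y)]], iris$Sepal.Length)))
}

col.list <- names(iris)[2:4]
tests <- lapply(col.list, make_test)
names(tests) <- col.list

data_names <- vapply(tests, function(tt) tt$data.name, character(1))
print(data_names)

# The first half of the label can be re-evaluated without col.y or X.
first_expr <- strsplit(data_names[["Sepal.Width"]], " and ", fixed = TRUE)[[1]][1]
recovered <- eval(parse(text = first_expr))
print(identical(recovered, iris$Sepal.Width))
```

The trade-off is a slightly less readable call in exchange for a label that names the actual column.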

@rempsyc rempsyc reopened this Jul 7, 2023
@mattansb
Member

mattansb commented Jul 7, 2023

In a reprex environment it doesn't work, but if you run it interactively it does. There is only so much scoping that can be supported...
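The scoping issue can be illustrated in base R. The sketch below assumes, based on the eval(parse(text = columns[1])) error shown earlier, that the data are recovered by re-evaluating the text stored in data.name; the two environments are stand-ins invented for this example, not how report or effectsize actually choose where to evaluate.

```r
# Reproduce the label that the wrapper function leaves in the htest.
foo <- function(col.y) {
  t.test(iris[[col.y]], iris$Sepal.Length)$data.name
}
label <- foo("Sepal.Width")
first_expr <- strsplit(label, " and ", fixed = TRUE)[[1]][1]
print(first_expr)

# Stand-in for a clean session: iris is visible, but no col.y exists.
clean_env <- new.env(parent = as.environment("package:datasets"))
clean_result <- tryCatch(
  eval(parse(text = first_expr), envir = clean_env),
  error = function(e) "lookup failed"
)
print(clean_result)

# Stand-in for an interactive session with a leftover col.y object.
stale_env <- new.env(parent = as.environment("package:datasets"))
assign("col.y", "Petal.Width", envir = stale_env)
stale_result <- eval(parse(text = first_expr), envir = stale_env)

# The lookup succeeds, but it returns Petal.Width although the test
# was run on Sepal.Width.
print(identical(stale_result, iris$Petal.Width))
```

If a col.y object happened to exist in an interactive session, the lookup would succeed silently but could return the wrong column. The interactive output above reports the same Cohen's d (-5.84, 95% CI [-6.35, -5.31]) for all three tests, which would be consistent with that, although nothing in the thread confirms it.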

@rempsyc
Member

rempsyc commented Jul 7, 2023

When I run it interactively from the console, I still get the warnings. Strange, I tried with both the latest CRAN version and current GitHub development version.

@mattansb
Member

mattansb commented Jul 8, 2023

That's weird... It worked for me :/
