
Extreme precision introduced into yi plots after fixing standardisation #151

Open

egouldo opened this issue Sep 6, 2024 · 2 comments
Labels: bug (an unexpected problem or unintended behavior)


egouldo commented Sep 6, 2024

Related to #149

egouldo added this to the Respond Reviewer Comments milestone Sep 6, 2024
egouldo self-assigned this Sep 6, 2024
egouldo added the bug label Sep 6, 2024
egouldo added a commit that referenced this issue Sep 6, 2024
egouldo added a commit that referenced this issue Sep 9, 2024
egouldo added a commit that referenced this issue Sep 9, 2024
egouldo added a commit that referenced this issue Sep 9, 2024

egouldo commented Sep 9, 2024

See reprex results here:

https://github.com/egouldo/ManyEcoEvo/blob/98b1c27ceb22140f4fd7f1e0260f39a0bb001bfc/cilia-stag_reprex.md

Email report to lead team on 6 September:

Hi Team,

I’ve gotten to the bottom of why the yi figures have changed suddenly, and it has to do with our back-transformation function. I’ve screen-capped the log_back function below. All back-transformation functions follow this same structure; the only thing that really differs is the calculation of the ‘original’ object:

[Screenshot: the log_back() back-transformation function]
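(The screen-cap hasn’t survived this export; as a stand-in, here is a minimal sketch of the shared structure. Argument and column names — beta, se, sim, mean_origin, se_origin — are assumptions for illustration, not the package’s exact code.)

```r
# Minimal sketch of the shared back-transformation structure
# (names are assumptions, not the package's exact API).
log_back <- function(beta, se, sim = 10000) {
  simulated <- rnorm(sim, mean = beta, sd = se) # simulate on the transformed (log) scale
  original  <- exp(simulated)                   # the 'original' calculation: the only line
                                                # that differs between the _back functions
  sd_est <- sd(original)                        # SD of the back-transformed distribution
  se_est <- sd_est / sqrt(length(original))     # the changed line discussed below
  data.frame(mean_origin = mean(original), se_origin = se_est)
}
```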

Now, I realised last week that we were taking the standard deviation of the back-transformed distribution and calling that se_est, i.e.:
se_est <- sd(original)

Shinichi and I agreed that this was incorrect, and I changed the line so that it reads as in the image above, i.e. se_est <- sd_est / sqrt(length(original))

It is this change alone that makes the standard error smaller, and therefore the yi estimates appear more precise.

It occurred to me that the reason we might originally have assigned sd(original) to se_est is that we give the standard error of the estimate, se, to rnorm(), which (after checking the arguments required by rnorm()) expects an sd where we supply the se. As a reminder, we use rnorm() to generate a normal distribution before we back-transform to the original scale.
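For reference, a minimal numeric illustration (values are made up): rnorm() takes n, mean, and sd, so supplying the SE as sd simulates the sampling distribution of the estimate:

```r
beta <- 1.2  # an estimate on the log scale (illustrative)
se   <- 0.3  # its standard error (illustrative)
sims <- rnorm(10000, mean = beta, sd = se)  # rnorm()'s sd argument receives the SE
sd(sims)  # ~0.3: the simulated distribution recovers the SE we supplied
```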

So, question 1: is supplying the SE in place of the SD to rnorm() statistically/mathematically acceptable, Shinichi? If so, this would explain why we assigned the standard deviation of the back-transformed distribution to se_est.

Option 1: If so, then we can revert to what we had originally.

This would make sense. Because:

Result 1: Now extremely obvious that I’ve considered it: the standard error that we compute from the back-transformed distribution is sensitive to the number of simulations we run. For example, I experimented with different numbers of simulations when log-transforming the response-scaled estimates, and the standard error of the transformed distribution varies with the number of simulations:

[Screenshot: SE of the log-transformed distribution under different numbers of simulations (number of sims in the column names)]
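This is straightforward to reproduce with made-up values: the SD of the back-transformed distribution stabilises as the number of simulations grows, while SD/sqrt(n_sim) keeps shrinking:

```r
set.seed(42)
beta <- 1.2; se <- 0.3  # illustrative log-scale estimate and SE
for (n_sim in c(100L, 1000L, 10000L, 100000L)) {
  original <- exp(rnorm(n_sim, mean = beta, sd = se))  # back-transform
  cat(sprintf("sims = %6d | sd = %.3f | sd/sqrt(n) = %.5f\n",
              n_sim, sd(original), sd(original) / sqrt(n_sim)))
}
# sd(original) converges; sd/sqrt(n_sim) shrinks towards zero as simulations
# increase -- the source of the 'extreme precision' in the yi plots.
```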

Option 2: If not, then we should be using the analysts’ sample sizes to convert their SEs back to SDs before back-transforming to the response scale. And even then, given Result 1, if we computed the standard error from the transformed distributions as per the screenshot above, we would still have the problem of the standard error estimates being sensitive to the number of simulations we use, whereas the SD seems to stabilise at around 100 sims (the number of sims used in the transformation is in the column names).

Option 3: We use the analysts’ sample sizes to compute yi_sd and pass this to our back-transformation functions, but we don’t take the standard error from the resulting distribution; we take the SD, and then reconvert it to an SE using the same analyst sample size we used to compute yi_sd (see the conversion sketch below).
One issue with Option 3 is that we will lose even more data for the yi analyses (from 49 to 33 analyses for eucs, and 69 to 49 analyses for blue tit), because not everyone provided us with their sample size estimates.
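For clarity, the conversions in Options 2 and 3 are just the usual SE–SD relationship (helper names here are hypothetical):

```r
# se = sd / sqrt(n)  <=>  sd = se * sqrt(n)
sd_from_se <- function(se, n_analyst) se * sqrt(n_analyst)        # analyst SE -> SD, before back-transforming
se_from_sd <- function(sd_est, n_analyst) sd_est / sqrt(n_analyst) # SD of back-transformed distribution -> SE
```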
Result 2: When I experimented with converting the analyst SE to SD prior to back-transforming estimates to the response scale (sd.fit.aug below), and then transforming again to the log scale (which is what we had done for the euc estimates), we get very large yi estimates on the logged scale. mean_log is the yi estimate on the log scale, which SHOULD match fit.aug, the original values that were back-transformed and then log-transformed again. Note that the suffix at the end of the estimate names indicates whether I used the back-transformed standard error or the back-transformed standard deviation to get back to the original analyst estimates. We would also hope that sd_logSD approximates sd.fit.aug, which it does not.

[Screenshot: log-scale round-trip estimates after converting analyst SE to SD (mean_log* and sd_log* vs fit.aug and sd.fit.aug)]

When we look at the back-transformation where we use the standard error of the estimate to back-transform, and then log-transform back to the original scale of the analyst estimates, those estimates (mean_log*) are closer to what we would expect (i.e. closer to fit.aug), regardless of whether we used the back-transformed SE or the back-transformed SD. Also, the standard deviation of the estimates on the log scale calculated using the back-transformed standard deviation aligns MUCH more closely with the original estimates on the same scale (se.fit.aug):

[Screenshot: log-scale round-trip estimates when back-transforming with the SE (mean_log* vs fit.aug; sd_logSD vs se.fit.aug)]
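A sketch of this round-trip check, using made-up values and hypothetical names modelled on the screenshot columns (fit.aug, se.fit.aug): back-transform with the SE, re-simulate and log-transform, then compare against the inputs:

```r
# Round trip: log scale -> response scale -> log scale again.
# If the pipeline is self-consistent, mean_log ~ fit_aug and sd_log ~ se_fit_aug.
round_trip <- function(fit_aug, se_fit_aug, sim = 10000) {
  original <- exp(rnorm(sim, mean = fit_aug, sd = se_fit_aug))    # back-transform with the SE
  resim    <- rnorm(sim, mean = mean(original), sd = sd(original)) # re-simulate with the back-transformed SD
  logged   <- log(resim[resim > 0])                                # guard against log(<= 0)
  c(mean_log = mean(logged), sd_log = sd(logged))
}
round_trip(fit_aug = 1.2, se_fit_aug = 0.1)  # returns approximately c(1.2, 0.1)
```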

So this final screenshot is what we had implemented originally, before the forest plots for the yi went wonky.

I haven’t repeated exactly the same process for the blue tit analyses, but when I check the outputs of ‘identity_back()’, which applies the same process but returns data on the same scale, the standard deviation of the “back-transformed” (i.e. processed) values matches the original inputs (se.fit.aug):

[Screenshot: identity_back() outputs; SD of processed values vs se.fit.aug]
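A sketch of the identity case under the same assumptions as the log_back sketch above: the “back-transform” is a no-op, so the SD of the processed values should recover the input SE:

```r
identity_back <- function(beta, se, sim = 10000) {
  original <- rnorm(sim, mean = beta, sd = se)  # identity scale: no transformation applied
  data.frame(mean_origin = mean(original),      # ~ beta
             sd_origin   = sd(original))        # ~ se, i.e. matches se.fit.aug
}
identity_back(beta = 0.5, se = 0.2)  # illustrative: sd_origin comes back ~0.2
```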

SO, with all that, I think we should return to Option 1: using the standard error as the input to all back-transformations, and then taking the standard deviation of the resulting distribution and calling it the standard error of the estimate.

Best,
Elliot.

Review:

From Shinichi:

Option 1 please

SE is the standard deviation (SD) of the estimate, so we were probably correct all along

[...]

SD has two kinds: the SD of the sample, which is what we usually mean by SD, and the SD of the estimate, i.e. the SE, which is what we want as the sd for rnorm()
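A minimal numeric illustration of that distinction (values made up):

```r
set.seed(1)
n <- 50
x <- rnorm(n, mean = 10, sd = 2)  # a sample
sd(x)            # SD of the sample: ~2
sd(x) / sqrt(n)  # SD of the estimate (the mean), i.e. the SE: ~0.28
# The SE is itself a standard deviation -- of the estimate's sampling
# distribution -- so it is the right value to pass as rnorm()'s sd when
# simulating uncertainty in an analyst's estimate.
```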


egouldo added a commit that referenced this issue Sep 9, 2024
… to back-transformation and assigning as SE

And update wrapper functions so that NAs return data frames matching the dimensions and structure of the _back functions' outputs
egouldo added a commit that referenced this issue Sep 9, 2024
noted in issue comments previously: #151 (comment)

egouldo commented Sep 10, 2024

• Should be OK after rebuilding. Rerun the manuscript to check.
