
Extreme precision introduced into yi plots after fixing standardisation #151

Open

egouldo opened this issue Sep 6, 2024 · 2 comments
Labels: bug (an unexpected problem or unintended behavior)


egouldo commented Sep 6, 2024

Related to #149

egouldo added this to the Respond Reviewer Comments milestone Sep 6, 2024
egouldo self-assigned this Sep 6, 2024
egouldo added the bug label Sep 6, 2024
egouldo added a commit that referenced this issue Sep 6, 2024
egouldo added a commit that referenced this issue Sep 9, 2024
egouldo added a commit that referenced this issue Sep 9, 2024
egouldo added a commit that referenced this issue Sep 9, 2024

egouldo commented Sep 9, 2024

See reprex results here:

https://github.com/egouldo/ManyEcoEvo/blob/98b1c27ceb22140f4fd7f1e0260f39a0bb001bfc/cilia-stag_reprex.md

Email report to lead team on 6 September:

Hi Team,

I’ve gotten to the bottom of why the yi figures have changed suddenly, and it has to do with our back-transformation function. I’ve screen-capped the log_back function below. All back-transformation functions follow this same structure; the only thing that really differs is the calculation of the ‘original’ object:

[Screenshot: the log_back() back-transformation function]
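(The screen-cap hasn’t survived this export; as a stand-in, here is a minimal sketch of the shared structure. Argument and column names — beta, se, sim, mean_origin, se_origin — are assumptions for illustration, not the package’s exact code.)

```r
# Minimal sketch of the shared back-transformation structure
# (names are assumptions, not the package's exact API).
log_back <- function(beta, se, sim = 10000) {
  simulated <- rnorm(sim, mean = beta, sd = se) # simulate on the transformed (log) scale
  original  <- exp(simulated)                   # the 'original' calculation: the only line
                                                # that differs between the _back functions
  sd_est <- sd(original)                        # SD of the back-transformed distribution
  se_est <- sd_est / sqrt(length(original))     # the changed line discussed below
  data.frame(mean_origin = mean(original), se_origin = se_est)
}
```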

Now, I realised last week that we were taking the standard deviation of the back-transformed distribution and calling that se_est, i.e.:
se_est <- sd(original)

Shinichi and I agreed that this was incorrect, and I changed the line so that it reads as in the image above, i.e. se_est <- sd_est / sqrt(length(original))

It is this change alone that makes the standard error smaller, and therefore the yi estimates appear more precise.

It occurred to me that the reason we might originally have assigned sd(original) to se_est is that we give the standard error of the estimate, se, to rnorm(), which (after checking the arguments required by rnorm()) expects an sd where we supply the se. As a reminder, we use rnorm() to generate a normal distribution before we back-transform to the original scale.
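For reference, a minimal numeric illustration (values are made up): rnorm() takes n, mean, and sd, so supplying the SE as sd simulates the sampling distribution of the estimate:

```r
beta <- 1.2  # an estimate on the log scale (illustrative)
se   <- 0.3  # its standard error (illustrative)
sims <- rnorm(10000, mean = beta, sd = se)  # rnorm()'s sd argument receives the SE
sd(sims)  # ~0.3: the simulated distribution recovers the SE we supplied
```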

So, question 1: is supplying the SE in place of the SD to rnorm() statistically/mathematically acceptable, Shinichi? If so, this would explain why we assigned the standard deviation of the back-transformed distribution to se_est.

Option 1: If so, then we can revert to what we had originally.

This would make sense. Because:

Result 1: Now extremely obvious that I’ve considered it: the standard error that we compute from the back-transformed distribution is sensitive to the number of simulations we run. For example, I experimented with different numbers of simulations when log-transforming the response-scaled estimates, and the standard error of the transformed distribution varies with the number of simulations:

[Screenshot: SE of the log-transformed distribution under different numbers of simulations (number of sims in the column names)]
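This is straightforward to reproduce with made-up values: the SD of the back-transformed distribution stabilises as the number of simulations grows, while SD/sqrt(n_sim) keeps shrinking:

```r
set.seed(42)
beta <- 1.2; se <- 0.3  # illustrative log-scale estimate and SE
for (n_sim in c(100L, 1000L, 10000L, 100000L)) {
  original <- exp(rnorm(n_sim, mean = beta, sd = se))  # back-transform
  cat(sprintf("sims = %6d | sd = %.3f | sd/sqrt(n) = %.5f\n",
              n_sim, sd(original), sd(original) / sqrt(n_sim)))
}
# sd(original) converges; sd/sqrt(n_sim) shrinks towards zero as simulations
# increase -- the source of the 'extreme precision' in the yi plots.
```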

Option 2: If not, then we should be using the analysts’ sample sizes to convert their SEs back to SDs before back-transforming to the response scale. And even then, given Result 1, if we computed the standard error from the transformed distributions as per the screenshot above, we would still have the problem of the standard error estimates being sensitive to the number of simulations we use, whereas the SD seems to stabilise at around 100 sims (the number of sims used in the transformation is in the column names).

Option 3: We use the analysts’ sample sizes to compute yi_sd and pass this to our back-transformation functions, but we don’t take the standard error from the resulting distribution; we take the SD, and then reconvert it to an SE using the same analyst sample size we used to compute yi_sd (see the conversion sketch below).
One issue with Option 3 is that we will lose even more data for the yi analyses (from 49 to 33 analyses for eucs, and 69 to 49 analyses for blue tit), because not everyone provided us with their sample size estimates.
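For clarity, the conversions in Options 2 and 3 are just the usual SE–SD relationship (helper names here are hypothetical):

```r
# se = sd / sqrt(n)  <=>  sd = se * sqrt(n)
sd_from_se <- function(se, n_analyst) se * sqrt(n_analyst)        # analyst SE -> SD, before back-transforming
se_from_sd <- function(sd_est, n_analyst) sd_est / sqrt(n_analyst) # SD of back-transformed distribution -> SE
```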
Result 2: When I experimented with converting the analyst SE to SD prior to back-transforming estimates to the response scale (sd.fit.aug below), and then transforming again to the log scale (which is what we had done for the euc estimates), we get very large yi estimates on the logged scale. mean_log is the yi estimate on the log scale, which SHOULD match fit.aug, the original values that were back-transformed and then log-transformed again. Note that the suffix at the end of the estimate names indicates whether I used the back-transformed standard error or the back-transformed standard deviation to get back to the original analyst estimates. We would also hope that sd_logSD approximates sd.fit.aug, which it does not.

[Screenshot: log-scale round-trip estimates after converting analyst SE to SD (mean_log* and sd_log* vs fit.aug and sd.fit.aug)]

When we look at the back-transformation where we use the standard error of the estimate to back-transform, and then log-transform back to the original scale of the analyst estimates, those estimates (mean_log*) are closer to what we would expect (i.e. closer to fit.aug), regardless of whether we used the back-transformed SE or the back-transformed SD. Also, the standard deviation of the estimates on the log scale calculated using the back-transformed standard deviation aligns MUCH more closely with the original estimates on the same scale (se.fit.aug):

[Screenshot: log-scale round-trip estimates when back-transforming with the SE (mean_log* vs fit.aug; sd_logSD vs se.fit.aug)]
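A sketch of this round-trip check, using made-up values and hypothetical names modelled on the screenshot columns (fit.aug, se.fit.aug): back-transform with the SE, re-simulate and log-transform, then compare against the inputs:

```r
# Round trip: log scale -> response scale -> log scale again.
# If the pipeline is self-consistent, mean_log ~ fit_aug and sd_log ~ se_fit_aug.
round_trip <- function(fit_aug, se_fit_aug, sim = 10000) {
  original <- exp(rnorm(sim, mean = fit_aug, sd = se_fit_aug))    # back-transform with the SE
  resim    <- rnorm(sim, mean = mean(original), sd = sd(original)) # re-simulate with the back-transformed SD
  logged   <- log(resim[resim > 0])                                # guard against log(<= 0)
  c(mean_log = mean(logged), sd_log = sd(logged))
}
round_trip(fit_aug = 1.2, se_fit_aug = 0.1)  # returns approximately c(1.2, 0.1)
```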

So this final screenshot is what we had implemented originally, before the forest plots for the yi went wonky.

I haven’t repeated exactly the same process for the blue tit analyses, but when I check the outputs of ‘identity_back()’, which applies the same process but returns data on the same scale, the standard deviation of the “back-transformed” (i.e. processed) values matches the original inputs (se.fit.aug):

[Screenshot: identity_back() outputs; SD of processed values vs se.fit.aug]
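A sketch of the identity case under the same assumptions as the log_back sketch above: the “back-transform” is a no-op, so the SD of the processed values should recover the input SE:

```r
identity_back <- function(beta, se, sim = 10000) {
  original <- rnorm(sim, mean = beta, sd = se)  # identity scale: no transformation applied
  data.frame(mean_origin = mean(original),      # ~ beta
             sd_origin   = sd(original))        # ~ se, i.e. matches se.fit.aug
}
identity_back(beta = 0.5, se = 0.2)  # illustrative: sd_origin comes back ~0.2
```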

SO, with all that, I think we should return to Option 1: using the standard error as the input to all back-transformations, and then taking the standard deviation of the resulting distribution and calling it the standard error of the estimate.

Best,
Elliot.

Review:

From Shinichi:

Option 1 please

SE is the standard deviation (SD) of the estimate, so we were probably correct all along

[...]

SD has two kinds: the SD of the sample, which is what we usually mean by SD, and the SD of the estimate, i.e. the SE, which is what we want as the sd for rnorm()
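A minimal numeric illustration of that distinction (values made up):

```r
set.seed(1)
n <- 50
x <- rnorm(n, mean = 10, sd = 2)  # a sample
sd(x)            # SD of the sample: ~2
sd(x) / sqrt(n)  # SD of the estimate (the mean), i.e. the SE: ~0.28
# The SE is itself a standard deviation -- of the estimate's sampling
# distribution -- so it is the right value to pass as rnorm()'s sd when
# simulating uncertainty in an analyst's estimate.
```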


egouldo added a commit that referenced this issue Sep 9, 2024
… to back-transformation and assigning as SE

And update wrapper functions so that NAs return data frames matching the dimensions and structure of the _back functions' outputs
egouldo added a commit that referenced this issue Sep 9, 2024
noted in issue comments previously: #151 (comment)

egouldo commented Sep 10, 2024

• Should be OK after rebuilding. Rerun the manuscript to check.
