Linear regression (R2) fails when bootstrapping on very few data points #124

IAlibay · 2024-08-22T19:24:16Z

What happens?

When plotting dGs, the statistics calculations attempt to generate 1000 bootstraps (with replacement) of the input dataset.
If one of the bootstraps yields the same number for all the values, the linear regression fails (because you can't fit a regression to a flat line).
If you have 4 data points, there is a (1/4)^3 ≈ 1.56% chance that a bootstrap picks the same data point 4 times in a row (0.39% for any one specific point). I.e. ~16 of the 1000 bootstraps will have the same value for all the data points.
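
A quick sanity check of that estimate, as a sketch using only the Python standard library. The helper names are hypothetical and this is not the code used by the plotting script. It assumes the n data points are distinct and counts a resample as degenerate when any single point is drawn every time. The chance for one specific point is (1/n)^n (0.39% for n = 4), so the chance for any of the n points is n times that, (1/n)^(n-1), which is 1/64 for n = 4.

```python
import random


def prob_degenerate(n: int) -> float:
    """Chance that a bootstrap resample of n distinct points repeats one point n times."""
    # The first draw can be any point; each of the other n - 1 draws must match it.
    return (1.0 / n) ** (n - 1)


def count_degenerate(n: int, n_bootstraps: int = 1000, seed: int = 0) -> int:
    """Simulate bootstraps of n point indices and count the all-identical ones."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_bootstraps):
        # Resample n indices with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        if len(set(sample)) == 1:
            count += 1
    return count


if __name__ == "__main__":
    p = prob_degenerate(4)
    print(f"P(degenerate resample, n=4) = {p:.6f}")  # 0.015625
    print(f"Expected degenerate resamples in 1000: {1000 * p:.1f}")
    print(f"Simulated count: {count_degenerate(4)}")
```

With 1000 bootstraps and n = 4 this works out to roughly 16 degenerate resamples on average, so at least one failure is close to certain.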

Should this even happen?

Probably not: calculating correlation metrics on so few data points is probably not sensible. It may be that the answer here is to just not report correlation metrics when there are fewer than a set number of data points (at least 10?).
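
One way the suggested guard could look, as a minimal sketch rather than the project's actual implementation. The function names, the `MIN_POINTS` constant, and the choice to return `None` are all assumptions for illustration; the threshold of 10 is just the tentative value floated above. The sketch also skips any resample where all x or all y values are identical, since R2 is undefined there. It computes R2 as the squared Pearson correlation using only the standard library.

```python
import random

MIN_POINTS = 10  # hypothetical threshold, taken from the "at least 10?" suggestion


def r_squared(xs, ys):
    """Squared Pearson correlation; returns None if either input is constant."""
    if len(set(xs)) == 1 or len(set(ys)) == 1:
        return None  # flat line: the correlation is undefined
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def bootstrap_r2(x, y, n_bootstraps=1000, min_points=MIN_POINTS, seed=0):
    """Bootstrap R2 values, or None when there are too few points to report."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < min_points:
        return None  # do not report correlation metrics at all
    rng = random.Random(seed)
    n = len(x)
    values = []
    for _ in range(n_bootstraps):
        # Resample paired points with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        r2 = r_squared([x[i] for i in idx], [y[i] for i in idx])
        if r2 is not None:  # skip degenerate resamples instead of failing
            values.append(r2)
    return values
```

Returning `None` below the threshold lets the caller decide whether to omit the metric from the plot or to label it as not computed.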

IAlibay mentioned this issue Sep 2, 2024

Fix (or separate) script for analysing systems when boostrapping fails OpenFreeEnergy/IndustryBenchmarks2024#140

Closed
