
Validation data set not producing values concordant with manual #34

Open
J-Lye opened this issue Sep 6, 2021 · 1 comment
J-Lye commented Sep 6, 2021

I've used OUTRIDER in my thesis, but I've been advised that the very slight variability between the values I obtain and the values reported in the OUTRIDER manual is a serious concern and may invalidate my results.

No matter how many times I repeat or modify my approach, the results are always the same; the difference is tiny, e.g. 1.2x10^-12 in a p-value. I get these slightly different results whether I use the simple example or download the Kremer dataset and run the full OUTRIDER code.

**Results from the manual**

```
   geneID sampleID   pValue  padjust zScore  l2fc rawcounts normcounts
1: ATAD3C  MUC1360 2.82e-11 1.57e-07   5.27  1.87       948     246.26
2: NBPF15  MUC1351 8.10e-10 4.51e-06   5.75  0.77      7591    7050.72
3:  MSTO1  MUC1367 4.46e-09 2.48e-05  -6.20 -0.81       761     729.70
4:  HDAC1  MUC1350 1.54e-08 8.56e-05  -5.93 -0.79      2215    2113.06
5:  DCAF6  MUC1374 6.93e-08 3.86e-04  -5.68 -0.61      2348    3084.41
6: NBPF16  MUC1351 2.61e-07 7.25e-04   4.82  0.67      4014    3834.40
   meanCorrected  theta aberrant AberrantBySample AberrantByGene padj_rank
1:         84.16  16.66     TRUE                1              1         1
2:       4417.10 109.80     TRUE                2              1         1
3:       1238.19 151.57     TRUE                1              1         1
4:       3521.37 134.57     TRUE                1              1         1
5:       4603.00 197.14     TRUE                1              1         1
6:       2564.52 105.73     TRUE                2              1         2
```

**Results from my R script**

I am using the exact script from the manual. It would help to have confirmation that others or the developers see the same behaviour, and whether it's due to some optimisation.

```
   geneID sampleID   pValue  padjust zScore  l2fc rawcounts normcounts
1: ATAD3C  MUC1360 2.70e-11 1.50e-07   5.29  1.87       948     246.93
2: NBPF15  MUC1351 6.48e-10 3.60e-06   5.79  0.78      7591    7070.41
3:  MSTO1  MUC1367 4.76e-09 2.65e-05  -6.19 -0.81       761     729.59
4:  HDAC1  MUC1350 1.34e-08 7.44e-05  -5.95 -0.78      2215    2121.49
5:  DCAF6  MUC1374 6.26e-08 3.48e-04  -5.70 -0.61      2348    3084.29
6: NBPF16  MUC1351 2.19e-07 6.10e-04   4.85  0.68      4014    3844.74
   meanCorrected  theta aberrant AberrantBySample AberrantByGene padj_rank
1:         86.15  16.61     TRUE                1              1         1
2:       4500.21 109.83     TRUE                2              1         1
3:       1216.01 150.84     TRUE                1              1         1
4:       3529.56 137.72     TRUE                1              1         1
5:       4600.94 198.54     TRUE                1              1         1
6:       2603.50 105.75     TRUE                2              1         2
```
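For what it's worth, the discrepancies above can be quantified and checked against a tolerance rather than for exact equality. A minimal sketch in Python (the two p-values are taken from the first row of each table; the 5% relative tolerance is an illustrative choice, not an OUTRIDER default):

```python
import math

# p-values for ATAD3C / MUC1360: manual vs. this rerun (from the tables above)
manual_p = 2.82e-11
rerun_p = 2.70e-11

abs_diff = abs(manual_p - rerun_p)  # ~1.2e-12, the "tiny difference" noted above
within_tol = math.isclose(manual_p, rerun_p, rel_tol=0.05)  # agree within 5%
print(abs_diff, within_tol)
```

The same tolerance-based check applies to every row: the aberrant calls and rankings are identical in both tables, and only the floating-point values drift slightly.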

c-mertes commented Jan 3, 2023

Dear @J-Lye,
thank you for reporting this difference. Under the hood, OUTRIDER uses the CPU-optimized RcppArmadillo package (https://arma.sourceforge.net/, https://cran.r-project.org/web/packages/RcppArmadillo/index.html). Because RcppArmadillo is compiled against the CPU features available locally, the OUTRIDER optimization can produce minor rounding differences across CPU architectures. If you run it twice locally on the same CPU, the results should replicate: the code is deterministic, but unfortunately not agnostic to the underlying hardware.

I hope this helps explain the differences in your results.
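The hardware dependence described above is a general property of floating-point arithmetic, not something specific to OUTRIDER. A minimal illustration in Python (not OUTRIDER code; the package itself is R/C++): addition is not associative in floating point, so the order in which a compiler or SIMD unit accumulates terms changes the rounded result, and comparisons across machines should therefore use a tolerance.

```python
import math

# Summation order alone changes the rounded total:
print(sum([1e16, 1.0, -1e16]))   # 0.0 -- the 1.0 is absorbed when added to 1e16 first
print(sum([1e16, -1e16, 1.0]))   # 1.0 -- the large terms cancel before 1.0 is added

# Hence: compare results with a tolerance, not for exact equality.
a = 0.1 + 0.2
print(a == 0.3)                            # False (a is 0.30000000000000004)
print(math.isclose(a, 0.3, rel_tol=1e-9))  # True
```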
