Add section discussing SAIGE prototype #173
Comments
I ran the SAIGE prototype again today.
Steps
Results
First, I confirm that the numeric columns in the outputs match.
>>> import pandas as pd
>>> vcz = pd.read_csv('output/chr21_10_4_quantitative_vcz.txt', sep='\t')
>>> bcf = pd.read_csv('output/chr21_10_4_quantitative_bcf.txt', sep='\t')
>>> vcz.shape == bcf.shape
True
>>> for column in vcz.columns:
... print(column, all(vcz[column] == bcf[column]))
...
CHR True
POS True
MarkerID False
Allele1 False
Allele2 False
AC_Allele2 True
AF_Allele2 True
MissingRate True
BETA True
SE True
Tstat True
var True
p.value True
N True
Second, I look at the locus with the causal allele in SAIGE's output.
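The column-by-column check above uses exact equality, which reports False for any column where both files hold NaN in the same row and for floats that differ only in the last digit. A minimal sketch of a tolerance-based variant follows. The helper name `compare_columns`, the `rtol` default, and the tiny example frames are illustrative assumptions, not part of the prototype.

```python
import numpy as np
import pandas as pd


def compare_columns(left: pd.DataFrame, right: pd.DataFrame, rtol: float = 1e-9) -> dict:
    """Return {column: bool} saying whether each column of `left` matches `right`."""
    if left.shape != right.shape or list(left.columns) != list(right.columns):
        raise ValueError("frames must have the same shape and column names")
    results = {}
    for column in left.columns:
        a, b = left[column], right[column]
        if pd.api.types.is_numeric_dtype(a) and pd.api.types.is_numeric_dtype(b):
            # Relative-tolerance comparison; NaN in the same position counts as equal.
            results[column] = bool(
                np.allclose(
                    a.to_numpy(dtype=float),
                    b.to_numpy(dtype=float),
                    rtol=rtol,
                    atol=0.0,
                    equal_nan=True,
                )
            )
        else:
            # Exact string comparison for identifiers and alleles.
            results[column] = bool((a.astype(str).to_numpy() == b.astype(str).to_numpy()).all())
    return results


# Made-up frames standing in for the two SAIGE output tables.
vcz = pd.DataFrame({"POS": [1, 2], "BETA": [0.5, float("nan")], "MarkerID": ["a", "b"]})
bcf = pd.DataFrame({"POS": [1, 2], "BETA": [0.5, float("nan")], "MarkerID": ["a", "c"]})
report = compare_columns(vcz, bcf)
for column, same in report.items():
    print(column, same)
```

With the real outputs, the two frames would come from the same `pd.read_csv(..., sep='\t')` calls shown in the session above.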
Finally, I record the execution time. For VCF.
For VCZ.
Discussion
The high p-value makes me wonder if I am doing something incorrectly... At least the VCZ output matches the VCF output.
I wanted to try using Savvy-formatted input. However, I am not able to install Savvy. Neither the cget method nor the conda method is working for me.
The execution time results are different from Will-Tyler/SAIGE#1 (comment). There are several differences between those results and these results. First, I pushed a commit after reporting the first results. Second, I was testing with a binary phenotype, whereas here I am testing with a quantitative phenotype. Third, my notes and shell history from running SAIGE the first time some months ago are incomplete, but they indicate that I used a different strategy to run the first step of the single-variant test.
Hopefully, you can tell me what improvements you would like to see, and I can see what I can do.
Environment details
Apple M1 machine with arm64 architecture, macOS 15.2, and 8192 MiB RAM.
Thanks @Will-Tyler. We're going to need to make this reproducible for the paper, so I think it's worthwhile thinking about how best to package this up into a runnable analysis now. I guess the simplest thing to do would be to have a SAIGE-prototype directory with a Makefile, which clones your repo and then builds the code locally? We'd then need a shell script to do the actual running, plus a notebook to document the process?
Based on this code: Will-Tyler/SAIGE#1
We should compare with VCF, BCF and Savvy (I think SAIGE supports this), using the simulated data we already have. We don't have to go up to 1M samples, just whatever seems reasonable. We can plot the time to run the association test based on traits simulated by tstrait against increasing sample size, as usual.
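The timing comparison described here could be driven by a small harness along these lines. This is only a sketch: the function names, the record layout, and the sample sizes are assumptions, and the command run for each case is a do-nothing placeholder, since the actual SAIGE invocation for each input format is not spelled out in this thread.

```python
import subprocess
import sys
import time


def time_command(command: list[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def benchmark(commands: dict[tuple[str, int], list[str]]) -> list[dict]:
    """Time each (format, sample size) command and return one record per run."""
    records = []
    for (fmt, n_samples), command in sorted(commands.items()):
        records.append(
            {"format": fmt, "n_samples": n_samples, "seconds": time_command(command)}
        )
    return records


# Placeholder command: swap in the real association-test invocation per format.
placeholder = [sys.executable, "-c", "pass"]
commands = {
    (fmt, n): placeholder
    for fmt in ("vcf", "bcf", "savvy", "vcz")
    for n in (10, 100)  # hypothetical sample sizes
}
records = benchmark(commands)
for record in records:
    print(record)
```

The resulting records have one row per format and sample size, which is the shape needed to plot run time against increasing sample size.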