Ploidy fixes #135
Removed ploidy information from sample HG01804.SG in 2015_1000Genomes_1240K_haploid_pulldown
Brilliant! I love it when the validation helps to make the data better. I also ran Why did you also remove/set to
Thanks, good to know that Well the thing with the
Good catch! Did you search for that systematically as well? Or did you just fix it when it came up together with the ploidy issue?
Hmm, I think I did not, no. Not sure exactly now.
I think this would be the relevant query, right?
qjanno "SELECT Poseidon_ID,Capture_Type FROM d(.) WHERE Date_Type = 'modern' AND Capture_Type IS NOT NULL"
Which reminds me that qjanno should add a column with the path to the .janno file a given sample comes from, or even the name of the package. The R package already has a feature like this. It would make it much easier to determine which packages are affected here. I think I'll squeeze this into poseidon-framework/qjanno#4.
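For anyone who wants to see what that query plus the wished-for source column would amount to, here is a rough Python sketch. It is not how qjanno works internally. It assumes .janno files are tab-separated and that a missing value shows up as an empty field or as n/a; the function name, the source label and the two example samples are made up for illustration.

```python
import csv
import io

# Assumption: how a .janno file marks a missing value.
MISSING = {"", "n/a"}


def modern_with_capture_type(janno_files):
    """Yield (source, Poseidon_ID, Capture_Type) for modern samples that
    have a Capture_Type set.

    janno_files maps a source label (for example a file path) to an open
    text handle containing tab-separated .janno content.
    """
    for source, handle in janno_files.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            capture = (row.get("Capture_Type") or "").strip()
            if row.get("Date_Type") == "modern" and capture not in MISSING:
                # Reporting the source alongside each hit is what makes it
                # easy to see which package is affected.
                yield source, row["Poseidon_ID"], capture


# Hypothetical two-sample file, for illustration only.
example = io.StringIO(
    "Poseidon_ID\tDate_Type\tCapture_Type\n"
    "SampleA\tmodern\tShotgun\n"
    "SampleB\tmodern\tn/a\n"
)
hits = list(modern_with_capture_type({"example.janno": example}))
```

Here hits contains only SampleA, tagged with the label example.janno, because SampleB has no Capture_Type.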
Actually,
OK, I think I've got them all.
So, to summarise, I've changed the Genotype_Ploidy where necessary. There is one issue with one of the 1000 Genomes samples, which has heterozygotes in the genotype data even though it should be a pseudo-haploid pulldown. We need to check whether that's an error that came from the AADR. I also fixed a lot of modern samples that had
@AyGhal do you want to approve this quickly? Because our releases of trident-1.4.0.0 and, in particular, xerxes-1.0.0.0 rely on this change, I'd appreciate quick feedback; otherwise I will take the risk and merge this in myself. No pressure.
A new feature in validate in version 1.4.0.0 checks whether Genotype_Ploidy from the Janno file is consistent with genotype data.
When running this on the community-archive, I noticed that many samples did indeed have heterozygotes, even though they were marked as haploid in the Janno file. I have updated the respective packages. There could be some more cases, as I have only checked the usual first 100 SNPs, but I think I should have caught most.
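To illustrate the kind of check described here, the following is a minimal Python sketch. It is not trident's actual implementation. It assumes genotype calls are given as per-SNP lists of alternative-allele counts (0, 1 or 2, with None for missing), that a call of 1 means heterozygous, and that Genotype_Ploidy is either "haploid" or "diploid". The function and variable names are hypothetical.

```python
from itertools import islice

# Assumption: a call of 1 (one copy of the alternative allele) is heterozygous.
HET = 1


def haploid_with_heterozygotes(samples, ploidy, snp_rows, max_snps=100):
    """Return the sorted IDs of samples declared haploid that carry at least
    one heterozygous call within the first max_snps SNPs.

    samples:  sample IDs, in the column order of the genotype rows
    ploidy:   dict mapping sample ID to "haploid" or "diploid"
    snp_rows: iterable of per-SNP lists of calls (0, 1, 2 or None)
    """
    flagged = set()
    # Only the first max_snps rows are inspected, so a heterozygote that
    # first appears later goes unnoticed.
    for row in islice(snp_rows, max_snps):
        for sample, call in zip(samples, row):
            if call == HET and ploidy.get(sample) == "haploid":
                flagged.add(sample)
    return sorted(flagged)


# Made-up data: A and C are declared haploid, B diploid.
samples = ["A", "B", "C"]
ploidy = {"A": "haploid", "B": "diploid", "C": "haploid"}
snp_rows = [
    [0, 1, 2],
    [1, 1, 0],
    [2, None, 0],
]
conflicts = haploid_with_heterozygotes(samples, ploidy, snp_rows)
```

With this data only A is flagged, since its heterozygous call sits at the second SNP. Calling the function with max_snps=1 returns an empty list, which is the same limitation as checking only the first 100 SNPs: a sample can pass the check and still contain heterozygotes further along.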