Ploidy fixes #135

stschiff · 2023-09-14T11:08:54Z

A new feature in validate in version 1.4.0.0 checks whether Genotype_Ploidy from the Janno file is consistent with genotype data.

When running this on the community-archive, I noticed that many samples did indeed have heterozygous genotypes even though they were marked as haploid in the Janno file. I have updated the respective packages. There could be some more cases, as I have only checked the usual first 100 SNPs, but I think I should have caught most.
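For readers who want to see the idea behind the check, here is a minimal sketch. It is not trident's implementation: the function name, the dictionary inputs, and the 0/1/2 allele-count encoding are assumptions made up for illustration. It flags samples that are marked haploid but carry at least one heterozygous call within the first `max_snps` SNPs.

```python
def find_ploidy_conflicts(ploidy_by_sample, genotypes_by_sample, max_snps=100):
    """Return IDs of samples marked haploid that carry heterozygous calls.

    ploidy_by_sample: sample ID -> "haploid", "diploid" or None (hypothetical input)
    genotypes_by_sample: sample ID -> list of alternate-allele counts per SNP,
        where 0 and 2 are homozygous, 1 is heterozygous, None is missing
    max_snps: number of leading SNPs to inspect; None inspects all of them
    """
    conflicts = []
    for sample, ploidy in ploidy_by_sample.items():
        if ploidy != "haploid":
            continue  # only haploid samples can contradict a heterozygous call
        calls = genotypes_by_sample.get(sample, [])
        if max_snps is not None:
            calls = calls[:max_snps]
        if any(call == 1 for call in calls):
            conflicts.append(sample)
    return conflicts


# Toy data, invented for illustration only
ploidy = {"S1": "haploid", "S2": "haploid", "S3": "diploid"}
genotypes = {
    "S1": [0, 2, None, 2],
    "S2": [0, 1, 2, 0],
    "S3": [1, 1, 0, 2],
}
print(find_ploidy_conflicts(ploidy, genotypes))  # prints ['S2']
```

Limiting the scan to the leading SNPs keeps the check fast, but a sample whose first heterozygous call appears later would go unnoticed, which is why a pass over the full genotype data is a useful follow-up.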

Removed ploidy information from sample HG01804.SG in 2015_1000Genomes_1240K_haploid_pulldown

nevrome · 2023-09-20T13:25:47Z

Brilliant! I love it when the validation helps to make the data better.

I also ran validate with --fullGeno on the entire dataset and did not find any more packages with this issue. You seem to have caught all of them by looking at the first 100 SNPs.

Why did you also remove/set to n/a the Capture_Type column for some of the affected .janno files?

stschiff · 2023-09-20T16:34:59Z

Thanks, good to know that --fullGeno also didn't report more.

Well, the thing with the Capture_Type just came up along the way: for modern samples we don't use capture, so that should be set to n/a. I don't want to make this a general rule, though, as, in principle, one could sequence a modern-day genome with capture. It's just that it typically isn't done.

nevrome · 2023-09-20T16:36:45Z

Good catch! Did you search for that systematically as well? Or did you just fix it when it came up together with the ploidy issue?

stschiff · 2023-09-20T16:39:21Z

Hmm, I think I did not, no. Not sure exactly now.

nevrome · 2023-09-20T17:03:56Z

I think this would be the relevant query, right?

qjanno "SELECT Poseidon_ID,Capture_Type FROM d(.) WHERE Date_Type = 'modern' AND Capture_Type IS NOT NULL"

Which reminds me that qjanno should add a column with the path to the .janno file a given sample is coming from. Or even the name of the package. The R package already has a feature like this. This would make it much easier to determine which packages are affected here. I think I'll squeeze this into poseidon-framework/qjanno#4.
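Until qjanno reports the source file itself, the same filter can be approximated with a few lines of standard-library Python. This is only a sketch under assumptions: it treats .janno files as tab-separated tables, treats an empty cell or `n/a` as a missing value, and the function name `modern_with_capture` and the toy package are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

MISSING = {"", "n/a"}  # assumed encodings of an empty .janno cell


def modern_with_capture(base_dir):
    """List (janno path, Poseidon_ID, Capture_Type) for modern samples with a Capture_Type."""
    base = Path(base_dir)
    hits = []
    for janno in sorted(base.rglob("*.janno")):
        with open(janno, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                capture = (row.get("Capture_Type") or "").strip()
                if row.get("Date_Type") == "modern" and capture not in MISSING:
                    # Report the path relative to base_dir so the package is visible
                    source = janno.relative_to(base).as_posix()
                    hits.append((source, row["Poseidon_ID"], capture))
    return hits


# Toy package, invented for illustration only
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "toy_package"
    pkg.mkdir()
    (pkg / "toy_package.janno").write_text(
        "Poseidon_ID\tDate_Type\tCapture_Type\n"
        "A1\tmodern\tOtherCapture\n"
        "A2\tmodern\tn/a\n"
        "A3\tC14\t1240K\n",
        encoding="utf-8",
    )
    hits = modern_with_capture(tmp)

print(hits)  # prints [('toy_package/toy_package.janno', 'A1', 'OtherCapture')]
```

Like the query above, this reports every modern sample with any Capture_Type value, so the hits are candidates for manual review rather than confirmed errors.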

stschiff · 2023-09-25T13:47:26Z

Actually, Shotgun is OK for modern data, e.g. in the case of the 1000 Genomes data. I will check again which packages need updating in that respect.

stschiff · 2023-09-25T14:11:14Z

OK, I think I've got them all.

stschiff · 2023-09-25T14:18:09Z

So, to summarise, I've changed the Genotype_Ploidy where necessary. There is one issue with one of the 1000 Genomes samples, which has heterozygotes in the genotype data even though it should be a pseudo-haploid pulldown. We need to check whether that's an error that came from the AADR.

I also fixed a lot of modern samples that had Capture_Type set to OtherCapture, which in most cases should be n/a (because there is no capture).

@AyGhal do you want to approve this quickly? Because our releases of trident-1.4.0.0 and, in particular, xerxes-1.0.0.0 rely on this change, I'd appreciate quick feedback; otherwise I will take the risk and merge this in myself. No pressure.

stschiff and others added 8 commits September 14, 2023 10:57

fixed ploidy in Patterson2012

fb2c2f7

updated ploidy in Lazaridis 2014

0c81458

bugfix janno

9e3dbeb

fixed ploidy in Lazaridis 2016

fc584ba

updated Ploidy in 2019_Flegontov

23d3292

updated more modern samples as diploid.

aa744db

Removed ploidy information from sample HG01804.SG in 2015_1000Genomes_1240K_haploid_pulldown

updated checksums

5dbc997

fixed ploidy for HG01804.SG

3d45987

nevrome mentioned this pull request Sep 25, 2023

Better .janno discovery options poseidon-framework/qjanno#4

Merged

stschiff added 2 commits September 25, 2023 15:45

reverted back 1000G yaml file. Nothing changed

02f10ae

removed 1000G changelog

207859f

stschiff added 5 commits September 25, 2023 15:51

updated HG01804.SG to diploid due to hets. See note in janno

6616a54

fixed capture typ in 2014_LazaridisNature

6c96603

removed Capture_Type from 2012_PattersonGenetics

59eca66

fixed Capture_Type in 2016_LazaridisNature

364065e

fixed CaptureType in 2019_BiaginiSpain

2f2157d

stschiff merged commit 6c49c1c into master Sep 26, 2023

stschiff deleted the ploidy_fixes branch September 26, 2023 08:30

nevrome mentioned this pull request Sep 26, 2023

Breaking changes and maintaining reproducibility poseidon-framework/poseidon-hs#276

Open
