Add package 2022_Lazaridis_SouthernArk #209

93Boy · 2024-08-23T07:51:21Z

PR Checklist for a new package submission

The package does not exist already in the community archive, also not with a different name.
The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
The package is stored in a directory that is named like the package title.

The Publication column in the .janno file is filled and the respective .bib file has complete entries for all listed keys.
The .janno file does not include any empty columns or columns only filled with n/a.
The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
The .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).

The package passes a validation with trident validate --fullGeno.

Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For instructions on how to set up Git LFS, please look here. If you accidentally pushed the files the wrong way, you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).
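
The checklist item about empty or n/a-only .janno columns can be checked by hand before running the validation. The following is a minimal sketch, assuming the .janno content is tab-separated with a header row; the function name `uninformative_columns` and the example rows are hypothetical, and the check does not replace trident validate.

```python
import csv
import io


def uninformative_columns(janno_text):
    """Return the columns whose values are all empty or 'n/a'."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    rows = list(reader)
    columns = reader.fieldnames or []
    return [
        col
        for col in columns
        # Short rows yield None for missing fields, hence the `or ""`.
        if all((row[col] or "").strip() in ("", "n/a") for row in rows)
    ]


# Hypothetical example content, not taken from the actual package.
janno_example = (
    "Poseidon_ID\tGenetic_Sex\tGroup_Name\tCountry\tNote\n"
    "I0001\tM\tGroupA\tn/a\t\n"
    "I0002\tF\tGroupA\tn/a\t\n"
)

print(uninformative_columns(janno_example))
```

Note that a file with a header but no data rows reports every column, since there is no value that could make a column informative.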


stschiff · 2024-08-23T13:41:18Z

Just a quick comment: the ind file has >5000 samples, the janno >700. Both files must contain the same number of individuals!
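
A quick way to reproduce this comparison is to count the rows in both files. The sketch below assumes an EIGENSTRAT-style .ind file with one individual per non-empty line and a tab-separated .janno file with one header row; the function names and the example content are hypothetical.

```python
import csv
import io


def count_ind_rows(ind_text):
    """Count individuals in .ind content (one non-empty line each)."""
    return sum(1 for line in ind_text.splitlines() if line.strip())


def count_janno_rows(janno_text):
    """Count data rows in tab-separated .janno content, excluding the header."""
    rows = [r for r in csv.reader(io.StringIO(janno_text), delimiter="\t") if r]
    return max(len(rows) - 1, 0)


# Hypothetical example content, not taken from the actual package.
ind_example = "I0001 M GroupA\nI0002 F GroupA\nI0003 U GroupB\n"
janno_example = (
    "Poseidon_ID\tGenetic_Sex\tGroup_Name\n"
    "I0001\tM\tGroupA\n"
    "I0002\tF\tGroupA\n"
)

n_ind = count_ind_rows(ind_example)
n_janno = count_janno_rows(janno_example)
print(f"ind: {n_ind}, janno: {n_janno}, match: {n_ind == n_janno}")
```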

stschiff · 2024-09-02T08:24:10Z

Have you checked this, @93Boy?

93Boy · 2024-09-03T19:28:17Z

I have gone through the paper and the supplementary materials. The publication also has 778 entries, so I compared the genotype data with the supplementary data. I have attached the analysis; could you kindly check it? I also checked around 30 random IDs to see whether they have a match in our Poseidon database, and I did not get a positive match for any of them.
SouthernArc_mismatches.csv

stschiff · 2024-09-04T05:38:13Z

I think you've taken in genotype data that contains far more than just the newly published individuals from that study! You will have to either extract the correct individuals from the AADR, or, if you want to use the package provided on David's website, extract the correct individuals from there.
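
One possible way to prepare such an extraction is to turn the list of published sample IDs into a forge selection, one entity per line. This is only a sketch: the `<ID>` notation for individuals and the use of the resulting file with trident forge via --forgeFile are assumptions that should be checked against the trident documentation, and the IDs shown are hypothetical.

```python
def forge_entities(sample_ids):
    """Return forge-file text with one <ID> individual entity per line."""
    unique_ids = sorted(set(sample_ids))  # drop repeated IDs, stable order
    return "".join(f"<{sample_id}>\n" for sample_id in unique_ids)


# Hypothetical IDs; in practice these would come from the paper's
# supplementary table or from the Poseidon_ID column of the .janno file.
ids = ["I0002", "I0001", "I0002"]
forge_text = forge_entities(ids)
print(forge_text, end="")
```

The text could then be written to a file and handed to the extraction step, so the selection stays reviewable alongside the package.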

93Boy · 2024-09-04T12:24:36Z

I downloaded the genotype data directly from the Reich lab website, so I will filter out the rest.

93Boy · 2024-09-12T21:18:36Z

I tried to use trident forge on the genotype data available on the Reich lab website, but it failed because multiple IDs were not present in the genotype data. Then I tried to forge them from AADR v54, but it threw the same error. May I manually remove those entries from my .janno file? These are the IDs that are not available in the genotype data
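
The IDs that cause this kind of failure can be listed up front by comparing the .janno IDs against the .ind file. A minimal sketch, assuming a tab-separated .janno with a Poseidon_ID column and a whitespace-separated .ind file whose first field is the sample ID; the function names and example content are hypothetical.

```python
def read_ind_ids(ind_text):
    """Collect the first field of every non-empty .ind line."""
    return {line.split()[0] for line in ind_text.splitlines() if line.strip()}


def read_janno_ids(janno_text, id_column="Poseidon_ID"):
    """Read the ID column from tab-separated .janno content, in file order."""
    lines = [line for line in janno_text.splitlines() if line.strip()]
    if not lines:
        return []
    idx = lines[0].split("\t").index(id_column)
    return [line.split("\t")[idx] for line in lines[1:]]


def missing_from_genotypes(janno_text, ind_text):
    """Return .janno IDs that have no entry in the .ind file."""
    genotype_ids = read_ind_ids(ind_text)
    return [i for i in read_janno_ids(janno_text) if i not in genotype_ids]


# Hypothetical example content, not taken from the actual package.
janno_example = "Poseidon_ID\tGenetic_Sex\nI0001\tM\nI0002\tF\nI0009\tU\n"
ind_example = "I0001 M GroupA\nI0002 F GroupA\nI0003 U GroupB\n"

print(missing_from_genotypes(janno_example, ind_example))
```

The comparison is an exact string match, so it reports which IDs are absent but not why.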

stschiff · 2024-09-23T14:52:00Z

Has there been an update on this one? I think we discussed that merging isn't necessary, right? You can just extract the data from the AADR.

93Boy · 2024-09-23T21:02:41Z

Hello Stephan, sorry for the late response. I have encountered another mismatch when fetching the data from the Poseidon AADR: it gave me 1566 entries. It seems that almost all the entries are duplicated compared with the source data. I have attached a small analysis of the duplicate values and their number of occurrences. Can you tell me what I should do with these duplicates? There is another mismatch between the publication and AADR v54.1, which contain 736 unique IDs: when I remove the duplicates from the Poseidon AADR file, I am left with 786 unique values.
duplicate_values.txt
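
A duplicate analysis of this kind can be sketched as follows, assuming the sample IDs have already been read into a list. The function name and the example IDs are hypothetical and do not reproduce the attached file.

```python
from collections import Counter


def duplicate_counts(ids):
    """Map each ID that occurs more than once to its number of occurrences."""
    counts = Counter(ids)
    return {sample_id: n for sample_id, n in counts.items() if n > 1}


# Hypothetical IDs, not taken from the actual data.
ids = ["I0001", "I0001", "I0002", "I0003", "I0003", "I0003"]

print(duplicate_counts(ids))
print(f"total entries: {len(ids)}, unique IDs: {len(set(ids))}")
```

Reporting the total and the unique count side by side makes it easier to see whether removing duplicates alone explains a difference in entry numbers.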

stschiff · 2024-09-24T09:31:44Z

OK, I see. I will have to take a look at this, but that will not happen this week.

stschiff · 2024-10-08T07:17:01Z

@AyGhal will take a look at this one.

stschiff · 2024-12-03T09:33:55Z

@AyGhal any update?

93Boy added 5 commits August 22, 2024 13:37

genotype data uploaded · f0b5da6
initial janno · b3c31f2
genotype data uploaded · ab0c5a9
initial janno · 148f02a
Merge branch '2022_Lazaridis_SouthernArk' of github.com:poseidon-framework/community-archive into 2022_Lazaridis_SouthernArk Updating branch · df41f61

nevrome changed the title ~~2022 lazaridis southern ark~~ Add package 2022_Lazaridis_SouthernArk Sep 6, 2024
