Add package 2022_Lazaridis_SouthernArk #209
base: master
github.com:poseidon-framework/community-archive into 2022_Lazaridis_SouthernArk Updating branch
Just a quick comment: The ind file has >5000 samples, the janno >700. They must have the same numbers!
Have you checked this, @93Boy?
I have gone through the paper and the supplementary materials. The publication also has 778 entries, so I compared the genotype data with the supplementary data. I have attached the analysis; could you kindly check it? I also checked around 30 random IDs for a match in our Poseidon database, and I didn't get a positive match for any of them either.
I think you've taken in genotype data that contains far more than just the newly published individuals from that study! You will have to either extract the correct individuals from the AADR or, if you want to use the package provided on David's website, extract the correct individuals from there.
I downloaded the genotype data directly from the Reich lab website, so I will filter out the rest.
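The count mismatch discussed above (the `.ind` file listing far more samples than the `.janno` file) can be inspected with a short script. The following is only a minimal sketch, assuming a whitespace-separated EIGENSTRAT `.ind` file with the sample ID in the first column and a tab-separated `.janno` file with a `Poseidon_ID` column. The sample IDs and helper names are made up for illustration and are not part of any Poseidon tool.

```python
import csv
import io


def read_ind_ids(handle):
    """Return the sample IDs (first whitespace-separated column) of an .ind file."""
    return [line.split()[0] for line in handle if line.strip()]


def read_janno_ids(handle, id_column="Poseidon_ID"):
    """Return the IDs from the given column of a tab-separated .janno file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[id_column] for row in reader]


def compare_ids(ind_ids, janno_ids):
    """Report which IDs are shared and which appear in only one of the two files."""
    ind_set, janno_set = set(ind_ids), set(janno_ids)
    return {
        "only_in_ind": sorted(ind_set - janno_set),
        "only_in_janno": sorted(janno_set - ind_set),
        "shared": sorted(ind_set & janno_set),
    }


# Toy data standing in for the real files (hypothetical IDs).
ind_text = "I0001 M Pop_A\nI0002 F Pop_A\nI0003 M Pop_B\n"
janno_text = (
    "Poseidon_ID\tGenetic_Sex\tGroup_Name\n"
    "I0001\tM\tPop_A\n"
    "I0003\tM\tPop_B\n"
)

result = compare_ids(
    read_ind_ids(io.StringIO(ind_text)),
    read_janno_ids(io.StringIO(janno_text)),
)
for key, ids in result.items():
    print(key, len(ids), ids)
```

With real files, the `io.StringIO` objects would be replaced by `open(...)` handles. The `only_in_ind` list then holds the individuals that would need to be filtered out of the genotype data.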
Has there been an update on this one? I think we discussed that merging isn't necessary, right? You can just extract the data from the AADR.
Hello Stephan, sorry for the late response. I have encountered another mismatch when fetching the data from the Poseidon AADR: it gave me 1566 entries. It seems that almost all entries are duplicated with respect to the source data. I have attached a small analysis of the duplicate values and their number of occurrences. Can you tell me what I should do with these duplicates? There is another mismatch between the publication and AADR V54.1, which contains 736 unique IDs. When I remove the duplicates from the Poseidon AADR file, 786 unique values remain.
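A duplicate analysis like the one described in this comment could be sketched as follows. This is an illustrative example only: the ID lists are invented, and the attached analysis may have been done differently. It counts how often each ID occurs, removes repeats while keeping the first occurrence, and compares the remaining unique IDs against a reference list, such as the IDs from the publication's supplement.

```python
from collections import Counter


def summarize_duplicates(ids):
    """Count total entries, unique IDs, and how often each duplicated ID occurs."""
    counts = Counter(ids)
    duplicated = {sample_id: n for sample_id, n in counts.items() if n > 1}
    return {"total": len(ids), "unique": len(counts), "duplicated": duplicated}


def drop_duplicates_keep_first(ids):
    """Remove repeated IDs while preserving the order of first occurrence."""
    return list(dict.fromkeys(ids))


# Hypothetical IDs fetched from a larger archive, with repeats.
fetched_ids = ["I0001", "I0001", "I0002", "I0003", "I0003", "I0003", "I0004"]
# Hypothetical IDs listed in the publication's supplementary table.
publication_ids = ["I0001", "I0002", "I0003"]

summary = summarize_duplicates(fetched_ids)
unique_ids = drop_duplicates_keep_first(fetched_ids)
not_in_publication = sorted(set(unique_ids) - set(publication_ids))

print(summary)
print("unique IDs:", unique_ids)
print("unique IDs missing from the publication list:", not_in_publication)
```

Listing the IDs that remain after deduplication but are absent from the publication list is one way to narrow down where a difference such as 786 versus 736 comes from.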
OK, I see. I will have to take a look at this, which will not happen this week.
@AyGhal will take a look at this one.
@AyGhal any update?
PR Checklist for a new package submission
- The package title in the `POSEIDON.yml` conforms to the general title structure suggested here: `<Year>_<Last name of first author>_<Region, time period or special feature of the paper>`, e.g. `2021_Zegarac_SoutheasternEurope`, `2021_SeguinOrlando_BellBeaker` or `2021_Kivisild_MedievalEstonia`.
- A `POSEIDON.yml` file with not just the file-referencing fields, but also the following meta-information fields present and filled: `poseidonVersion`, `title`, `description`, `contributor`, `packageVersion`, `lastModified` (see here for their definition).
- A `.janno` file (for a list of available fields look here and here for more detailed documentation about them).
- A `.bib` file with the necessary literature references for each sample in the `.janno` file.
- Every file in the submission is referenced in the `POSEIDON.yml` file and there are no additional, supplementary files in the submission that are not documented there.
- The genotype data, `.janno` and `.bib` file are all named after the package title and only differ in the file extension.
- The package version in the `POSEIDON.yml` file is `1.0.0`.
- The `poseidonVersion` of the package in the `POSEIDON.yml` file is set to the latest version of the Poseidon schema.
- The `POSEIDON.yml` file contains the corresponding checksums for the fields `genoFile`, `snpFile`, `indFile`, `jannoFile` and `bibFile`.
- There is either no `CHANGELOG` file or one with a single entry for version `1.0.0`.
- The `Publication` column in the `.janno` file is filled and the respective `.bib` file has complete entries for the listed keys.
- The `.janno` file does not include any empty columns or columns only filled with `n/a`.
- The column order of the `.janno` file adheres to the standard order as defined in the Poseidon schema here.
- The `.janno` and the `.ssf` files are not fully quoted, so they only use single or double quotes (`"..."`, `'...'`) to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).
- The package passes validation with `trident validate --fullGeno`.
- Large genotype data files are tracked with Git LFS, e.g. via `git lfs migrate import --no-rewrite path/to/file.bed` (see here).
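Some of the checklist items can be pre-checked with a small script before running `trident validate --fullGeno`. The sketch below is not a replacement for that validation. It makes assumptions of its own: the regular expression is only an approximation of the title convention, the checksums are assumed to be MD5 digests of the referenced files, and all function names and sample rows are hypothetical.

```python
import csv
import hashlib
import io
import os
import re
import tempfile

# Approximation of <Year>_<Last name of first author>_<Feature>; an assumption,
# not an official Poseidon rule.
TITLE_PATTERN = re.compile(r"^\d{4}_[A-Za-z]+_[A-Za-z0-9]+$")


def title_looks_valid(title):
    """Check a package title against the approximate naming pattern."""
    return TITLE_PATTERN.fullmatch(title) is not None


def uninformative_columns(janno_handle):
    """List .janno columns that are empty or filled only with 'n/a'."""
    reader = csv.DictReader(janno_handle, delimiter="\t")
    rows = list(reader)
    return [
        column
        for column in reader.fieldnames
        if all((row[column] or "").strip() in ("", "n/a") for row in rows)
    ]


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Toy .janno content with two uninformative columns (hypothetical rows).
janno_text = (
    "Poseidon_ID\tGenetic_Sex\tGroup_Name\tCountry\tNote\n"
    "I0001\tM\tPop_A\tn/a\t\n"
    "I0002\tF\tPop_A\tn/a\t\n"
)
print(title_looks_valid("2022_Lazaridis_SouthernArk"))
print(uninformative_columns(io.StringIO(janno_text)))

# Checksum of a temporary stand-in file; a real check would compare this value
# with the checksum recorded in POSEIDON.yml.
payload = b"toy genotype bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
file_digest = md5_of_file(tmp.name)
os.remove(tmp.name)
print(file_digest)
```

The column check treats a column as uninformative only if every row is empty or `n/a`, which mirrors the wording of the corresponding checklist item.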