Add 2021 posth sci adv #214
…th_Etruscan.janno
Create 2021_Posth_SciAdv.janno
OK, so this adds Posth et al. 2021, Science Advances. The Janno file was kindly prepared by @EleniSef, and the genotype data were provided by the authors themselves. I fixed alignment issues between the Janno and the genotype data (removed some samples from Eleni's Janno and brought the genotype data into the right order). There are some issues left to fix, in particular with the Janno. Thanks a lot!
@stschiff thank you for pointing that out.
Great, I've added @EleniSef's new Janno file, and fixed some other minor things. I think the package is ready for review.
I have no idea why the validation check fails. This package validates on my computer (macOS). The janno checksum is correct. Any hint, @nevrome?
Maybe it's this obscure issue again? poseidon-framework/poseidon-hs#302
Uff, it probably is... which means that I should rerun
Hmm... no. On our server I get the exact same checksum. I have no idea how to proceed. Can someone perhaps pull this branch and run validate on their Linux machine?
On my system the validation fails because of a wrong checksum for […] I don't know where this discrepancy is coming from. I was convinced this must be an issue of your proprietary operating system, but if you observed the same behaviour on a proper (Linux) computer, then I'm as lost as you. Could this be something about Git and autocrlf? We have set it to […] (community-archive/.gitattributes, line 2 in a8d321b).
Yes, that was it. The Janno file had CRLF line endings. Perhaps because I used @EleniSef's file as a starting point, and she used Windows? On Mac, default line endings are LF. I wonder whether we can have […] With respect to git-attributes, I wonder why we need this attribute setting. It seems to me that it breaks the consistency of our checksum system. If a package validates locally, I can no longer trust that it validates on another computer. That's bad, right? I will open another issue just for that. I fixed the Janno and the checksum, so this is good to review. Thanks.
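For anyone who wants to see why line endings matter here, the following is a minimal sketch of the effect, not the validation code itself. It assumes the checksums are MD5 hashes over the raw file bytes; the two-row table and the helper names `md5_hex` and `normalize_line_endings` are hypothetical and only serve as illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hex digest of the raw bytes (assumed checksum type)."""
    return hashlib.md5(data).hexdigest()


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF line endings to LF; other bytes stay unchanged."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical .janno content, once with LF and once with CRLF line endings.
janno_lf = b"Poseidon_ID\tGenetic_Sex\tGroup_Name\nSAMPLE001\tF\tExampleGroup\n"
janno_crlf = janno_lf.replace(b"\n", b"\r\n")

# Same visible content, different bytes, so the checksums disagree.
print(md5_hex(janno_lf) == md5_hex(janno_crlf))  # False

# After normalizing to LF the checksums agree again.
print(md5_hex(normalize_line_endings(janno_crlf)) == md5_hex(janno_lf))  # True
```

The same mismatch would appear if Git converted line endings on checkout, since the checksum is then computed over different bytes than the ones the submitter hashed.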
OK, hold on, there is another issue that our check found: We seem to have a duplicate individual with a sample in 2021 Biagini et al:
@EleniSef if you have any input here, that'd be welcome, but I will also have a look at the other paper myself. Would be good to know whether that is really the same sample, or just a coincidence, in which case we need to rename it.
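As a rough illustration of what such a duplicate check does, here is a small sketch that collects every `Poseidon_ID` across several .janno tables and reports IDs that occur in more than one package. It is not the actual check used by the archive; the package names, sample IDs, and the functions `poseidon_ids` and `find_duplicates` are made up for the example.

```python
import csv
import io
from collections import defaultdict


def poseidon_ids(janno_text: str) -> list[str]:
    """Read a tab-separated .janno table and return its Poseidon_ID column."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    return [row["Poseidon_ID"] for row in reader]


def find_duplicates(packages: dict[str, str]) -> dict[str, list[str]]:
    """Map each ID that occurs more than once to the packages containing it."""
    seen = defaultdict(list)
    for package_name, janno_text in packages.items():
        for pid in poseidon_ids(janno_text):
            seen[pid].append(package_name)
    return {pid: names for pid, names in seen.items() if len(names) > 1}


# Hypothetical packages with one shared ID.
packages = {
    "PackageA": "Poseidon_ID\tGenetic_Sex\tGroup_Name\nABC001\tF\tGroupA\nABC002\tM\tGroupA\n",
    "PackageB": "Poseidon_ID\tGenetic_Sex\tGroup_Name\nABC002\tM\tGroupB\nXYZ001\tF\tGroupB\n",
}

print(find_duplicates(packages))  # {'ABC002': ['PackageA', 'PackageB']}
```

A hit only says that two entries share an ID. Whether they are the same sample or an ID collision between labs still has to be checked against the papers.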
Regarding your comment above: I used a Google spreadsheet to prepare the .janno file, as I could easily convert it to TSV. I am not sure if that was the problem. Here you can also validate this:
OK wonderful, thanks for investigating. OK, then we need to find a new Poseidon_ID for MAS003 in Posth et al. I would then suggest renaming all
Would that be acceptable, do you think? Perhaps @AyGhal can briefly comment?
That works and we can have the original IDs in the
Hmm, good point. Calling in @TCLamnidis @wolfgangaroo and @nevrome -> Can you remember how we wanted to deal with Pandora IDs in publications in cases where there already is an ID from another lab? As you see above, I propose to rename four individuals from Posth et al. 2021 with explicit site names (so
I think the last time we extended the name (GOR/Gordion?), so I would vote for
OK, this is now good for review and merge, @AyGhal.
Okay, I see that you have changed the
Good catch. I just added the original names as
Looks good to me, thanks!
PR Checklist for a new package submission
- […] POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
- […] POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition).
- […] .janno file (for a list of available fields look here and here for more detailed documentation about them).
- […] .bib file with the necessary literature references for each sample in the .janno file.
- […] POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
- […] .janno and .bib file are all named after the package title and only differ in the file extension.
- […] POSEIDON.yml file is 1.0.0.
- […] poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
- […] POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
- […] CHANGELOG file or one with a single entry for version 1.0.0.
- […] Publication column in the .janno file is filled and the respective .bib file has complete entries for the listed mentioned keys.
- […] .janno file does not include any empty columns or columns only filled with n/a.
- […] .janno file adheres to the standard order as defined in the Poseidon schema here.
- […] .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).
- […] trident validate --fullGeno.
- […] git lfs migrate import --no-rewrite path/to/file.bed (see here).
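Two of the checklist points above (title structure, and no empty or n/a-only .janno columns) can be pre-checked with a few lines of Python before running the full validation. This is only a sketch under assumptions: the regular expression in `TITLE_PATTERN` is one possible reading of the suggested title structure, and the helper names and the example table are hypothetical. It does not replace `trident validate`.

```python
import csv
import io
import re

# Assumed reading of <Year>_<Last name of first author>_<Feature>:
# four digits, a name made of letters, then an alphanumeric feature string.
TITLE_PATTERN = re.compile(r"^\d{4}_[A-Za-z]+_[A-Za-z0-9]+$")


def title_looks_valid(title: str) -> bool:
    """Check a package title against the assumed title pattern."""
    return bool(TITLE_PATTERN.match(title))


def uninformative_columns(janno_text: str) -> list[str]:
    """Return .janno columns that are empty or 'n/a' in every row."""
    rows = list(csv.DictReader(io.StringIO(janno_text), delimiter="\t"))
    if not rows:
        return []
    return [
        column
        for column in rows[0]
        if all((row[column] or "").strip() in ("", "n/a") for row in rows)
    ]


print(title_looks_valid("2021_Zegarac_SoutheasternEurope"))  # True
print(title_looks_valid("Zegarac2021"))  # False

# Hypothetical table: 'Note' holds only n/a, 'Country' is empty throughout.
example_janno = (
    "Poseidon_ID\tGenetic_Sex\tGroup_Name\tNote\tCountry\n"
    "S1\tF\tG\tn/a\t\n"
    "S2\tM\tG\tn/a\t\n"
)
print(uninformative_columns(example_janno))  # ['Note', 'Country']
```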