Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new package 2022_GnecchiRuscone_CarpathianBasin prepared by Guido. #140

Merged
merged 8 commits into from
Nov 10, 2023

Conversation

AyGhal
Copy link
Contributor

@AyGhal AyGhal commented Oct 19, 2023

Guido kindly prepared the Poseidon package for his recent publication.

@AyGhal
Copy link
Contributor Author

AyGhal commented Oct 19, 2023

I used trident validate to check the package, but maybe someone can also take a quick look. Thanks!

@stschiff
Copy link
Member

stschiff commented Oct 20, 2023

Mentioning @gagr88

Copy link
Member

@stschiff stschiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic, thank you for the submission @AyGhal and @gagr88.

Just a few requests:

  • we would prefer genotype files to be named according to the package name. I suggest to rename them.
  • we require PLINK-formatted genotype data. Please use trident genoconvert to convert to Plink.
  • The package description in the YML file is missing.

@nevrome
Copy link
Member

nevrome commented Oct 24, 2023

@stschiff, @TCLamnidis, @93Boy
As requested in #141 I have prepared a checklist for new package submissions. @TCLamnidis' work for the minotaur recipe repo was a good inspiration 👍. Every new pull request will now include this template.

@AyGhal had a look to confirm that it generally fits the requirements, but I want to give it a real world test. So that's why I will now go through @gagr88's package here and check the boxes as if I were him.

PR Checklist for a new package submission

General package properties

  • The package does not exist already in the community archive, also not with a different name.
  • The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
  • The package is stored in a directory that is named like the package title.

  • The package is complete and features the following elements:
    • Genotype data in binary PLINK format (not EIGENSTRAT format).
    • A POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition)
    • A reasonably filled .janno file (for a list of available fields look here and here for more detailed documentation about them).
    • A .bib file with the necessary literature references for each sample in the .janno file.
  • Every file in the submission is correctly referenced in the POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
  • Genotype data, .janno and .bib file are all named after the package title and only differ in the file extension.
  • All text files are UTF-8 encoded and have Unix/Unix-like line endings (LF, not CR+LF or CR).
  • The package version in the POSEIDON.yml file is 1.0.0.
  • The poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
  • The POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
  • There is either no CHANGELOG file or one with a single entry for version 1.0.0.

  • The Publication column in the .janno file is filled and the respective .bib file has complete entries for the listed mentioned keys.
  • The .janno file does not include any empty columns or columns only filled with n/a.
  • The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.

  • The package passes a validation with trident validate --fullGeno.

  • Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For an instruction on how to set up Git LFS please look here.

@TCLamnidis
Copy link
Member

In the future, we can move some of these validation steps from checkboxes to actual GA checks, to enforce them.
It would be particularly straight-forward to check that POSEIDON.yml attributes do not match the default values via a GA, e.g. title, description, contributor, packageVersion, lastModified.

Copy link
Member

@TCLamnidis TCLamnidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the janno, there are multiple sites that include hyphens. Sometimes these have spaces before, after, in both, or neither. I recognise this is wildly pedantc, but it would be good to have a consistent system perhaps? at least within package

@nevrome
Copy link
Member

nevrome commented Oct 24, 2023

@TCLamnidis Right - more automatic validation is always possible.

The biggest problem I saw is that the large genotype data files are not properly committed with Git LFS. I understand that this is a pain, because you have to set it up properly to get it right and Git does not complain if you do it wrong. Not even here on GitHub it's pointed out easily. I only realized it when I checkout Ayshin's branch and Git LFS raised an issue. I'll try some things later to see if we at least can get an error here on GitHub.

I'll also check how you can fix this now, @AyGhal.


Edit: This is resolved now. We added a GitHub action check and documented a way to fix the issue if it occurs.

@stschiff
Copy link
Member

I can fix the outstanding checkbox point.

@nevrome
Copy link
Member

nevrome commented Nov 7, 2023

I think you can not easily fix this @stschiff, because @AyGhal proposed this change from her own fork. I suggest we just merge now without further delay. A couple of empty columns should not delay us further 👍

@stschiff
Copy link
Member

Actually I can, as Ayshin has allowed her branch for commits. Let me quickly have a look.

@stschiff
Copy link
Member

OK, I made the changes and removed the empty columns from the Janno file, but sadly I cannot push to your repo, Ayshin, because I lack that rights. I actually don't know how this works with HTTPS... I guess I need a token. So I'm just gonna send you the modified Janno via mattermost, so you can simply replace the old one, and then also update the checksum. Sorry for the hazzle, indeed would've been easier to just merge this. In fact, you still can and then I'm pushing this to master.

@nevrome nevrome merged commit 885574c into poseidon-framework:master Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants