Data from Peltola et al. 2023: Genetic admixture and language shift in the medieval Volga-Oka interfluve #151

Merged
merged 7 commits into from
Feb 8, 2024


smpeltola
Contributor

@smpeltola smpeltola commented Dec 21, 2023

PR Checklist for a new package submission

  • The package does not exist already in the community archive, also not with a different name.
  • The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
  • The package is stored in a directory that is named like the package title.

  • The package is complete and features the following elements:
    • Genotype data in binary PLINK format (not EIGENSTRAT format).
    • A POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition)
    • A reasonably filled .janno file (for a list of available fields look here and here for more detailed documentation about them).
    • A .bib file with the necessary literature references for each sample in the .janno file.
  • Every file in the submission is correctly referenced in the POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
  • Genotype data, .janno and .bib file are all named after the package title and only differ in the file extension.
  • The package version in the POSEIDON.yml file is 1.0.0.
  • The poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
  • The POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
  • There is either no CHANGELOG file or one with a single entry for version 1.0.0.

  • The Publication column in the .janno file is filled and the respective .bib file has complete entries for the listed keys.
  • The .janno file does not include any empty columns or columns only filled with n/a.
  • The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.

  • The package passes a validation with trident validate --fullGeno.

  • Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For instructions on how to set up Git LFS please look here. If you accidentally pushed the files the wrong way, you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).
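
The naming items in the checklist above can be sketched as a small pre-submission check. This is only an illustration, not part of trident: the regular expression is an assumed reading of the `<Year>_<Last name of first author>_<Region, time period or special feature of the paper>` structure, and the helper names `title_is_well_formed` and `misnamed_files` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern for <Year>_<Last name>_<Region or feature>; the checklist
# describes the structure in words and gives no formal regex.
TITLE_PATTERN = re.compile(r"\d{4}_[A-Za-z]+_[A-Za-z0-9]+")


def title_is_well_formed(title: str) -> bool:
    """Return True if the title matches the assumed naming pattern."""
    return TITLE_PATTERN.fullmatch(title) is not None


def misnamed_files(title: str, paths: list[str]) -> list[str]:
    """Return paths whose directory or file stem differs from the title.

    Only pass the genotype, .janno and .bib files; POSEIDON.yml and an
    optional CHANGELOG are not named after the package title.
    """
    bad = []
    for p in paths:
        path = PurePosixPath(p)
        if path.parent.name != title or path.stem != title:
            bad.append(p)
    return bad


title = "2023_Peltola_VolgaOka"
files = [f"{title}/{title}.{ext}" for ext in ("bed", "bim", "fam", "janno", "bib")]
print(title_is_well_formed(title), misnamed_files(title, files))
```

A check like this only covers naming; `trident validate --fullGeno` remains the actual validation step.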

@nevrome
Member

nevrome commented Jan 2, 2024

Thank you for your submission, @smpeltola!

Could you please go through the checklist above and confirm that all requirements for the submission are met? You can click the checkboxes on the left to confirm an item.

The automatic validation shows that your package includes duplicates of samples already published elsewhere: https://github.com/poseidon-framework/community-archive/actions/runs/7287175995/job/20086238835?pr=151

[Error]   [2024-01-02 09:12:30] There are duplicated individuals in this package collection. Set --ignoreDuplicates to ignore this issue.
[Error]   [2024-01-02 09:12:30] Duplicate individual "BOL001"
[Error]   [2024-01-02 09:12:30]   IndividualInfo {indInfoName = "BOL001", indInfoGroups = ["Fatyanovo"], indInfoPac = *2021_Saag_EastEuropean-3.2.0*}
[Error]   [2024-01-02 09:12:30]   IndividualInfo {indInfoName = "BOL001", indInfoGroups = ["VolgaOka_IA"], indInfoPac = *2023_Peltola_VolgaOka-1.0.0*}
[Error]   [2024-01-02 09:12:30] Duplicate individual "BOL002"
[Error]   [2024-01-02 09:12:30]   IndividualInfo {indInfoName = "BOL002", indInfoGroups = ["Fatyanovo"], indInfoPac = *2021_Saag_EastEuropean-3.2.0*}
[Error]   [2024-01-02 09:12:30]   IndividualInfo {indInfoName = "BOL002", indInfoGroups = ["VolgaOka_IA"], indInfoPac = *2023_Peltola_VolgaOka-1.0.0*}
[Error]   [2024-01-02 09:12:30] Duplicate individual "BOL003"
[Error]   [2024-01-02 09:12:30]   IndividualInfo {indInfoName = "BOL003", indInfoGroups = ["Fatyanovo"], indInfoPac = *2021_Saag_EastEuropean-3.2.0*}
[Error]   [2024-01-02 09:12:30]   IndividualInfo {indInfoName = "BOL003", indInfoGroups = ["VolgaOka_IA"], indInfoPac = *2023_Peltola_VolgaOka-1.0.0*}
[Error]   [2024-01-02 09:12:30] The package collection is broken: Detected duplicate individuals.

In our public archives we enforce unique Poseidon_IDs, so duplicates in new packages must be renamed. Are these samples from different individuals than the ones in 2021_Saag_EastEuropean? Or are they the same individuals, just reprocessed?
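
The uniqueness rule behind the log above can be sketched in a few lines. This is a minimal illustration and not how trident implements the check: the function name `find_duplicate_ids` is hypothetical, the input is a plain dictionary rather than parsed .janno files, and the ID `HYPO001` is made up for the example.

```python
from collections import defaultdict


def find_duplicate_ids(packages: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each Poseidon_ID that occurs more than once to the packages holding it."""
    seen = defaultdict(list)
    for package, ids in packages.items():
        for poseidon_id in ids:
            seen[poseidon_id].append(package)
    return {pid: pkgs for pid, pkgs in seen.items() if len(pkgs) > 1}


# The BOL IDs come from the validation log; HYPO001 is a made-up placeholder.
packages = {
    "2021_Saag_EastEuropean": ["BOL001", "BOL002", "BOL003"],
    "2023_Peltola_VolgaOka": ["BOL001", "BOL002", "BOL003", "HYPO001"],
}

for poseidon_id, owners in find_duplicate_ids(packages).items():
    print(poseidon_id, owners)
```

Because the check works on IDs alone, two unrelated individuals that share a label are reported as duplicates, which is why renaming is required even when the samples differ.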

@smpeltola
Contributor Author

Hi @nevrome,

These are not the same individuals as in Saag 2021; they just happen to have the same ID. I'll come up with unique IDs and try again.

Unique names for BOL001-003
Unique names for BOL001-003
@stschiff
Member

stschiff commented Jan 9, 2024

Thanks for the submission @smpeltola, looking forward to getting those in. The ID duplication is annoying, but since we never reached a resolution with the Tartu lab, it's first come, first served for any specific ID.

@stschiff
Member

Hmm, @nevrome, now the validation failed but I don't think I understand the error. Can you have a look?

@nevrome
Member

nevrome commented Jan 17, 2024

The error reads:

2024-01-17T12:56:58.1302602Z pointer: unexpectedGitObject: "2023_Peltola_VolgaOka/2023_Peltola_VolgaOka.bim" (treeish cc62b0063f6ee9b6828160f14344411fcd11cc8b) should have been a pointer but was not
2024-01-17T12:56:58.1308279Z pointer: unexpectedGitObject: "2023_Peltola_VolgaOka/2023_Peltola_VolgaOka.bed" (treeish cc62b0063f6ee9b6828160f14344411fcd11cc8b) should have been a pointer but was not

I understand this message as follows: The two large data files (.bim and .bed) were not correctly committed as Git LFS files, but as normal, Git-tracked files. Unfortunately this is not sufficient, as explained here.

To fix this, the files have to be migrated to Git LFS. Please look here for instructions on how to install Git LFS. Then activate it in the repository with git lfs install. When this is done, the two files can be transferred with git lfs migrate import --no-rewrite path/to/file.bed.
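
To illustrate what "should have been a pointer but was not" means: a file tracked by Git LFS is stored in the repository as a small text pointer, while a normally committed file is stored with its full content. The sketch below tells the two apart from raw bytes. It assumes the version line and the size limit of under 1024 bytes from the Git LFS pointer format; the function name `looks_like_lfs_pointer` and the `oid` value are hypothetical, and this is not the check the CI itself runs.

```python
# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(blob: bytes) -> bool:
    """Heuristic: pointers are tiny text files that start with the version line."""
    return len(blob) < 1024 and blob.startswith(LFS_POINTER_PREFIX)


# A pointer as Git would store it for an LFS-tracked file (oid is a dummy value).
pointer_blob = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:" + b"0" * 64 + b"\n"
    b"size 123456\n"
)

# Start of a raw PLINK .bed file (magic bytes 0x6c 0x1b 0x01) committed directly.
raw_bed_blob = b"\x6c\x1b\x01" + b"\x00" * 32

print(looks_like_lfs_pointer(pointer_blob))
print(looks_like_lfs_pointer(raw_bed_blob))
```

Note that this applies to the committed Git object: in a working tree with Git LFS installed, the checked-out file holds the real data even when the repository stores only the pointer.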

@nevrome
Member

nevrome commented Jan 22, 2024

This now looks good to me. I think it can be merged. Thank you for preparing this, @smpeltola, and for addressing the minor technical issues!

What's your final verdict, @AyGhal?

Member

@stschiff stschiff left a comment

Yes, great!

@TCLamnidis
Member

Would be awesome to also include an SSF file in this package.

@nevrome
Member

nevrome commented Feb 5, 2024

I don't think this should hold this up now. A .ssf file is purely optional and can easily be added later.

@AyGhal Do you still have some concerns about this package? Otherwise I think it can be merged.

@smpeltola
Contributor Author

I'm happy to add an SSF file later on, but don't hold up the merge for it.

@AyGhal
Contributor

AyGhal commented Feb 8, 2024

Thank you @smpeltola for making the package! And sorry that it took me so long to get back to it. It looks great.

@AyGhal AyGhal merged commit 5d652f0 into poseidon-framework:master Feb 8, 2024
1 check passed
5 participants