
Poseidon v.2: DAG Genotype Data Organisation

Poseidon v.2 is a solution for genotype data organisation established within the Department of Archaeogenetics at the Max Planck Institute for the Science of Human History (MPI-SHH) in Jena.

The Poseidon v.2 package

All ancient and modern data are distributed into so-called packages, which are directories containing a dedicated set of files. Packages correspond to published sets of genomes or, for unpublished projects, to ongoing (and growing) sets of samples that are currently being analysed. All text files in a package are UTF-8 encoded.


Every package must have the following files:

  • The POSEIDON.yml file
  • The X.janno file
  • The X.bed, X.bim, X.fam files

It can also contain the following files:

  • The README.txt file
  • The CHANGELOG.txt file
  • The LITERATURE.bib file
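The file rules above lend themselves to a simple completeness check. The following is a minimal Python sketch, not part of any Poseidon tooling: the function names check_package_files and check_package_dir are hypothetical, and the sketch assumes that the mandatory data files share the prefix X, whereas a real check would take the file names from POSEIDON.yml.

```python
import os
from typing import Iterable

# Assumption: the mandatory data files share one prefix X (e.g. "Schiffels_2016").
# POSEIDON.yml names each file explicitly, so a real check would read those names.
MANDATORY_SUFFIXES = (".janno", ".bed", ".bim", ".fam")
OPTIONAL_FILES = ("README.txt", "CHANGELOG.txt", "LITERATURE.bib")


def check_package_files(file_names: Iterable[str], prefix: str) -> dict[str, list[str]]:
    """Split a package's file list into missing mandatory and present optional files."""
    names = set(file_names)
    mandatory = ["POSEIDON.yml"] + [prefix + suffix for suffix in MANDATORY_SUFFIXES]
    return {
        "missing": [name for name in mandatory if name not in names],
        "optional": [name for name in OPTIONAL_FILES if name in names],
    }


def check_package_dir(package_dir: str, prefix: str) -> dict[str, list[str]]:
    """Apply check_package_files to the entries of a directory on disk."""
    return check_package_files(os.listdir(package_dir), prefix)
```

For example, check_package_dir("path/to/package", "Schiffels_2016") would list a missing Schiffels_2016.fam under "missing". Keeping the set logic separate from os.listdir lets it be tested without touching the file system.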



The POSEIDON.yml file [mandatory]

The POSEIDON.yml file lists metainformation in a standardized, machine-readable format.


poseidonVersion: 2.0.1
title: Schiffels_2016
description: Genetic data published in Schiffels et al. 2016
contributor:
  - name: Stephan Schiffels
    email: [email protected]
  - name: Paul Panther
    email: [email protected]
lastModified: 2020-02-28
bibFile: LITERATURE.bib
genotypeData:
  format: PLINK
  genoFile: Schiffels_2016.bed
  snpFile: Schiffels_2016.bim
  indFile: Schiffels_2016.fam
jannoFile: Schiffels_2016.janno
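Once POSEIDON.yml has been parsed into a mapping (for instance with the third-party PyYAML library's yaml.safe_load), its file references can be checked. This hypothetical Python sketch treats poseidonVersion, title, genotypeData and jannoFile as required and accepts only the PLINK format; that choice of required keys is an assumption for illustration, not the authoritative schema.

```python
from typing import Any

# Assumption: an illustrative choice of required keys, not the official schema.
REQUIRED_KEYS = ("poseidonVersion", "title", "genotypeData", "jannoFile")
GENOTYPE_KEYS = ("format", "genoFile", "snpFile", "indFile")


def referenced_files(meta: dict[str, Any]) -> list[str]:
    """Validate a parsed POSEIDON.yml mapping and return the files it names."""
    missing = [key for key in REQUIRED_KEYS if key not in meta]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    geno = meta["genotypeData"]
    missing = [key for key in GENOTYPE_KEYS if key not in geno]
    if missing:
        raise ValueError(f"missing genotypeData keys: {missing}")
    if geno["format"] != "PLINK":  # PLINK is the only format described here
        raise ValueError(f"unsupported genotype format: {geno['format']!r}")
    files = [geno["genoFile"], geno["snpFile"], geno["indFile"], meta["jannoFile"]]
    if "bibFile" in meta:  # LITERATURE.bib is optional
        files.append(meta["bibFile"])
    return files
```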

The X.janno file [mandatory]

The .janno file is a UTF-8 encoded, tab-separated text file with a header line. It holds a clearly defined set of context information (columns) for each sample (rows) in a package.

  • The variables (columns), variable types and possible content of the janno file are documented in the janno_columns.tsv file in this repository.
  • A .janno file must have all of these columns in exactly this order with exactly these column names.
  • If information is unknown or a variable does not apply to a certain sample, the respective cell(s) can be filled with the NULL value n/a. Ideally, a .janno file should contain as few n/a values as possible.
  • The order of the samples (rows) in the .janno file must match the order in the files that hold the genetic data.
  • The values in the columns Group_Name and Individual_ID must match the first (family ID) and second (individual ID) columns of the .fam file, respectively.
  • Several columns of the .janno file are list columns that hold multiple values (either strings or numbers) separated by semicolons (;).
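The .janno rules above can be sketched as a small parser. This is an illustrative Python example using only the standard library: parse_janno is a hypothetical name, the expected column list is passed in rather than read from janno_columns.tsv, and the set of list columns is supplied by the caller. Numeric list values are left as strings.

```python
import csv
import io

NA = "n/a"  # the NULL value used in .janno files


def parse_janno(text: str, expected_columns: list[str],
                list_columns: set[str]) -> list[dict[str, object]]:
    """Parse decoded .janno text into one dict per sample (row)."""
    # Tab-separated; QUOTE_NONE keeps literal quote characters inside values.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    if header != expected_columns:
        raise ValueError("header must match janno_columns.tsv exactly, in order")
    rows: list[dict[str, object]] = []
    for fields in reader:
        if not fields:
            continue  # skip empty lines
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(fields)}")
        row: dict[str, object] = {}
        for name, value in zip(header, fields):
            if value == NA:
                row[name] = None  # unknown or not applicable
            elif name in list_columns:
                row[name] = value.split(";")  # values stay strings
            else:
                row[name] = value
        rows.append(row)
    return rows
```

To read a file, decode it as UTF-8 first, e.g. parse_janno(Path("X.janno").read_text(encoding="utf-8"), columns, list_columns) with Path from pathlib.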

The X.bed, X.bim, X.fam files [mandatory]

Binary PLINK genotype data, consisting of .bed (PLINK binary biallelic genotype table), .bim (PLINK extended MAP file) and .fam (PLINK sample information) files.
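Because the .janno rows must follow the sample order of the genotype files, the .fam file is a natural reference for a consistency check. The sketch below reads the first two of the six whitespace-separated .fam columns (family ID and individual ID) and compares them row by row with parsed .janno rows. It assumes that Group_Name corresponds to the family ID column and Individual_ID to the individual ID column, and that Group_Name holds a single value; both function names are hypothetical.

```python
def read_fam_ids(text: str) -> list[tuple[str, str]]:
    """Return (family ID, individual ID) for each sample line of a .fam file."""
    ids = []
    for line in text.splitlines():
        fields = line.split()  # .fam columns are whitespace-separated
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"expected 6 .fam columns, got {len(fields)}: {line!r}")
        ids.append((fields[0], fields[1]))
    return ids


def check_janno_against_fam(janno_rows: list[dict[str, str]],
                            fam_ids: list[tuple[str, str]]) -> list[str]:
    """List mismatches between .janno rows and .fam lines, compared in order."""
    problems = []
    if len(janno_rows) != len(fam_ids):
        problems.append(f"{len(janno_rows)} .janno rows but {len(fam_ids)} .fam lines")
    for i, (row, (fid, iid)) in enumerate(zip(janno_rows, fam_ids), start=1):
        # Assumption: Group_Name <-> family ID, Individual_ID <-> individual ID.
        if (row["Group_Name"], row["Individual_ID"]) != (fid, iid):
            problems.append(
                f"sample {i}: .janno has {row['Group_Name']}/{row['Individual_ID']}, "
                f".fam has {fid}/{iid}"
            )
    return problems
```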

The README.txt file [optional]

The README.txt file contains arbitrary, human-readable information.


This package contains a rather interesting set of samples. 
@Uebertruplf_2021 even claimed that they are the most important for this particular area and time period.

The CHANGELOG.txt file [optional]

Documentation of important changes in the history of a package.


- 2021_10_01: Fixed a spelling mistake in the site name "Hosenacker"->"Rosenacker". 
- 2021_05_05: The authors of @Gassenhauer_2021 later made some samples available that had been restricted at the time of their publication, and we added them.
- 2021_03_08: Creation of the package.

The LITERATURE.bib file [optional]

BibTeX file with all references cited in POSEIDON.yml, README.txt and CHANGELOG.txt.
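References in README.txt and CHANGELOG.txt appear as @key, as in the examples above (@Uebertruplf_2021, @Gassenhauer_2021). Assuming that convention, a rough Python sketch can list keys that are cited but missing from LITERATURE.bib. The regular expressions are simplifications, not a full BibTeX parser, and the function names are hypothetical.

```python
import re

# Assumption: citations are written as @key; the lookbehind skips e-mail addresses.
CITATION = re.compile(r"(?<!\w)@(\w+)")
# Matches the key of entries such as "@article{Key," (simplified BibTeX syntax).
BIB_ENTRY_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(*texts: str) -> set[str]:
    """Collect all @key citations from the given texts."""
    keys: set[str] = set()
    for text in texts:
        keys.update(CITATION.findall(text))
    return keys


def bib_keys(bib_text: str) -> set[str]:
    """Collect the entry keys defined in a BibTeX file."""
    return set(BIB_ENTRY_KEY.findall(bib_text))


def missing_references(bib_text: str, *texts: str) -> set[str]:
    """Keys cited in the texts but not defined in the BibTeX file."""
    return cited_keys(*texts) - bib_keys(bib_text)
```

The lookbehind keeps addresses such as name@example.org from being read as citations.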

Naming Poseidon v.2 packages

The naming of packages should follow a simple scheme:

  • Ancient published: YEAR_NAME_IDENTIFIER
  • Ancient unpublished: IDENTIFIER_NAME
  • Modern published: YEAR_(NAME)_IDENTIFIER
  • Modern unpublished: IDENTIFIER_NAME

Identifiers can be somewhat informal while a project is ongoing; they just need to be unique. As soon as a project is published, we create a final version of the respective package with the YEAR_NAME_IDENTIFIER label.

External projects can be integrated in the same way, using either their publication name or a temporary internal identifier such as Iron_Age_Boston_Share.
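A name check following this scheme could look like the hypothetical Python sketch below. Because identifiers may contain underscores (e.g. Iron_Age_Boston_Share), NAME and IDENTIFIER cannot be split reliably, so the sketch only distinguishes names that start with a four-digit YEAR (published) from other IDENTIFIER_NAME-style names (unpublished). An unpublished identifier that happens to start with four digits followed by an underscore would be misclassified.

```python
import re

# Hypothetical patterns; only the leading YEAR is checked, since
# identifiers may themselves contain underscores.
PUBLISHED = re.compile(r"^\d{4}_[A-Za-z0-9]\w*$")  # YEAR_NAME_IDENTIFIER, YEAR_(NAME)_IDENTIFIER
UNPUBLISHED = re.compile(r"^[A-Za-z0-9]\w*_[A-Za-z0-9]\w*$")  # IDENTIFIER_NAME


def classify_package_name(name: str) -> str:
    """Return "published" or "unpublished" for a package directory name."""
    if PUBLISHED.match(name):
        return "published"
    if UNPUBLISHED.match(name):
        return "unpublished"
    raise ValueError(f"package name does not follow the naming scheme: {name!r}")
```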

DAG internal procedures

Individual contributors would create packages in dedicated poseidon folders within their user project directories, e.g. /project1/user/xyz/poseidon/2018_Lamnidis_Fennoscandia. That way, subfolders belong to individual maintainers and are writable only by them.

The poseidon admins would then link these packages into the official /projects1/poseidon repo, located on the HPC storage unit of the MPI-SHH, where we distinguish ancient and modern genotype data:
