Contribution instructions, fix MPXV Erasmus URLs
bede committed Aug 23, 2024
1 parent 13fce2a commit 6a31e30
Showing 4 changed files with 37 additions and 18 deletions.
36 changes: 28 additions & 8 deletions README.md
@@ -4,28 +4,48 @@

**🚨 Migration to v1 scheme specification in progress**

A versioned and schematised community repository of tiled amplicon primer scheme definitions (created with e.g. [Primal Scheme](https://primalscheme.com)) for pathogen sequencing, made with the objective of eliminating ambiguity in scheme naming and versioning and maximising the findability, accessibility, interoperability and reusability ([FAIRness](https://www.go-fair.org/fair-principles/)) of primer scheme definitions and associated sequencing data.
A versioned and schematised community repository of tiled amplicon primer scheme definitions (created with e.g. [Primal Scheme](https://primalscheme.com)) for pathogen sequencing, made with the objective of eliminating ambiguity in scheme naming and versioning and maximising the findability, accessibility, interoperability and reusability ([FAIRness](https://www.go-fair.org/fair-principles/)) of primer schemes and associated sequencing data. An example of a canonical primer scheme name is `sars-cov-2/midnight/1200/v1.0.0`.
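The canonical name above reads as `{organism}/{name}/{amplicon_size}/{version}`, the same layout as the `schemes/` directory paths used elsewhere in this commit. A minimal Python sketch of splitting such a name into its parts follows. It assumes exactly four slash-separated components and an integer amplicon size; `SchemeName` and `parse_scheme_name` are hypothetical helpers, not part of Primaschema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemeName:
    """Parts of a canonical scheme name (hypothetical helper type)."""

    organism: str
    name: str
    amplicon_size: int
    version: str


def parse_scheme_name(canonical: str) -> SchemeName:
    """Split 'organism/name/amplicon_size/version' into its parts.

    Assumption: exactly four slash-separated parts, the third an integer.
    """
    parts = canonical.strip("/").split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 components, got {len(parts)}: {canonical!r}")
    organism, name, size, version = parts
    return SchemeName(organism, name, int(size), version)


if __name__ == "__main__":
    # Prints: SchemeName(organism='sars-cov-2', name='midnight', amplicon_size=1200, version='v1.0.0')
    print(parse_scheme_name("sars-cov-2/midnight/1200/v1.0.0"))
```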

The repository includes a top-level machine readable [index](https://github.com/pha4ge/primer-schemes/blob/main/index.yml) of available primer scheme definitions.



## Scheme specification

A scheme definition has three components:

1. A reference sequence ([`reference.fasta`](https://github.com/pha4ge/primer-schemes/blob/main/sars-cov-2/artic/v4.1/reference.fasta))
2. A seven column BED file of primer sequences & positions in reference coordinates ([`primer.bed`](https://github.com/pha4ge/primer-schemes/blob/main/sars-cov-2/artic/v4.1/primer.bed))
3. A metadata file in YAML format adhering to [this schema](https://github.com/pha4ge/primaschema/blob/main/src/primaschema/schema/info.yml) ([`info.yml`](https://github.com/pha4ge/primer-schemes/blob/main/schemes/sars-cov-2/artic/400/v4.1.0/info.yml))
1. A reference sequence (e.g. [`reference.fasta`](https://github.com/pha4ge/primer-schemes/blob/main/schemes/sars-cov-2/artic/400/v4.1.0/reference.fasta))
2. A seven column Primal Scheme-like BED file of primer sequences & coordinates (e.g. [`primer.bed`](https://github.com/pha4ge/primer-schemes/blob/main/schemes/sars-cov-2/artic/400/v4.1.0/primer.bed))
3. A metadata file in YAML format adhering to a [schema](https://github.com/pha4ge/primaschema/blob/main/src/primaschema/schema/info.yml) (e.g. [`info.yml`](https://github.com/pha4ge/primer-schemes/blob/main/schemes/sars-cov-2/artic/400/v4.1.0/info.yml))
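To make the BED component more concrete, here is a minimal Python sketch that reads rows of a seven-column primer BED file. The column order (chrom, start, end, primer name, pool, strand, sequence) is an assumption based on Primal Scheme-style output, and `PrimerRecord`, `parse_primer_bed_line` and `read_primer_bed` are hypothetical names; Primaschema's own validation is more thorough than this.

```python
from typing import List, NamedTuple


class PrimerRecord(NamedTuple):
    """One row of a seven-column primer BED file (assumed column order)."""

    chrom: str     # reference sequence ID, expected to match reference.fasta
    start: int     # 0-based start coordinate (BED convention)
    end: int       # exclusive end coordinate (BED convention)
    name: str      # primer name, e.g. ending in _LEFT or _RIGHT
    pool: str      # primer pool identifier
    strand: str    # '+' or '-'
    sequence: str  # primer sequence


def parse_primer_bed_line(line: str) -> PrimerRecord:
    """Parse and sanity-check one tab-separated seven-column row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated columns, got {len(fields)}")
    chrom, start, end, name, pool, strand, seq = fields
    if strand not in ("+", "-"):
        raise ValueError(f"invalid strand {strand!r}")
    record = PrimerRecord(chrom, int(start), int(end), name, pool, strand, seq)
    if record.end <= record.start:
        raise ValueError("end must be greater than start")
    return record


def read_primer_bed(path: str) -> List[PrimerRecord]:
    """Read all non-blank, non-comment rows from a primer BED file."""
    records = []
    with open(path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                records.append(parse_primer_bed_line(line))
    return records
```

For example, the made-up row `ref1<TAB>100<TAB>124<TAB>example_1_LEFT<TAB>1<TAB>+<TAB>` followed by a 24-base sequence parses to a record with `start=100` and `end=124`.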



## Tooling

## Contributing new schemes
The repository's companion tool [Primaschema](https://github.com/pha4ge/primaschema) is used to automatically validate schemes in this repository, create graphics and manage checksums, as well as generate a six column scheme.bed for legacy tool compatibility. It may be installed standalone using `pip install` for fetching, validating and interrogating primer schemes.
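One way to picture the six-column `scheme.bed` mentioned above is as the seven-column `primer.bed` with its trailing sequence column dropped. This is an assumption made for illustration only: Primaschema's actual conversion may treat pool naming or other details differently, and `to_six_column` is a hypothetical helper.

```python
from typing import Iterable, List


def to_six_column(bed_lines: Iterable[str]) -> List[str]:
    """Drop the seventh (sequence) column from each seven-column BED row.

    Assumption: the legacy six-column layout is simply the first six
    columns of primer.bed; Primaschema may differ in the details.
    """
    rows = []
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}")
        rows.append("\t".join(fields[:6]))
    return rows
```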

*Coming soon*


## Contributing new scheme definitions

## Tooling
We encourage contributions of any schemes that others might wish to use, especially if sequencing data has been or will be deposited publicly. We're working to make this process easier, but in the meantime please either follow the instructions below to send us a draft scheme, or open a GitHub pull request if you are comfortable doing so.

A scheme definition comprises *i)* a reference sequence (`reference.fasta`), *ii)* a BED file of primer sequences & reference coordinates (`primer.bed`), and *iii)* a metadata file in YAML format adhering to [this schema](https://github.com/pha4ge/primaschema/blob/main/src/primaschema/schema/info.yml), called `info.yml`. If you've created a scheme, you probably already have *i)* and *ii)* and only need to make `info.yml`. It's easiest to begin by modifying a copy of an existing `info.yml` [such as this one](https://github.com/pha4ge/primer-schemes/blob/main/schemes/sars-cov-2/eden/2500/v1.0.0/info.yml).

The repository's companion tool [Primaschema](https://github.com/pha4ge/primaschema) is used to automatically validate schemes in this repository and create plots, as well as generate a six column scheme.bed for legacy tool compatibility. It can also be installed standalone.
1. Check that the `organism` field in your scheme's `info.yml` references the correct pathogen. If there are no existing schemes for the target pathogen, please [open a GitHub issue](https://github.com/pha4ge/primer-schemes/issues) to request that it be added.
2. Choose a scheme name and version, e.g. `midnight` and `v1.0.0`. The name should not include special characters except hyphens.
   - If adding a new scheme, choose any name, preferably one that does not reference the organism.
   - If updating your existing scheme, keep the same name and update the version:
     - Versions must take the form `v{major}.{minor}.{patch}`
     - For primer changes beyond adding primers, increment the *major* version
     - If only adding primers with respect to an existing version, increment the *minor* version
     - For smaller technical changes, the *patch* version may be incremented
   - If updating a third party's existing scheme, you may propose a new scheme name with version `v1.0.0` rather than incrementing the existing scheme's version.
3. Complete the `name` and `version` fields inside your new scheme's `info.yml`, along with the other required fields:
   - `amplicon_size`: the approximate amplicon length in bp, as an integer
   - `developers`: a list of developer names or organisations
4. [Open a GitHub issue](https://github.com/pha4ge/primer-schemes/issues) attaching or linking to your `reference.fasta`, `primer.bed` and `info.yml` files.
5. If you wish, you may install [primaschema](https://github.com/pha4ge/primaschema) and run `primaschema build {scheme-directory}` to validate your newly created scheme and add checksums etc., but this is not required.
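The naming, versioning and required-field rules in the steps above can be sketched as a small pre-submission check. This is a minimal Python sketch, not Primaschema's validator. It assumes `info.yml` has already been parsed into a dict, reads "no special characters except hyphens" as ASCII letters, digits and hyphens only, and resets lower version components to zero on a bump (a semver-style assumption the steps do not state). `is_valid_name`, `bump_version` and `check_info` are hypothetical names.

```python
import re
from typing import List

# Versions must take the form v{major}.{minor}.{patch}
VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)")
# Assumption: "no special characters except hyphens" = letters, digits, hyphens
NAME_RE = re.compile(r"[A-Za-z0-9-]+")
# Fields named in the steps above; the real info.yml schema may require more
REQUIRED_FIELDS = ("organism", "name", "version", "amplicon_size", "developers")


def is_valid_name(name: str) -> bool:
    return NAME_RE.fullmatch(name) is not None


def bump_version(version: str, change: str) -> str:
    """Return the next version for a change of the given kind.

    'major': primer changes beyond adding primers
    'minor': only primers added relative to the existing version
    'patch': smaller technical changes
    Assumption: lower components reset to zero, as in semver.
    """
    match = VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"version must look like v1.2.3, got {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if change == "major":
        return f"v{major + 1}.0.0"
    if change == "minor":
        return f"v{major}.{minor + 1}.0"
    if change == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind {change!r}")


def check_info(info: dict) -> List[str]:
    """Return a list of problems found in a parsed info.yml mapping."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in info]
    name = info.get("name")
    if name is not None and not is_valid_name(str(name)):
        problems.append(f"invalid name: {name!r}")
    version = info.get("version")
    if version is not None and VERSION_RE.fullmatch(str(version)) is None:
        problems.append(f"invalid version: {version!r}")
    size = info.get("amplicon_size")
    if size is not None and (isinstance(size, bool) or not isinstance(size, int) or size <= 0):
        problems.append(f"amplicon_size must be a positive integer: {size!r}")
    developers = info.get("developers")
    if developers is not None and (not isinstance(developers, list) or not developers):
        problems.append("developers must be a non-empty list")
    return problems
```

For instance, `bump_version("v1.2.3", "minor")` returns `"v1.3.0"`. Running `primaschema build` remains the real validation step.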



12 changes: 6 additions & 6 deletions index.yml
@@ -24,13 +24,13 @@ schemes:
- Matthijs Welkers
- Marcel Jonges
- Anton van den Ouden
definition_url: https://github.com/pha4ge/primer-schemes/tree/main/sars-cov-2/artic/v4.1
source_url: https://github.com/quick-lab/primerschemes/tree/main/primerschemes/artic-sars-cov-2/400/v4.1.0
definition_url: https://github.com/pha4ge/primer-schemes/tree/main/schemes/mpxv/erasmus/2500/v1.0.0
source_url: https://www.protocols.io/view/monkeypox-virus-whole-genome-sequencing-using-comb-n2bvj6155lk5
citations:
- https://www.protocols.io/view/monkeypox-virus-whole-genome-sequencing-using-comb-n2bvj6155lk5
license: CC-BY-SA-4.0
primer_checksum: primaschema:9acd3f5ce5711ed2
reference_checksum: primaschema:31ff1b123bbce197
reference_checksum: primaschema:17cbcde0cfff7158
- organism: mpxv
name: yale
amplicon_size: 2000
@@ -236,16 +236,16 @@ schemes:
license: CC-BY-SA-4.0
primer_checksum: primaschema:294136d77eb67db8
reference_checksum: primaschema:b1acd7163146bf17
- name: eden
- organism: sars-cov-2
name: eden
amplicon_size: 2500
version: v1.0.0
organism: sars-cov-2
aliases:
- sydney
developers:
- John-Sebastian Eden
- Eby Sim
definition_url: https://github.com/pha4ge/primer-schemes/tree/main/sars-cov-2/eden/v1
definition_url: https://github.com/pha4ge/primer-schemes/blob/main/schemes/sars-cov-2/eden/2500/v1.0.0
citations:
- https://dx.doi.org/10.17504/protocols.io.befyjbpw
notes:
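The corrected `definition_url` values follow the `schemes/{organism}/{name}/{amplicon_size}/{version}` directory layout. A minimal Python sketch of deriving such a URL from an index entry, assuming that layout and the `tree/main` form used for the MPXV Erasmus entry (the Eden entry uses `blob/main` instead); `definition_url_for` is a hypothetical helper.

```python
# Assumption: every scheme lives under this base path in the repository
BASE_URL = "https://github.com/pha4ge/primer-schemes/tree/main/schemes"


def definition_url_for(entry: dict) -> str:
    """Build a definition URL from organism, name, amplicon_size and version."""
    parts = [entry["organism"], entry["name"], str(entry["amplicon_size"]), entry["version"]]
    return "/".join([BASE_URL, *parts])


if __name__ == "__main__":
    erasmus = {"organism": "mpxv", "name": "erasmus", "amplicon_size": 2500, "version": "v1.0.0"}
    # Prints: https://github.com/pha4ge/primer-schemes/tree/main/schemes/mpxv/erasmus/2500/v1.0.0
    print(definition_url_for(erasmus))
```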
3 changes: 1 addition & 2 deletions schemes/mpxv/erasmus/2500/v1.0.0/info.yml
@@ -7,8 +7,7 @@ developers:
- Matthijs Welkers
- Marcel Jonges
- Anton van den Ouden
definition_url: https://github.com/pha4ge/primer-schemes/tree/main/sars-cov-2/artic/v4.1
source_url: https://github.com/quick-lab/primerschemes/tree/main/primerschemes/artic-sars-cov-2/400/v4.1.0
definition_url: https://github.com/pha4ge/primer-schemes/tree/main/schemes/mpxv/erasmus/2500/v1.0.0
citations:
- https://www.protocols.io/view/monkeypox-virus-whole-genome-sequencing-using-comb-n2bvj6155lk5
license: CC-BY-SA-4.0
4 changes: 2 additions & 2 deletions schemes/sars-cov-2/eden/2500/v1.0.0/info.yml
@@ -1,14 +1,14 @@
schema_version: 1.0.0a
organism: sars-cov-2
name: eden
amplicon_size: 2500
version: v1.0.0
organism: sars-cov-2
aliases:
- sydney
developers:
- John-Sebastian Eden
- Eby Sim
definition_url: https://github.com/pha4ge/primer-schemes/tree/main/sars-cov-2/eden/v1
definition_url: https://github.com/pha4ge/primer-schemes/blob/main/schemes/sars-cov-2/eden/2500/v1.0.0
citations:
- https://dx.doi.org/10.17504/protocols.io.befyjbpw
notes: