seqspec

seqspec is a machine-readable YAML file format for genomic library sequence and structure. It was inspired by, and builds on, the Teichmann Lab Single Cell Genomics Library Structure by Xi Chen.

The structure of a genomic library depends on both the assay and the sequencer (and kit): the assay produces an assay-specific construct, and the sequencer and kit determine the sequencing adapters that are bound to that construct to form the sequencing library. A seqspec is therefore specific to both a genomics assay and a sequencer.

A list of seqspec examples for multiple assays and sequencers can be found on this website. Each spec.yaml describes the 5'->3' "Final library structure" for the assay and sequencer and can be extended to include sequencer-specific read annotations. Sequence specification files can be formatted with the seqspec command line tool.
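To make the idea of a 5'->3' library structure concrete, here is a minimal Python sketch that models a library as an ordered, nested list of regions and derives the region order and total length bounds. It is an illustration only: the `Region` class, its field names, and the toy sequences and lengths are assumptions made for this example, not the seqspec schema or the seqspec Python API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    """One element of a library (hypothetical model, not the seqspec schema)."""
    region_id: str
    sequence: str = ""   # literal sequence when the region is fixed, e.g. a primer
    min_len: int = 0     # shortest allowed length in bases
    max_len: int = 0     # longest allowed length in bases
    regions: List["Region"] = field(default_factory=list)  # sub-regions, 5'->3'

    def leaves(self) -> List["Region"]:
        """Return the innermost regions in 5'->3' order."""
        if not self.regions:
            return [self]
        out: List["Region"] = []
        for sub in self.regions:
            out.extend(sub.leaves())
        return out

    def length_bounds(self) -> Tuple[int, int]:
        """Sum the min and max lengths over all leaf regions."""
        leaves = self.leaves()
        return (sum(r.min_len for r in leaves), sum(r.max_len for r in leaves))


# Toy library: every sequence and length below is made up for illustration.
library = Region(
    "toy_library",
    regions=[
        Region("primer_5p", sequence="ACGTACGT", min_len=8, max_len=8),
        Region(
            "cell_tag",
            regions=[
                Region("barcode", min_len=16, max_len=16),
                Region("umi", min_len=12, max_len=12),
            ],
        ),
        Region("cdna", min_len=1, max_len=98),
        Region("primer_3p", sequence="TTGCA", min_len=5, max_len=5),
    ],
)

order = [r.region_id for r in library.leaves()]
bounds = library.length_bounds()
print(" -> ".join(order))
print(bounds)
```

Nesting lets a group such as the barcode and UMI be addressed as one unit while still flattening to a single 5'->3' order. A real spec.yaml carries more metadata than this sketch models.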

The seqspec format and tool are described in this publication. If you use seqspec, please cite:

Ali Sina Booeshaghi, Xi Chen, Lior Pachter, A machine-readable specification for genomics assays, Bioinformatics, Volume 40, Issue 4, April 2024, btae168.

# release
pip install seqspec

# development
pip install git+https://github.com/pachterlab/seqspec.git

# verify install
seqspec --help

Documentation: