Count file

Keiran Raine edited this page Jan 24, 2022 · 3 revisions

The counts file (*.counts.tsv.gz) is based around the minimal guide library file as indicated here.

... indicates data has been truncated.

Basic format

Core fields plus:

  • unique_guide - 0/1 flag indicating whether the guide is unique.
  • reads_SAMPLE - read count for the sample; SAMPLE is replaced with the value provided during execution or taken from the header of the BAM/CRAM input.
##Command: pycroquet single-guide -g ...
##Version: 1.3.0
#id	sgrna_ids	sgrna_seqs	gene_pair_id	unique_guide	reads_SAMPLE
...
11023	ACAP3_CCDS19.2_ex10_1:1233213-1233236:+_5-3	CTGTCAGGGCTCTCGCGGT	ACAP3	1	1
...
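
As a rough illustration of the layout above, the following sketch separates the "##" meta-data lines, the single "#" column line and the data rows. The names parse_counts and read_counts are hypothetical helpers written for this page, not part of pycroquet, and the sketch assumes the file is tab-separated and gzip-compressed as the *.counts.tsv.gz extension suggests.

```python
import gzip


def parse_counts(lines):
    """Split counts-file lines into (meta, columns, rows).

    Hypothetical helper, not part of pycroquet.
    """
    meta = {}     # '##Key: value' header lines
    columns = []  # names taken from the single '#id ...' line
    rows = []     # one dict per guide, values kept as strings
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            key, _, value = line[2:].partition(": ")
            meta[key] = value
        elif line.startswith("#"):
            columns = line[1:].split("\t")
        else:
            rows.append(dict(zip(columns, line.split("\t"))))
    return meta, columns, rows


def read_counts(path):
    """Read a gzip-compressed counts file from disk (assumed layout)."""
    with gzip.open(path, "rt") as handle:
        return parse_counts(handle)


# The example rows from this page, with the truncated command left as-is.
sample = [
    "##Command: pycroquet single-guide -g ...",
    "##Version: 1.3.0",
    "#id\tsgrna_ids\tsgrna_seqs\tgene_pair_id\tunique_guide\treads_SAMPLE",
    "11023\tACAP3_CCDS19.2_ex10_1:1233213-1233236:+_5-3"
    "\tCTGTCAGGGCTCTCGCGGT\tACAP3\t1\t1",
]
meta, columns, rows = parse_counts(sample)
```

Values are left as strings here; converting unique_guide and the read count to integers is a choice for the caller.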

Merged format

The merged format extends this further. reads_SAMPLE becomes the sum of the input counts.

Each count input file adds a new numbered meta-data header line (Count-col-#N) incorporating:

  • md5 of input file
  • original command from input file header
  • version from input file header
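
A Count-col-#N line carrying these three items could be unpacked roughly as follows. This is a sketch only: parse_count_col is a hypothetical name, and it assumes the items are joined by "; " in the order md5, Version, Command, as in the example header on this page.

```python
def parse_count_col(line):
    """Return (N, fields) for a '##Count-col-#N: ...' header line.

    Hypothetical helper; assumes the 'md5: ...; Version: ...; Command: ...'
    layout shown in the example header.
    """
    label, _, body = line[2:].partition(": ")
    number = int(label.rsplit("#", 1)[1])
    fields = {}
    # Split at most twice so that a command containing '; ' stays intact.
    for part in body.split("; ", 2):
        key, _, value = part.partition(": ")
        fields[key] = value
    return number, fields


# Example header line from this page; the truncated command is left as-is.
line = (
    "##Count-col-#1: md5: acdc800d36e38641995137678a9727c1; "
    "Version: 1.3.0; Command: pycroquet single-guide -g ..."
)
number, fields = parse_count_col(line)
```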

For each numbered header line, a corresponding numbered column follows reads_SAMPLE and holds the original counts from that input file:

##Command: pycroquet merge-counts -o ...
##Version: 1.3.0
##Count-col-#1: md5: acdc800d36e38641995137678a9727c1; Version: 1.3.0; Command: pycroquet single-guide -g ...
##Count-col-#2: md5: 8ea9dce29a685e3f1db0bb8a44da9853; Version: 1.3.0; Command: pycroquet single-guide -g ...
#id	sgrna_ids	sgrna_seqs	gene_pair_id	unique_guide	reads_SAMPLE	1	2
...
11023	ACAP3_CCDS19.2_ex10_1:1233213-1233236:+_5-3	CTGTCAGGGCTCTCGCGGT	ACAP3	1	2	1	1
...
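
The relationship between the summed column and the numbered columns can be checked per row. The sketch below uses a hypothetical helper, merged_counts, and assumes the summed column is the one whose name starts with "reads_" and that every column after it is a per-input count.

```python
def merged_counts(columns, fields):
    """Return (total, per_input) for one row of a merged counts file.

    Hypothetical helper; assumes the summed column name starts with
    'reads_' and that all later columns are per-input counts.
    """
    total_idx = next(
        i for i, name in enumerate(columns) if name.startswith("reads_")
    )
    total = int(fields[total_idx])
    per_input = {
        name: int(value)
        for name, value in zip(columns[total_idx + 1:], fields[total_idx + 1:])
    }
    return total, per_input


# Column line (leading '#' removed) and data row from the merged example.
columns = (
    "id\tsgrna_ids\tsgrna_seqs\tgene_pair_id\tunique_guide\treads_SAMPLE\t1\t2"
).split("\t")
fields = (
    "11023\tACAP3_CCDS19.2_ex10_1:1233213-1233236:+_5-3"
    "\tCTGTCAGGGCTCTCGCGGT\tACAP3\t1\t2\t1\t1"
).split("\t")

total, per_input = merged_counts(columns, fields)
# The summed column should equal the sum of the per-input columns.
consistent = total == sum(per_input.values())
```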