A toolkit to parse seqspec yaml files for processing single-cell sequencing data.
mamba create --name scarecrow python=3.12
mamba activate scarecrow
mamba install git pip
pip install git+https://github.com/pachterlab/seqspec.git
pip install git+https://github.com/MorganResearchLab/scarecrow.git
The read_id entries in the spec.yaml must match the fastq files being processed. A modified spec is provided with scarecrow (./scarecrow/specs/splitseq/splitseq.yaml). However, each user must update its read_id entries if the fastq files are not stored in a shared location.
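Updating the read_id entries by hand is error-prone. The sketch below rewrites every read_id value so that it points at a file with the same name in a new directory. It assumes that each read_id sits on its own line as an unquoted "read_id: <path>" entry; that assumption is not confirmed by the spec format description. relocate_read_ids is a hypothetical helper and is not part of scarecrow or seqspec.

```python
import re
from pathlib import PurePosixPath

# Assumption: each read_id sits on its own line as an unquoted "read_id: <path>"
# entry, optionally preceded by a YAML list dash.
READ_ID_LINE = re.compile(
    r"^([ \t]*(?:-[ \t]+)?read_id:[ \t]*)(\S+)[ \t]*$", re.MULTILINE
)


def relocate_read_ids(spec_text: str, fastq_dir: str) -> str:
    """Return spec_text with every read_id pointing at the same file name in fastq_dir."""

    def repoint(match: re.Match) -> str:
        file_name = PurePosixPath(match.group(2)).name  # drop the old directory
        return f"{match.group(1)}{PurePosixPath(fastq_dir) / file_name}"

    return READ_ID_LINE.sub(repoint, spec_text)


if __name__ == "__main__":
    spec = "- !Read\n  read_id: /old/home/fastqs/R1.fastq.gz\n  name: Read 1\n"
    print(relocate_read_ids(spec, "/shared/fastqs"))
```

To use it, follow three steps:
1. Read the spec text.
2. Pass it through relocate_read_ids.
3. Write the result to a new file, and check it with seqspec print before running scarecrow.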
git clone https://github.com/cellatlas/cellatlas.git # Cloned for SPLiTSeq example fastq files
git clone https://github.com/pachterlab/seqspec.git # Cloned for SPLiTSeq v0.3.0 spec.yaml
scarecrow --help
scarecrow extract ./scarecrow/specs/splitseq/splitseq.yaml \
/Users/s14dw4/Documents/Repos/cellatlas/examples/rna-splitseq/fastqs/R1.fastq.gz \
/Users/s14dw4/Documents/Repos/cellatlas/examples/rna-splitseq/fastqs/R2.fastq.gz \
-o ./cDNA.fq -r UMI Round_1_BC Round_2_BC Round_3_BC
Expected console output:
seqspec print ./scarecrow/specs/splitseq/splitseq.yaml
                    ┌─'P5:29'
                    ├─'Spacer:8'
                    ├─'Read_1_primer:33'
                    ├─'cDNA:100'
                    ├─'RT_primer:15'
                    ├─'Round_1_BC:8'
                    ├─'linker_1:30'
────────rna─────────┤
                    ├─'Round_2_BC:8'
                    ├─'Linker_2:30'
                    ├─'Round_3_BC:8'
                    ├─'UMI:10'
                    ├─'Read_2_primer:22'
                    ├─'Round_4_BC:6'
                    └─'P7:24'
Library elements identified by seqspec.get_index_by_primer
/Users/s14dw4/Documents/Repos/cellatlas/examples/rna-splitseq/fastqs/R1.fastq.gz
cDNA: 0-100
RT_primer: 100-115
Round_1_BC: 115-123
linker_1: 123-140
/Users/s14dw4/Documents/Repos/cellatlas/examples/rna-splitseq/fastqs/R2.fastq.gz
UMI: 0-10
Round_3_BC: 10-18
Linker_2: 18-48
Round_2_BC: 48-56
linker_1: 56-86
Processing cDNA and writing to ./cDNA.fq
Processing Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 187.04it/s]
Total read pairs processed: 1000000
Barcode Distribution
Bin Ranges (Read Counts) | Histogram
1-2 | #################### (740165)
2-4 | ## (82638)
4-8 | (6911)
8-16 | (2030)
16-31 | (423)
31-62 | (60)
62-124 | (40)
124-247 | (47)
247-491 | (15)
491-977 | (1)
Total unique barcodes: 832330
Min count: 1
Max count: 977
kallisto bus -x 0,115,123,1,10,18,1,48,56:1,0,10:0,0,100
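The last line of the output is a custom technology string for kallisto bus -x. According to the kallisto bus documentation, the string has the form barcode:UMI:sequence. Each part is a comma-separated list of file,start,stop triplets:
- file 0 is the first fastq and file 1 the second;
- start is 0-based and stop is exclusive.

The sketch below rebuilds the printed string from the region coordinates listed above. kallisto_technology is a hypothetical helper, not scarecrow's implementation.

```python
from typing import Iterable, Tuple

Region = Tuple[int, int, int]  # (file index, start, stop); 0 = R1, 1 = R2


def _triplets(regions: Iterable[Region]) -> str:
    """Flatten (file, start, stop) triplets into 'file,start,stop,...'."""
    return ",".join(f"{f},{start},{stop}" for f, start, stop in regions)


def kallisto_technology(barcodes, umi, cdna) -> str:
    """Join barcode, UMI and cDNA regions into a barcode:UMI:sequence string."""
    return ":".join(_triplets(group) for group in (barcodes, umi, cdna))


if __name__ == "__main__":
    tech = kallisto_technology(
        barcodes=[(0, 115, 123), (1, 10, 18), (1, 48, 56)],  # Round_1_BC, Round_3_BC, Round_2_BC
        umi=[(1, 0, 10)],  # UMI on R2
        cdna=[(0, 0, 100)],  # cDNA on R1
    )
    print(tech)  # 0,115,123,1,10,18,1,48,56:1,0,10:0,0,100
```

The string is built from the same coordinates that are used for extraction, so it inherits any error in those positions.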
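Output files should include cDNA.fq, cDNA.fq.barcode_counts.csv, and scarecrow.log. The barcode counts in cDNA.fq.barcode_counts.csv show incorrect barcodes. The top 5 are listed below. The same Round_2_BC sequence (GCTACGCT) appears in all five, and four of them share AAAAAAAA as Round_1_BC: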
umi_barcodes,Count
AGATTCGTCA_ACGATCAG_GCTACGCT_CCATAGTT,977
CATTCCTAGG_AAAAAAAA_GCTACGCT_TGTCAGAG,461
CGTTACTAGG_AAAAAAAA_GCTACGCT_TGTCAGAG,434
TTTAACCCGG_AAAAAAAA_GCTACGCT_TGTCAGAG,387
CGTACTCTGG_AAAAAAAA_GCTACGCT_TGTCAGAG,360
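One way to see why these counts look wrong is to split umi_barcodes into its parts and tally each part separately. The sketch assumes that the parts are joined with "_" in the order given to -r (UMI, Round_1_BC, Round_2_BC, Round_3_BC). That order matches the 10, 8, 8 and 8 bp lengths above. For illustration, it reads the five rows shown rather than the full file. tally_parts is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Assumed component order, taken from the -r arguments used above.
PARTS = ["UMI", "Round_1_BC", "Round_2_BC", "Round_3_BC"]


def tally_parts(handle, top: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Weight each barcode component by its read count; return the top values per part."""
    tallies = {part: Counter() for part in PARTS}
    for row in csv.DictReader(handle):
        for part, value in zip(PARTS, row["umi_barcodes"].split("_")):
            tallies[part][value] += int(row["Count"])
    return {part: counter.most_common(top) for part, counter in tallies.items()}


if __name__ == "__main__":
    sample = io.StringIO(
        "umi_barcodes,Count\n"
        "AGATTCGTCA_ACGATCAG_GCTACGCT_CCATAGTT,977\n"
        "CATTCCTAGG_AAAAAAAA_GCTACGCT_TGTCAGAG,461\n"
        "CGTTACTAGG_AAAAAAAA_GCTACGCT_TGTCAGAG,434\n"
        "TTTAACCCGG_AAAAAAAA_GCTACGCT_TGTCAGAG,387\n"
        "CGTACTCTGG_AAAAAAAA_GCTACGCT_TGTCAGAG,360\n"
    )
    for part, values in tally_parts(sample).items():
        print(part, values)
```

With the real data, pass open("cDNA.fq.barcode_counts.csv") instead of the inline sample. A single value dominating a barcode round is a hint that a constant sequence is being read as a barcode.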
The barcode whitelists for the seqspec SPLiT-seq example are available in the seqspec repo (./seqspec/examples/specs/SPLiT-seq/).
To check whether the barcode sequences in a whitelist are present in a fastq file, use barcode_check:
scarecrow barcode_check ./seqspec/examples/specs/SPLiT-seq/onlist_round1.txt \
./cellatlas/examples/rna-splitseq/fastqs/R1.fastq.gz \
./Round_1_BC_counts.txt
This reports matches for all barcodes, but across a range of positions within the sequencing read rather than at the expected position (115) reported by seqspec above. The first barcode in the counts file is as follows:
Barcode Count Direction Positions
TTAGGCAT 3553 forward F:116,F:107,F:120,F:29,R:129,F:72,R:129,R:61,F:47,F:116,R:99,R:123,R:61,R:128,F:4,R:126,F:103,R:121,F:12,F:123,R:68,R:33,F:62,R:121,F:63,F:82,R:106,R:69,F:62,R:66,R:55,F:73,R:7,R:56,R:119,R:24,F:89,R:71,F:84,F:40,F:28,R:109,F:107,R:74,R:59,F:104,R:108,F:124,R:117,R:3,R:92,R:66,F:19,R:66,F:12,R:122,F:119,R:91,R:59,R:108,F:84,F:106,R:109,F:62,R:102,R:96,F:65,F:129,R:107,F:84,F:8,F:23,F:129,F:116,R:33,F:51,F:10,F:49,R:68,F:125,R:43,R:63,R:64,R:11,R:26,R:59,R:120,R:104,R:103,F:113,F:49,F:83,R:39,F:117,F:132,F:107,R:64,R:33,R:95,F:21,F:38,F:92,F:117,F:90,F:44,R:103,F:74,R:66,F:43,R:106,F:112,F:108,F:17,F:53,F:132,R:101,F:80,F:126,F:132,R:111,R:118,F:93,F:83,R:88,F:69,F:26,F:80,F:91,F:123,F:5,F:131,F:130,R:86,R:33,F:63,R:37,R:68,F:117,R:2,F:22,F:114,R:33,R:42,F:59,R:37,R:95,F:65,R:37,F:120,F:115,F:65,R:50,F:113,R:110,R:123,F:90,F:25,F:110,F:126,F:57,F:125,R:61,F:29,F:78,R:106,R:120,F:67,R:50,F:14,F:64,R:75,R:81,R:25,R:15,F:97,F:1,R:79,R:31,F:21,F:127,R:10,R:23,F:118,R:58,F:52,F:58,F:40,R:54,F:18,F:97,F:67,F:50,F:18,F:117,F:65,F:96,R:106,R:71,R:39,F:14,F:131,R:77,R:18,R:68,F:73,F:118,R:113,R:103,R:74,R:71,F:79,F:119,F:21,F:88,R:123,F:107,F:131,R:132,R:106,F:124,F:74,R:121,R:101,F:123,R:36,R:69,R:43,R:15,F:21,F:128,R:118,F:56,F:114,F:120,R:49,R:39,R:132,R:5,R:100,R:34,R:78,R:53,F:45,F:131,F:50,F:36,F:99,R:71,R:41,F:106,R:110,F:130,F:31,F:60,R:65,R:119,F:101,R:5,F:101,F:42,F:113,F:60,F:69,R:120,R:71,F:92,F:109,R:111,F:75,F:127,R:10,R:93,F:21,F:125,F:72,R:32,F:129,R:39,F:127,F:114,R:129,R:88,R:122,F:78,R:55,F:19,R:100,R:61,F:23,F:14,R:109,F:116,F:123,F:20,F:43,R:57,F:31,F:109,F:86,F:27,R:99,F:88,R:122,R:73,R:85,R:16,R:125,F:58,R:129,R:92,R:69,F:48,R:59,F:117,F:105,F:107,F:104,R:53,F:102,R:25,R:10,R:73,R:39,F:36,R:55,R:103,F:116,F:34,R:132,F:108,F:23,F:41,R:30,R:110,F:99,F:55,R:51,F:29,R:114,R:32,R:108,R:59,F:100,F:89,F:58,F:112,R:53,F:102,F:60,F:116,F:105,F:44,R:59,R:51,R:55,F:23,F:8,F:40,F:120,F:62,F:132,F:62,F:85,F:132,F:74,F:107,F:116,F:110,F:89,F:120
,R:17,R:17,F:95,F:103,F:118,R:125,F:84,R:68,R:52,R:76,F:37,R:69,F:1,R:78,R:122,F:129,R:3,F:85,F:124,F:49,F:3,F:102,F:114,R:9,F:118,R:68,F:60,R:43,R:87,R:76,F:83,R:31,R:119,R:91,F:112,F:5,F:130,F:54,F:108,R:53,F:72,R:127,F:128,R:76,R:39,R:127,F:115,R:54,F:102,F:131,R:123,R:128,R:62,R:125,F:131,R:14,R:119,F:104,F:53,R:68,R:91,R:120,R:125,R:61,F:113,F:25,F:76,F:97,R:55,F:34,R:39,R:32,F:127,R:121,F:130,R:105,R:109,F:53,F:73,F:130,F:115,F:116,F:36,R:91,R:73,R:95,F:66,R:128,F:98,F:39,R:115,R:106,F:46,F:62,F:95,R:115,F:124,F:57,F:105,F:57,R:99,F:125,F:127,R:125,F:32,R:102,F:110,F:70,R:117,R:8,R:63,R:115,F:116,F:65,R:123,R:130,F:97,R:127,F:69,R:97,F:40,F:131,R:76,F:63,R:132,R:52,F:107,R:38,R:58,F:123,R:112,F:63,R:80,F:81,F:15,R:119,R:8,R:54,F:79,F:88,F:38,R:71,R:80,R:3,F:115,F:108,R:95,F:123,R:131,F:71,F:65,F:51,F:53,F:118,F:20,F:122,R:68,F:64,R:40,R:120,F:127,R:108,R:51,R:132,F:45,R:131,F:120,F:103,F:122,R:59,R:30,F:132,F:29,F:122,R:30,R:76,R:59,F:132,R:39,R:4,F:60,F:106,F:84,R:128,R:63,F:108,R:132,R:4,F:94,F:102,F:128,R:46,R:107,F:123,F:52,R:23,F:119,F:96,F:116,R:80,R:49,F:38,F:120,R:102,F:116,R:104,F:110,R:13,R:29,R:69,F:72,R:11,F:107,R:59,F:125,R:77,F:100,F:86,R:66,R:48,R:62,F:11,F:123,F:27,F:97,R:112,R:104,R:50,R:66,F:114,F:125,R:128,F:124,R:106,F:129,R:113,F:111,F:63,R:41,F:108,F:124,F:84,R:125,F:129,R:86,R:100,R:88,R:103,R:87,R:74,R:19,R:18,F:24,R:72,R:58,F:130,F:45,F:104,F:100,R:120,F:52,F:129,F:128,R:127,R:15,R:89,R:42,F:50,R:70,F:131,F:84,F:115,F:71,R:15,R:88,F:7,R:6,R:37,F:124,R:125,R:123,F:17,R:58,F:123,R:99,R:37,F:46,R:90,R:29,F:72,R:96,R:122,F:12,R:28,F:49,F:63,F:91,R:125,R:122,R:132,R:68,F:61,F:132,R:58,R:116,F:130,R:60,F:86,F:84,F:31,R:68,F:3,F:90,R:75,F:31,F:127,F:59,F:99,R:125,F:91,R:100,R:117,R:75,F:119,F:128,F:2,F:124,R:104,F:118,F:112,R:47,R:125,F:92,F:38,F:112,F:27,R:125,F:66,R:17,R:123,R:85,F:32,R:75,F:34,R:41,R:85,F:110,R:37,F:61,F:50,F:41,F:104,F:66,R:19,F:77,F:17,R:102,F:127,R:43,F:74,F:51,R:124,R:73,F:50,R:93,R:89,R:129,F:90,R:103,F:28,R:54,R:130,
R:105,R:69,F:104,F:103,F:49,R:108,F:72,F:64,F:81,F:83,F:111,R:45,F:60,F:93,R:20,R:30,F:107,F:119,F:65,F:131,R:108,R:111,R:54,F:119,R:23,F:2,F:99,F:86,F:113,F:64,F:51,R:49,F:130,F:119,F:109,R:127,F:122,R:74,F:73,F:120,R:101,F:61,F:128,F:13,R:123,F:35,R:72,F:18,R:51,F:113,R:68,F:104,R:74,R:124,F:20,R:17,F:47,F:7,R:57,F:128,F:121,R:69,F:69,F:111,F:19,F:93,F:59,F:131,R:36,R:32,R:22,R:6,F:91,R:72,F:90,R:49,F:25,R:7,R:95,F:21,R:79,F:56,R:35,F:129,R:77,F:89,R:74,F:121,R:97,F:58,F:129,R:90,F:90,F:117,R:66,F:121,R:68,R:92,F:7,F:32,R:73,F:5,F:16,R:80,R:13,R:19,R:80,F:91,F:94,R:85,R:39,F:132,F:46,F:87,R:88,R:79,F:79,F:124,F:124,F:71,F:122,R:10,R:92,R:38,R:35,F:107,F:17,F:20,R:57,R:112,R:80,R:101,F:131,F:23,F:86,F:108,R:45,R:62,F:117,R:110,F:107,R:31,R:7,R:35,R:68,F:93,F:26,R:92,R:11,F:95,R:68,R:111,F:18,F:114,R:67,F:90,F:55,F:10,F:18,F:59,R:69,R:112,R:36,F:65,F:123,F:13,R:107,F:4,F:118,R:86,R:85,R:68,F:78,R:74,R:87,F:117,F:83,R:89,F:68,R:67,F:40,F:76,F:85,F:42,R:68,F:14,F:72,F:65,R:127,R:101,R:76,R:80,R:98,R:36,R:81,F:52,F:101,R:47,R:105,R:80,R:6,F:42,F:73,F:128,R:113,F:14,F:37,F:31,R:115,F:7,R:65,F:131,R:122,F:85,F:49,F:116,F:18,F:54,R:70,R:24,F:108,F:26,R:125,F:116,F:15,R:88,F:126,R:105,R:91,R:48,F:19,R:39,R:84,R:63,R:57,F:0,F:131,F:92,R:121,R:67,R:80,R:105,F:94,R:88,F:88,R:23,F:98,F:84,R:51,R:59,F:130,R:108,R:25,R:97,F:57,F:120,F:98,F:70,F:46,F:73,R:67,R:58,F:71,F:48,R:119,F:41,F:104,R:131,R:100,R:2,R:42,F:114,R:91,F:109,F:103,F:76,F:15,F:103,R:119,R:126,R:109,R:132,F:93,R:6,R:38,F:42,F:70,F:62,F:131,R:29,R:97,R:54,R:98,F:53,R:104,R:7,R:114,F:52,R:112,R:68,F:112,R:123,F:107,F:132,R:88,R:123,R:115,F:75,F:104,R:128,R:30,F:83,R:41,F:67,F:103,F:21,R:98,F:51,F:71,F:94,R:126,F:63,F:94,F:36,F:1,R:79,R:33,R:118,F:130,R:47,R:12,F:84,R:50,R:68,R:118,F:112,F:108,F:66,F:99,F:126,R:118,R:41,F:113,R:111,F:43,F:93,R:55,F:73,F:104,R:55,R:13,F:92,R:88,F:83,F:128,F:99,F:62,F:54,R:24,R:43,R:70,R:37,F:61,R:122,F:122,F:102,F:38,F:65,R:119,R:111,R:33,F:1,F:33,R:22,F:96,F:73,R:93,F:112,R:75,F:80,
F:83,R:62,F:61,R:58,R:48,F:120,F:94,F:75,F:83,R:88,F:126,R:39,R:109,R:114,F:127,R:38,F:55,R:54,R:90,R:46,F:101,R:128,F:122,F:6,F:45,R:132,R:78,R:68,R:52,F:122,F:33,F:102,F:131,F:36,F:32,F:85,F:124,R:63,F:89,R:82,F:33,F:131,R:52,F:60,R:77,F:86,R:124,R:17,F:1,R:94,R:71,F:82,R:6,R:39,R:10,F:101,R:34,R:7,R:71,R:39,F:83,F:87,F:112,F:113,R:97,R:65,R:117,F:131,R:21,R:117,R:78,F:50,R:92,R:67,F:8,F:63,R:65,F:19,R:130,R:130,R:58,F:86,F:64,F:109,R:42,F:52,R:78,R:6,R:110,R:68,R:78,R:99,F:106,F:82,F:69,R:122,F:68,R:20,R:100,R:3,R:23,R:17,R:84,F:115,R:81,F:86,F:8,R:127,F:48,F:101,R:52,F:65,F:130,R:57,R:6,F:124,R:60,F:55,F:98,F:71,R:40,R:55,R:59,R:122,F:13,R:33,F:37,F:94,R:47,F:68,R:104,R:103,R:23,R:119,F:125,F:56,R:64,F:128,F:125,R:80,F:99,F:95,F:0,R:55,R:20,F:35,R:15,R:106,F:66,F:72,R:20,R:84,F:71,F:72,R:112,F:111,F:48,R:31,R:57,R:118,F:107,R:35,R:68,F:83,R:126,R:122,F:92,R:124,F:105,R:24,F:58,F:114,R:8,F:120,R:15,R:50,F:130,R:107,F:130,F:60,F:60,F:65,F:80,F:129,R:35,R:36,R:4,R:102,F:131,F:12,R:75,R:61,F:81,R:53,R:93,F:127,R:71,F:105,R:61,R:75,R:122,F:115,R:47,R:105,F:75,F:130,R:96,R:52,F:110,R:119,R:121,F:47,F:129,R:121,R:92,F:58,R:28,F:94,R:32,F:126,R:104,F:55,F:88,R:104,F:128,F:92,R:68,R:82,F:50,F:129,F:101,F:85,F:119,R:9,F:71,R:42,F:105,F:116,F:101,R:96,F:36,R:3,F:54,R:7,F:77,R:73,R:47,F:68,R:62,F:64,F:103,F:114,R:109,F:95,F:117,F:112,F:118,R:30,R:95,R:45,R:48,F:69,R:68,R:30,R:49,R:47,R:63,F:96,F:72,R:5,F:86,F:74,R:95,F:85,F:132,F:96,F:108,F:63,F:113,F:63,R:123,F:122,R:52,F:45,R:99,F:42,F:42,F:18,F:127,F:75,R:117,F:63,R:17,R:100,F:51,R:45,F:9,F:78,F:3,R:79,R:66,F:58,F:64,R:35,F:7,R:66,R:69,F:62,F:90,F:95,R:30,F:130,F:66,F:88,F:96,R:15,R:130,R:75,F:110,R:51,R:12,F:91,R:125,F:125,R:35,R:102,F:124,R:131,F:11,R:57,F:54,F:55,F:117,F:120,R:70,R:90,R:17,F:126,F:116,F:65,R:83,R:52,F:101,R:59,R:118,R:131,F:77,R:53,R:32,R:116,R:120,R:78,F:107,F:81,F:86,R:47,R:131,R:97,R:68,R:45,F:94,F:127,R:54,F:127,F:130,R:122,R:56,R:66,R:90,F:76,F:82,R:64,F:128,R:54,R:112,R:38,R:90,R:19,R:101,F:115,
R:74,R:71,F:127,F:83,F:54,F:16,F:111,F:127,F:67,R:91,F:118,R:124,R:102,F:115,R:112,F:79,F:7,F:108,F:3,R:131,R:89,R:122,R:48,F:93,F:131,R:130,R:86,R:86,R:8,F:7,R:68,F:49,R:16,F:88,F:61,F:33,R:115,F:96,R:21,F:37,F:123,R:5,R:58,R:30,F:65,F:94,R:16,R:104,R:63,F:45,F:65,F:72,R:89,R:84,F:86,F:124,R:94,F:58,F:64,F:54,R:120,R:118,F:48,R:37,F:79,R:36,F:56,R:44,F:27,F:119,R:104,R:37,F:62,F:106,F:68,R:73,F:65,R:99,F:11,R:57,F:82,F:121,R:42,R:37,F:103,R:130,R:121,F:89,R:78,F:128,R:68,F:13,F:41,F:58,R:46,R:83,F:59,F:65,R:52,R:38,F:97,R:125,F:110,F:123,F:61,R:18,R:9,F:18,R:84,R:119,F:119,R:78,F:48,F:119,F:69,R:44,R:8,F:118,R:72,R:29,F:123,F:88,R:98,R:18,F:48,F:59,F:20,R:60,F:125,R:119,F:51,R:38,R:121,R:67,R:117,R:57,F:122,F:122,F:52,F:64,F:115,R:67,F:29,R:79,F:82,F:61,R:12,R:29,R:132,R:121,F:5,F:56,F:116,R:28,F:48,R:59,R:102,F:116,F:22,F:110,R:123,F:115,F:67,F:31,F:61,F:124,R:29,R:71,R:69,R:37,F:111,R:120,R:110,R:111,R:84,R:119,F:87,R:6,R:69,F:87,F:7,R:120,F:61,F:91,F:7,F:85,R:128,F:96,R:106,F:103,F:61,F:119,R:62,R:41,R:93,R:79,F:28,R:57,F:115,R:61,F:78,R:29,F:110,R:8,R:44,F:64,R:48,R:125,R:92,F:125,F:130,R:25,R:53,F:46,R:65,R:4,F:87,F:105,F:2,F:119,R:44,R:75,F:36,F:127,F:130,F:45,R:78,R:71,R:47,F:120,R:70,F:123,R:91,F:43,R:12,F:35,R:33,F:107,F:130,R:31,R:119,R:113,R:70,R:33,R:15,F:75,R:127,R:76,F:59,R:76,R:104,F:97,R:32,F:124,R:52,F:60,R:74,R:101,F:62,F:48,F:87,F:126,R:123,F:7,R:88,R:127,F:30,R:86,F:54,R:90,R:44,F:132,F:14,F:93,R:108,F:27,F:75,F:115,F:66,F:68,F:7,R:58,F:64,R:56,R:75,R:14,F:20,R:12,R:56,F:32,F:122,R:126,F:95,F:109,F:82,R:90,R:66,R:25,R:59,F:75,F:75,R:47,R:119,F:55,R:80,R:51,R:122,F:20,R:87,R:120,R:88,F:44,F:95,F:75,F:94,R:96,F:124,F:19,R:9,R:105,R:44,F:90,F:116,R:128,R:96,R:32,F:91,F:46,F:95,R:36,R:44,R:49,R:104,R:125,R:103,R:102,R:55,F:99,F:67,R:99,F:115,F:95,F:25,F:101,R:53,F:103,F:127,R:7,F:28,F:73,R:54,F:7,F:120,R:101,F:41,R:100,R:92,F:89,F:92,F:55,R:58,F:97,R:122,R:49,F:100,F:34,F:98,R:42,F:119,F:107,R:123,F:6,F:82,R:123,F:90,R:123,R:66,F:84,F:7,F:53,F:100,R
:60,R:110,R:80,R:127,F:19,F:77,F:90,R:78,F:89,F:68,F:110,F:11,F:50,F:99,R:81,R:101,R:30,F:5,F:82,R:127,R:50,F:70,F:128,F:104,F:129,R:14,F:80,F:23,R:100,F:124,R:81,F:67,R:69,F:20,R:16,F:91,R:59,R:91,F:22,R:104,F:49,F:33,F:113,R:52,R:14,R:28,F:58,F:116,R:104,R:104,R:114,F:131,R:31,F:120,R:129,R:39,F:117,R:24,F:83,F:113,R:120,R:16,F:50,R:16,F:93,F:130,R:77,F:104,F:77,R:69,F:96,R:41,F:126,F:66,F:85,F:102,F:28,R:19,F:115,R:49,R:95,F:65,F:75,R:73,F:12,F:98,R:77,R:55,F:113,F:33,R:78,R:122,F:96,F:113,R:107,R:36,R:63,F:72,F:109,F:65,F:73,R:0,F:100,R:129,R:80,F:87,F:23,F:63,F:87,F:87,R:37,F:55,F:60,F:89,R:29,F:38,R:104,R:109,R:109,F:45,F:117,R:126,R:63,F:91,R:114,R:40,R:86,F:131,R:70,R:97,R:72,F:78,R:10,F:28,F:102,R:9,R:21,R:30,F:100,R:79,R:9,R:127,R:120,R:97,R:109,R:27,R:108,F:54,F:63,R:72,R:78,R:53,F:62,F:62,F:107,R:103,R:66,F:129,F:120,F:128,F:6,R:87,R:98,F:99,F:100,F:3,F:68,F:33,R:93,F:77,F:96,R:110,F:65,R:77,F:125,F:80,R:132,F:121,F:126,F:50,F:84,F:56,F:72,F:61,F:93,F:117,R:30,F:55,R:102,F:98,R:40,R:94,F:48,F:65,F:26,F:6,R:16,R:46,F:66,F:45,F:99,R:103,F:86,F:61,F:132,R:99,F:117,R:26,F:6,R:98,R:64,R:105,F:97,R:101,R:81,R:68,R:94,R:12,F:85,F:41,F:119,F:62,F:15,F:52,R:8,F:66,F:114,R:93,R:60,F:24,F:65,F:83,F:17,F:67,R:108,R:49,R:58,R:102,F:104,F:97,R:100,F:68,R:81,F:80,R:114,R:105,R:107,R:70,R:97,F:13,F:12,R:67,R:126,F:2,F:37,R:39,R:75,R:96,F:106,R:63,R:58,F:118,R:105,R:81,F:106,F:75,F:55,R:106,R:42,F:63,F:71,R:52,R:19,R:77,R:22,R:64,R:14,R:12,R:112,F:99,R:62,F:128,R:74,F:94,F:84,R:39,R:30,R:88,R:98,F:6,F:88,F:67,F:108,R:22,F:126,F:112,R:33,R:19,F:102,R:83,F:79,R:19,R:58,R:54,F:71,R:19,F:63,F:74,F:125,F:114,R:109,F:124,R:58,F:63,F:87,F:50,F:43,F:112,R:131,R:103,F:39,R:7,F:122,R:81,R:131,R:119,R:71,R:123,R:42,F:68,R:76,F:86,F:24,F:66,R:19,F:103,F:94,F:113,F:113,F:121,R:77,R:98,R:8,F:43,R:42,R:101,R:40,R:123,R:53,F:13,R:51,R:108,R:16,R:33,R:46,F:57,R:105,R:78,F:5,F:63,F:88,F:106,R:48,R:106,R:60,F:114,F:63,F:124,F:71,R:71,R:9,R:47,R:43,R:119,F:29,R:105,F:37,F:114,F:52,R:48,R:18
,F:129,R:70,R:47,F:117,R:71,R:82,F:64,R:63,F:50,R:64,F:113,F:63,R:94,R:123,R:109,R:105,F:127,R:124,F:98,R:119,F:99,F:111,F:46,R:65,R:23,R:15,R:81,R:9,R:56,R:67,F:110,R:106,F:46,R:96,R:5,F:11,R:51,R:98,F:100,R:31,R:91,F:50,F:76,F:109,F:128,F:50,R:100,R:105,F:70,R:123,R:78,F:118,R:102,R:64,F:9,R:75,R:70,R:46,R:114,F:85,R:46,R:57,F:116,F:43,R:106,F:98,F:126,R:31,F:47,F:33,R:39,F:123,R:108,F:65,R:34,R:107,F:50,F:129,F:112,F:47,F:26,R:115,F:105,R:52,F:47,F:74,F:128,R:129,R:100,F:108,R:84,R:26,F:1,F:86,R:89,F:126,R:91,R:110,R:38,R:14,R:27,F:49,R:62,R:87,F:121,R:54,R:122,F:106,F:55,F:65,R:12,R:83,F:103,F:65,F:86,F:55,R:64,R:6,F:12,F:99,F:50,F:38,F:132,R:31,R:117,R:93,F:93,F:24,R:116,F:86,R:17,F:94,R:36,F:112,F:84,F:66,F:17,R:65,R:39,R:53,F:87,R:50,R:99,R:64,R:60,R:60,F:32,R:31,F:60,F:117,F:33,F:51,R:32,R:47,R:36,F:131,R:110,F:32,R:69,R:30,F:50,R:44,R:122,R:48,F:63,R:31,R:43,R:90,F:90,R:93,F:128,R:122,R:107,R:58,F:112,R:70,F:120,R:67,F:38,R:54,R:34,F:60,F:123,R:85,R:57,R:114,F:130,F:44,R:51,R:94,F:120,F:110,F:72,R:2,F:58,R:49,R:68,F:107,F:41,R:30,F:93,R:130,R:9,F:32,F:71,R:118,R:98,F:104,F:67,R:104,F:60,F:70,R:79,R:74,F:64,R:95,R:35,F:131,F:15,F:88,F:72,F:72,F:110,F:92,F:45,F:20,F:74,F:113,R:39,F:87,F:51,F:62,F:21,R:78,R:87,R:123,F:125,R:86,F:126,R:51,R:16,R:113,F:120,F:128,R:92,R:111,R:74,R:97,F:132,F:21,F:75,R:124,R:33,R:51,R:109,R:96,F:114,R:123,R:83,F:120,R:96,F:113,F:14,F:64,R:101,R:77,R:1,R:57,F:129,F:64,R:96,R:0,R:26,F:22,F:102,R:91,F:128,F:62,F:129,R:13,R:74,F:46,F:129,F:126,R:68,F:116,R:10,R:126,R:84,F:48,F:70,R:58,R:58,R:112,F:66,R:13,F:53,R:54,R:74,R:94,R:77,R:45,R:68,F:103,F:21,F:120,R:74,R:80,R:37,F:15,R:114,R:73,F:63,F:127,F:106,R:18,R:81,R:47,R:121,R:80,R:98,R:103,F:110,R:83,R:42,F:124,F:12,F:125,F:106,R:121,F:48,R:105,R:101,F:131,F:38,F:75,R:23,R:97,R:132,R:51,F:13,R:103,R:78,R:68,F:81,F:67,R:57,R:22,F:83,R:85,F:60,F:56,F:125,F:126,F:94,R:77,R:79,F:66,F:59,F:99,F:35,F:105,F:92,R:95,R:74,R:76,R:50,R:71,R:113,R:76,F:80,F:124,F:115,R:54,R:123,R:122,F:63,F:121,F
:48,R:58,F:2,F:80,R:87,F:61,F:65,F:4,F:32,R:127,F:73,F:113,F:101,R:128,F:113,R:79,F:127,F:48,F:34,R:91,R:94,F:19,F:73,R:108,F:123,F:68,F:48,F:89,F:129,F:107,R:31,R:120,F:118,F:25,R:114,F:103,F:32,R:91,R:117,F:127,F:101,F:75,R:43,F:87,R:17,R:97,F:86,R:7,R:38,R:126,R:41,R:48,R:73,F:124,R:114,F:106,F:12,F:114,R:131,F:125,R:67,R:51,R:124,R:70,R:56,R:62,R:17,F:130,F:113,R:71,F:74,R:69,R:80,R:32,F:122,F:119,F:104,F:33,R:92,R:33,R:64,R:129,F:59,R:87,F:97,R:44,R:103,F:53,R:130,F:122,R:108,R:15,F:129,F:107,R:50,R:24,R:127,R:101,F:91,F:13,F:63,R:71,F:130,F:68,R:126,R:58,F:35,F:96,R:66,F:38,F:91,R:58,F:47,F:93,R:82,F:66,R:61,R:131,F:55,R:79,F:131,F:14,F:95,F:101,F:64,F:65,F:59,R:107,R:100,R:80,R:3,R:25,R:22,F:50,F:116,F:130,F:123,R:122,R:48,F:95,F:109,R:69,R:30,F:85,F:20,R:71,F:122,R:25,F:18,R:107,R:65,R:120,F:126,R:81,F:105,F:95,F:131,F:12,F:64,R:61,R:51,R:79,F:53,R:105,F:125,F:74,R:14,R:45,F:70,F:119,R:59,R:67,R:64,R:2,R:126,F:116,F:73,F:89,F:35,F:36,F:16,F:72,F:119,F:124,F:1,R:46,R:24,F:57,F:120,F:125,F:112,R:132,R:60,F:106,R:7,R:57,R:119,R:128,F:67,R:47,R:11,R:27,R:48,R:22,F:79,R:64,R:7,R:16,F:124,R:44,R:33,F:2,R:11,R:50,F:120,R:114,F:117,R:41,F:68,F:121,F:84,R:118,R:26,F:119,R:131,F:83,F:104,R:31,R:108,R:33,F:63,F:28,F:2,F:18,R:122,R:130,R:130,F:31,R:41,R:98,R:108,R:2,F:61,R:45,F:132,F:125,R:72,R:128,F:16,R:82,R:56,F:52,R:114,F:88,R:66,F:59,R:100,F:42,F:111,F:123,R:131,R:55,F:115,R:57,F:35,R:71,R:67,F:100,R:87,R:116,F:111,F:96,R:115,F:132,R:100,R:73,F:98,R:39,F:64,F:82,R:91,R:98,F:6,R:37,F:6,R:23,R:17,R:115,F:100,R:130,F:80,F:25,R:9,F:17,R:58,F:43,R:56,R:94,F:43,F:106,R:119,F:54,F:29,R:109,R:110,R:98,R:108,F:106,F:83,R:97,F:64,F:122,F:85,F:70,R:103,F:52,R:39,R:95,F:102,F:33,F:76,R:90,F:108,R:27,R:66,F:100,F:131,R:107,F:126,R:97,R:130,F:96,R:55,R:77,F:128,R:81,F:61,F:84,F:123,R:120,F:80,R:122,F:112,F:112,F:35,R:68,F:14,R:18,R:5,R:105,F:111,R:18,R:101,R:69,R:112,F:120,R:35,R:27,F:2,R:53,F:82,R:119,R:107,F:116,R:48,F:33,F:116,R:116,F:120,R:58,F:32,F:120,R:53,F:30,R:23,R:107,
F:82,R:6,R:53,R:3,R:105,F:110,F:106,F:92,R:59,F:89,R:82,F:58,R:103,R:71,F:67,F:124,F:119,F:106,R:100,R:105,R:110,F:109,F:109,R:37,F:130,R:121,R:89,F:106,F:92,F:94,F:122,F:26,F:126,F:22,F:127,F:124,F:26,F:130,F:43,R:108,F:104,F:20,R:37,F:120,R:51,F:103,R:60,R:3,F:63,R:19,F:55,F:29,F:50,F:60,R:53,F:61,F:19,R:70,F:39,F:83,R:0,F:67,F:83,R:31,R:26,F:64,F:62,F:101,R:58,F:114,R:73,R:91,R:39,R:66,R:93,R:90,R:36,R:42,R:124,R:122,F:24,R:39,R:47,F:115,R:63,R:59,R:30,F:64,F:92,F:60,R:95,R:87,F:92,R:97,F:111,R:17,R:39,F:50,R:39,F:96,R:126,F:115,F:79,R:95,F:68,R:44,F:110,R:108,R:83,F:89,R:2,R:105,F:88,F:45,R:70,F:55,F:107,R:33,R:89,R:79,F:98,F:132,R:67,F:131,R:98,R:11,R:25,F:92,R:37,R:55,F:63,R:124,R:91,F:118,R:93,R:79,F:10,R:39,F:112,F:34,R:18,F:67,R:9,F:51,R:86,F:99,F:92,F:100,R:60,F:68,R:50,R:20,R:115,R:130,F:114,R:10,F:59,F:10,F:52,R:100,R:69,F:116,F:52,F:103,R:83,R:44,F:50,R:91,R:25,R:74,F:61,F:45,F:49,R:43,F:60,R:67,R:104,R:124,R:126,F:26,R:98,F:105,R:109,R:97,R:121,R:69,F:39,F:92,R:83,F:91,R:114,F:111,R:11,R:96,F:80,F:89,R:32,R:96,R:125,F:65,R:61,F:51,R:86,F:108,F:75,F:104,F:44,R:82,R:7,R:60,R:45,F:88,F:104,R:67,F:65,R:39,F:119,R:96,R:113,R:41,F:129,F:125,R:62,R:44,R:12,F:47,F:131,F:82,F:82,R:96,R:36,F:114,F:91,R:75,F:112,F:58,R:61,R:7,F:49,F:113,F:41,R:107,R:4,R:21,R:35,R:7,F:117,R:70,F:80,F:131,F:9,F:49,R:124,R:100,R:34,R:75,R:128,F:49,R:65,F:23,F:38,F:7,R:111,R:74,F:96,F:6,R:126,F:123,F:43,F:64,R:24,R:14,R:132,F:131,R:50,R:31,R:38,R:92,F:51,F:70,F:87,F:7,R:29,F:113,R:75,F:43,F:64,R:9,F:57,F:131,R:109,F:85,R:36,R:61,R:69,R:9,F:60,F:90,F:19,F:124,R:38,R:7,R:48,F:41,R:98,F:19,R:69,R:79,R:120,F:70,R:93,F:131,F:43,R:32,R:105,F:72,F:131,R:44,R:33,F:129,F:114,F:125,R:39,F:112,F:79,R:108,R:57,R:73,R:1,F:115,F:55,F:111,F:46,F:55,F:54,F:38,F:24,F:112,F:99,F:15,R:101,R:132,R:85,F:125,R:112,F:27,R:44,F:57,F:58,R:70,F:111,F:124,F:53,F:101,F:110,R:91,R:18,F:113,R:48,F:39,R:34,R:95,F:95,F:87,F:126,F:26,F:46,R:70,R:25,F:128,F:119,F:54,R:22,R:67,F:3,R:88,F:13,R:126,F:0,F:13,R:130,R:90,R
:75,R:21,F:108,F:73,R:41,F:115,R:3,F:74,F:120,R:35,R:21,F:43,R:27,R:113,R:96,R:112,R:88,R:115,F:115,R:106,F:121,R:63,F:128,R:120,R:93,R:31,F:6,F:31,F:39,F:41,F:51,F:69,F:129,F:19,F:106,F:106,R:66,R:87,R:128,R:114,F:118,R:19,R:112,F:118,R:24,F:20,R:41,F:131,R:92,F:98,R:66,F:130,R:88,F:60,F:32,F:48,F:60,R:44,R:67,F:50,F:48,R:70,F:48,R:111,R:45,F:123,F:101,R:45,F:87,F:132,R:118,R:67,R:112,F:118,R:90,R:30,F:104,R:30,F:79,F:73,F:123,F:112,F:120,F:1,F:41,R:103,R:125,F:27,R:87,R:59,F:43,F:76,F:120,R:89,F:130,F:90,R:27,F:112,F:105,R:31,R:70,F:131,F:61,F:55,F:51,R:73,R:102,R:31,F:108,R:18,R:116,F:122,F:91,R:72,F:52,F:115,F:41,F:23,R:121,F:124,R:120,F:47,F:114,F:121,F:132,F:41,F:41,F:128,F:124,F:131,F:116,F:36,R:88,R:73,R:70,F:65,F:124,F:115,R:35,F:131,R:12,F:16,F:125,R:120,R:49,F:124,R:7,F:122,F:17,F:3,R:114,R:112,R:94
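The Direction and Positions columns can be read as 0-based offsets at which the barcode (F) or its reverse complement (R) occurs in a read. That reading is an assumption; scarecrow's barcode_check may define these columns differently. The sketch below scans reads this way and tallies the labels. barcode_hits and tally_positions are hypothetical helpers.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, sub: str) -> list[int]:
    """All 0-based start positions of sub in seq, overlapping hits included."""
    hits, start = [], seq.find(sub)
    while start != -1:
        hits.append(start)
        start = seq.find(sub, start + 1)
    return hits


def barcode_hits(read: str, barcode: str) -> list[str]:
    """Label forward hits 'F:<pos>' and reverse-complement hits 'R:<pos>'."""
    forward = [f"F:{pos}" for pos in find_all(read, barcode)]
    reverse = [f"R:{pos}" for pos in find_all(read, reverse_complement(barcode))]
    return forward + reverse


def tally_positions(all_hits: list[str], top: int = 5) -> list[tuple[str, int]]:
    """Most frequent labels across reads; a barcode at a fixed offset should dominate."""
    return Counter(all_hits).most_common(top)


if __name__ == "__main__":
    print(barcode_hits("AACGTTAACG", "AACG"))  # ['F:0', 'F:6', 'R:2']
    reads = ["AACGTTAACG", "AACGGGGGGG"]
    hits = [hit for read in reads for hit in barcode_hits(read, "AACG")]
    print(tally_positions(hits))
```

An unanchored search also finds chance matches. A 140 bp read offers 133 windows per strand, so under a uniform random-sequence assumption a given 8-mer is expected about 266 / 65,536 ≈ 0.004 times per read. That is roughly 4,000 times across 1,000,000 reads. Scattered positions alone therefore do not locate a barcode; the most frequent position is more informative.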
For each read, the log records the full sequence followed by each extracted region and its positions. TTAGGCAT is one of the Round_1_BC whitelist barcodes. In the example below, it starts at position 116, one base after the Round_1_BC start (115) used for extraction, so the extracted Round_1_BC (TTTAGGCA) is offset by one base.

According to spec.yaml, linker_1 is a 'fixed' 30 bp sequence, yet it is 17 bases on read 1 and 30 bases on read 2. Read 1 is 140 bp long, so linker_1, the last region extracted from it, is truncated at the end of the read. If linker_1 is fixed, the sequences extracted from the two reads should agree, as a reverse complement on read 2, but they do not. This further suggests that the wrong positions are being extracted.
2024-11-28 16:31:21 - scarecrow - INFO - VH00582:1:AAATJF3HV:1:1101:18269:1000: CTTAGACCAGGTTTAGTAAGAAAATACAAAAATCGAACAAACAAGAAACAGAAACAAAAACCAGAAGCAGAATATGACCACAGTCTCAAGCACGCCACAGTCTCAAGCACGTGGATTTAGGCATAGTCGTACGCCGATGC
2024-11-28 16:31:21 - scarecrow - INFO - Region cDNA:0-100: CTTAGACCAGGTTTAGTAAGAAAATACAAAAATCGAACAAACAAGAAACAGAAACAAAAACCAGAAGCAGAATATGACCACAGTCTCAAGCACGCCACAG
2024-11-28 16:31:21 - scarecrow - INFO - Region RT_primer:100-115: TCTCAAGCACGTGGA
2024-11-28 16:31:21 - scarecrow - INFO - Region Round_1_BC:115-123: TTTAGGCA
2024-11-28 16:31:21 - scarecrow - INFO - Region linker_1:123-140: TAGTCGTACGCCGATGC
2024-11-28 16:31:21 - scarecrow - INFO - VH00582:1:AAATJF3HV:1:1101:18269:1000: ATGTGTAGGGCATACCAAGAGTCCGATGTTTCGCATCGGCGTACGACTATGCCTAAATCCACGTGCTTGAGACTGTGGCGTGCTTG
2024-11-28 16:31:21 - scarecrow - INFO - Region UMI:0-10: GTTCGTGCGG
2024-11-28 16:31:21 - scarecrow - INFO - Region Round_3_BC:10-18: TGTCAGAG
2024-11-28 16:31:21 - scarecrow - INFO - Region Linker_2:18-48: TTCGTGCACCTAAATCCGTATCAGCATGCG
2024-11-28 16:31:21 - scarecrow - INFO - Region Round_2_BC:48-56: GCTACGCT
2024-11-28 16:31:21 - scarecrow - INFO - Region linker_1:56-86: TTGTAGCCTGAGAACCATACGGGATGTGTA
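These observations can be checked directly against the two logged reads. The sketch below copies the sequences from the log and runs three checks:
1. It locates TTAGGCAT in read 1.
2. It searches read 2 for the reverse complement of the linker_1 fragment extracted from read 1.
3. It compares each logged read 2 region with the same slice of read 2, both as logged and reversed end to end without complementing.

For this single read pair:
- the linker_1 fragment does occur in read 2 as its reverse complement, but at position 32 rather than 56;
- every logged read 2 region matches the reversed read rather than the read itself.

That is an observation from one record, not a confirmed diagnosis.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


# Read sequences copied from the log above (VH00582:1:AAATJF3HV:1:1101:18269:1000).
READ_1 = "CTTAGACCAGGTTTAGTAAGAAAATACAAAAATCGAACAAACAAGAAACAGAAACAAAAACCAGAAGCAGAATATGACCACAGTCTCAAGCACGCCACAGTCTCAAGCACGTGGATTTAGGCATAGTCGTACGCCGATGC"
READ_2 = "ATGTGTAGGGCATACCAAGAGTCCGATGTTTCGCATCGGCGTACGACTATGCCTAAATCCACGTGCTTGAGACTGTGGCGTGCTTG"

# Read 2 regions as logged: name -> (start, end, extracted sequence).
LOGGED_READ_2 = {
    "UMI": (0, 10, "GTTCGTGCGG"),
    "Round_3_BC": (10, 18, "TGTCAGAG"),
    "Linker_2": (18, 48, "TTCGTGCACCTAAATCCGTATCAGCATGCG"),
    "Round_2_BC": (48, 56, "GCTACGCT"),
    "linker_1": (56, 86, "TTGTAGCCTGAGAACCATACGGGATGTGTA"),
}

if __name__ == "__main__":
    print(READ_1[115:123])  # TTTAGGCA (the logged Round_1_BC)
    print(READ_1.find("TTAGGCAT"))  # 116
    r1_linker = READ_1[123:140]  # linker_1 truncated at the end of read 1
    print(READ_2.find(reverse_complement(r1_linker)))  # 32
    reversed_read_2 = READ_2[::-1]  # reversed, NOT complemented
    for name, (start, end, seq) in LOGGED_READ_2.items():
        # each line prints: <name> False True
        print(name, READ_2[start:end] == seq, reversed_read_2[start:end] == seq)
```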
scarecrow extract ./scarecrow/specs/10xv2/10xv2.yaml \
/Users/s14dw4/Documents/Repos/scarecrow/specs/10xv2/R1.fastq \
/Users/s14dw4/Documents/Repos/scarecrow/specs/10xv2/R2.fastq \
-o ./cDNA.fq -r umi barcode
scarecrow extract ./scarecrow/specs/evercode/evercode-v3.yaml \
/Users/s14dw4/Documents/Repos/scarecrow/specs/evercode/R1.fastq \
/Users/s14dw4/Documents/Repos/scarecrow/specs/evercode/R2.fastq \
-o ./cDNA.fq -r BC1 BC2 BC3
The positions and sequences extracted for the Evercode v3 example are also not correct:
2024-11-29 15:49:21 - scarecrow - INFO - ERR12167395.1: CAATTCTGACCGACTCTCGAGTCCGATGTCTCGCATCGGCGTACGACTCCTCTATCATCCACGTGCTTGAGACTGTGGGCTTAAAGTAGTGACTGTCGGC
2024-11-29 15:49:21 - scarecrow - INFO - Region cdna:0-64: CAATTCTGACCGACTCTCGAGTCCGATGTCTCGCATCGGCGTACGACTCCTCTATCATCCACGT
2024-11-29 15:49:21 - scarecrow - INFO - Region BC1:64-72: GCTTGAGA
2024-11-29 15:49:21 - scarecrow - INFO - Region L1:72-82: CTGTGGGCTT
2024-11-29 15:49:21 - scarecrow - INFO - Region BC2:82-90: AAAGTAGT
2024-11-29 15:49:21 - scarecrow - INFO - Region L2:90-100: GACTGTCGGC
2024-11-29 15:49:21 - scarecrow - INFO - ERR12167395.1: AAGCAGTGGTATCAACGCAGAGTGAATGGGACCGGCACCCTGGGCTCTGTATCCCTTCCTGTCTCTGAGCTCCTCCTGGTGGACCAGCTCTGCGTGGACC
2024-11-29 15:49:21 - scarecrow - INFO - Region polyN:0-10: CCAGGTGCGT
2024-11-29 15:49:21 - scarecrow - INFO - Region BC3:10-18: CTCGACCA
2024-11-29 15:49:21 - scarecrow - INFO - Region L2:18-28: GGTGGTCCTC
2024-11-29 15:49:21 - scarecrow - INFO - Region BC2:28-36: CTCGAGTC
2024-11-29 15:49:21 - scarecrow - INFO - Region L1:36-46: TCTGTCCTTC
2024-11-29 15:49:21 - scarecrow - INFO - Region BC1:46-54: CCTATGTC
2024-11-29 15:49:21 - scarecrow - INFO - Region cdna:54-100: TCGGGTCCCACGGCCAGGGTAAGTGAGACGCAACTATGGTGACGAA
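A generic way to diagnose a log like this is to ask whether each extracted region sits at its logged offset in three orientations of the logged read: as-is, reversed, or reverse-complemented. The sketch below applies that check to three regions of the second Evercode record. locate_region is a hypothetical helper, and the sequences are copied from the log above.

For this record, polyN, BC3 and BC2 match only the reversed read. That suggests, from one record only, that read 2 may be reversed without being complemented before slicing.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def locate_region(read: str, start: int, region_seq: str) -> dict[str, bool]:
    """Check whether region_seq sits at offset start in each orientation of read."""
    orientations = {
        "forward": read,
        "reversed": read[::-1],
        "reverse_complement": read.translate(COMPLEMENT)[::-1],
    }
    end = start + len(region_seq)
    return {name: seq[start:end] == region_seq for name, seq in orientations.items()}


# Second ERR12167395.1 record and three of its logged regions.
EVERCODE_READ_2 = "AAGCAGTGGTATCAACGCAGAGTGAATGGGACCGGCACCCTGGGCTCTGTATCCCTTCCTGTCTCTGAGCTCCTCCTGGTGGACCAGCTCTGCGTGGACC"
LOGGED_REGIONS = [("polyN", 0, "CCAGGTGCGT"), ("BC3", 10, "CTCGACCA"), ("BC2", 28, "CTCGAGTC")]

if __name__ == "__main__":
    for name, start, seq in LOGGED_REGIONS:
        # each region: forward False, reversed True, reverse_complement False
        print(name, locate_region(EVERCODE_READ_2, start, seq))
```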