
Variable regions in the same part of the sequence #56

Open
sidwekhande opened this issue Dec 12, 2024 · 2 comments

Comments

@sidwekhande
Contributor

We would like to define variable-length polyA and cDNA regions that occupy the same part of the sequence. For example, a 40 bp stretch of the read contains a mix of polyA and cDNA of variable lengths (each 0-40 bp).
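To make the use case concrete, here is a minimal Python sketch (not part of seqspec) of how a 40 bp window shared by cDNA and poly(A) could be split for a single read. It assumes the cDNA comes first and the poly(A) fills the rest of the window, matching the region order in the spec; `split_shared_window` and its `min_polya` parameter are hypothetical names.

```python
def split_shared_window(window: str, min_polya: int = 1) -> tuple[str, str]:
    """Split a fixed-width window into (cdna, poly_a).

    Assumption: cDNA comes first and the trailing run of A bases is poly(A).
    """
    cdna = window.rstrip("A")  # drop the trailing run of A bases
    polya_len = len(window) - len(cdna)
    if polya_len < min_polya:
        # Trailing A run too short to call poly(A): keep the whole window as cDNA.
        return window, ""
    return cdna, window[len(cdna):]


# Example: 8 bp of cDNA followed by 32 bp of poly(A) in a 40 bp window.
cdna, polya = split_shared_window("ACGTTGCC" + "A" * 32)
print(len(cdna), len(polya))  # 8 32
```

Whatever the split, the two parts always add up to the 40 bp window. A cDNA that itself ends in A cannot be told apart from poly(A) with this rule, so the boundary is only an estimate.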

A test seqspec that defines this use case has been uploaded to syn64109806, along with the corresponding FASTQ files.

Using seqspec v0.3.0, the index command currently returns:

seqspec index -m rna -r SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R1.fastq.gz,SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R2.fastq.gz SHAREseqv2-SS_PKR_500-RNA_updated.yaml

SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R1.fastq.gz	cdna	cdna	0	50
SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R2.fastq.gz	umi	umi	0	10
SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R2.fastq.gz	cdna_variable	cdna	10	50
SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R2.fastq.gz	poly_A	poly_A	50	90
SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R2.fastq.gz	linker1	linker	90	105
SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R2.fastq.gz	rna Cell Barcode 1	barcode	105	113
SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R2.fastq.gz	linker2	linker	113	143
SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R2.fastq.gz	rna Cell Barcode 2	barcode	143	149

This places the cdna_variable and poly_A regions one after the other, assigns each its full 40 bp, and "pushes" rna Cell Barcode 3 out of the index output, because the summed lengths exceed the defined total length of the read. The same behavior is observed when passing -t kb:

seqspec index -m rna -r SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R1.fastq.gz,SS-CoPA-1956_L8_SS-PKR-500_SS-V1-LEFT-HALF_R2.fastq.gz -t kb SHAREseqv2-SS_PKR_500-RNA_updated.yaml

1,105,113,1,143,149:1,0,10:0,0,50,1,10,50
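For reference, the kb output is a kallisto bus custom technology string: three `:`-separated sections (barcodes, UMI, cDNA), each a list of `file,start,stop` triples, where the file index follows the order of the files passed to -r (0 = R1, 1 = R2 here). A small decoder, using a hypothetical `parse_kb_technology` helper, makes it easy to see that only two barcode triples are present and that poly_A is not part of the cDNA section:

```python
def parse_kb_technology(tech: str) -> dict[str, list[tuple[int, ...]]]:
    """Decode a 'barcodes:umi:cdna' string into (file, start, stop) triples."""
    labels = ("barcode", "umi", "cdna")
    sections = tech.split(":")
    if len(sections) != len(labels):
        raise ValueError(f"expected {len(labels)} ':'-separated sections, got {len(sections)}")
    parsed = {}
    for label, section in zip(labels, sections):
        values = [int(v) for v in section.split(",")]
        if len(values) % 3 != 0:
            raise ValueError(f"{label} section is not a list of file,start,stop triples")
        # Group the flat list of integers into consecutive triples.
        parsed[label] = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return parsed


for label, triples in parse_kb_technology("1,105,113,1,143,149:1,0,10:0,0,50,1,10,50").items():
    print(label, triples)
# barcode [(1, 105, 113), (1, 143, 149)]
# umi [(1, 0, 10)]
# cdna [(0, 0, 50), (1, 10, 50)]
```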

Please let me know your thoughts - thanks!

@sbooeshaghi
Collaborator

Hi, thanks for bringing this up. The seqspec index command projects the relative positions of regions onto the "read" objects using the max length property of each region. The rna cell barcode 3 is therefore not listed in read 2: the read has a fixed length, while the regions projected onto it have variable lengths, and the sum of their max lengths is larger than the max length of the read. What would you expect to see as the output here?
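The projection described above can be sketched in Python. This is a simplified model, not seqspec's implementation: regions are laid end to end by max length, a region straddling the end of the read is clipped, and a region starting at or past the end is dropped. The region lengths come from the index output in the issue, except the lengths of rna Cell Barcode 2 and 3 (8 bp) and the read length (149 bp), which are assumptions; `project_max_lengths` is a hypothetical name.

```python
def project_max_lengths(regions, read_length):
    """Place regions end to end using each one's max length, then clip to the read."""
    projected = []
    start = 0
    for name, max_len in regions:
        if start >= read_length:
            break  # region starts past the end of the read, so it is dropped
        stop = start + max_len
        projected.append((name, start, min(stop, read_length)))
        start = stop
    return projected


# Read 2 regions with their max lengths (barcode 2/3 lengths are assumed).
READ2_REGIONS = [
    ("umi", 10),
    ("cdna_variable", 40),  # variable 0-40 bp, projected at its max
    ("poly_A", 40),         # variable 0-40 bp, projected at its max
    ("linker1", 15),
    ("rna Cell Barcode 1", 8),
    ("linker2", 30),
    ("rna Cell Barcode 2", 8),
    ("rna Cell Barcode 3", 8),
]

for row in project_max_lengths(READ2_REGIONS, read_length=149):
    print(*row, sep="\t")
```

With these numbers the max lengths sum to 159 bp, 10 bp more than the assumed read, which reproduces the coordinates in the index output and the missing rna Cell Barcode 3. Because cdna_variable and poly_A share one 40 bp window, the bases they occupy together never exceed 40, even though their max lengths add up to 80.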

@yuanlizhanshi

I have a batch of self-constructed test data in which the UMI and barcode are located at fixed positions in R1, and R2 contains the sequence of interest. I need to append the barcode and UMI information to R2, but in some cases the library is quite short and R2 sequences through into the R1 side, so the polyA tail in R2 has to be trimmed before alignment. Since the polyA length is variable, how should I create the seqspec file?
The custom library structure is shown below.
[Image: custom library structure]
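The preprocessing described in this comment happens outside seqspec, but it can be sketched in Python. Everything about the layout here is assumed because the library diagram is not reproduced: the barcode and UMI slices in R1, the rule that a run of at least 10 A bases in R2 marks the start of the poly(A) read-through, and the convention of appending barcode and UMI to the read name are all hypothetical choices, as are the helper names `trim_polya` and `tag_r2`.

```python
import re

# Hypothetical layout: replace with the positions from the actual library.
BARCODE_SLICE = slice(0, 16)       # assumed barcode position in R1
UMI_SLICE = slice(16, 28)          # assumed UMI position in R1
POLYA_RUN = re.compile(r"A{10,}")  # assumed: >=10 consecutive A bases mark poly(A)


def trim_polya(seq: str, qual: str) -> tuple[str, str]:
    """Cut R2 (sequence and qualities) at the first poly(A) run, dropping the rest."""
    match = POLYA_RUN.search(seq)
    if match is None:
        return seq, qual  # no read-through detected
    return seq[:match.start()], qual[:match.start()]


def tag_r2(r1_seq: str, r2_name: str, r2_seq: str, r2_qual: str) -> tuple[str, str, str]:
    """Append the R1 barcode and UMI to the R2 read name and trim R2's poly(A)."""
    barcode = r1_seq[BARCODE_SLICE]
    umi = r1_seq[UMI_SLICE]
    seq, qual = trim_polya(r2_seq, r2_qual)
    return f"{r2_name}_{barcode}_{umi}", seq, qual


r1 = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC"
r2 = "GATTACAGATTACC" + "A" * 12 + "CTGTCTC"
print(tag_r2(r1, "read1", r2, "I" * len(r2)))
```

If R2 reads the tail from the opposite strand, it will appear as a run of T rather than A, and the pattern would need to change accordingly.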


3 participants