Questions about sequin used #19

Open
Kyung-TaeLee opened this issue Jan 25, 2022 · 2 comments


@Kyung-TaeLee

Hi, thank you for providing such a wonderful resource to the community. I am trying to analyze the data to compare the performance of multiple quantification programs. To do that, I downloaded the cDNA PCR sequencing data (SQK-PCS109) and the short-read data with sequins included. I first ran Anaquin (the sequin analysis software) to check the consistency between expected and estimated expression (Anaquin uses Kallisto). However, the correlation was very poor (around 0.1). Then I realized that Anaquin provides sequin version 2.4, whereas the Excel file provided with the original manuscript states that sequin version 1 was used. Do the two versions differ in the transcripts used and their concentrations? I tried to find the sequin version 1 reference files (decoy chromosome, gene annotation in GTF) but couldn't find any. I also visited the sequinstandard website and tried to access the resources there, but I can't: the files require a login and the site won't let me register, and I don't know why. Could you provide the reference files for the sequins used in the study, as well as the file that contains the expected concentrations? Thank you and have a nice day.

@alexyfyf

alexyfyf commented Aug 1, 2023

I would like to know as well. I found that the reformatted GTF contains bambu-generated transcripts with sequins, but I am not sure whether these are the complete sequin transcripts.

@cying111
Collaborator

cying111 commented Nov 8, 2023

Hi,

Sorry for the late reply.

For the sequin reference, you can download the FASTA and GTF files using the links below:

GTF file: http://sg-nex-data.s3.amazonaws.com/data/annotations/gtf_file/hg38_sequins_SIRV_ERCCs_longSIRVs_v5_reformatted.gtf
This is the complete GTF file that we used for our analysis. For your case, you can subset it to only the sequin annotations by filtering on either the sequin gene names or the sequin transcript names.
FASTA file: http://sg-nex-data.s3.amazonaws.com/data/annotations/genome_fasta/hg38_sequins_SIRV_ERCCs_longSIRVs.fa
Similarly, for the FASTA file, you can extract only the sequin sequences by keeping chrIS only.
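
The subsetting described above could be sketched in Python roughly as follows. This is only an illustration, not the pipeline used for the manuscript. It assumes that sequin features in the GTF also sit on the `chrIS` seqname (the reply suggests filtering by gene or transcript name, so check this against the file first), and the function names `subset_gtf` and `subset_fasta` are made up for this example.

```python
from typing import Iterable, Iterator

# "chrIS" comes from the reply above; using it as the GTF seqname is an assumption.
SEQUIN_CHROM = "chrIS"


def subset_gtf(lines: Iterable[str], chrom: str = SEQUIN_CHROM) -> Iterator[str]:
    """Yield GTF feature lines whose first column (seqname) equals chrom."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        if line.split("\t", 1)[0] == chrom:
            yield line


def subset_fasta(lines: Iterable[str], chrom: str = SEQUIN_CHROM) -> Iterator[str]:
    """Yield the header and sequence lines of the FASTA record named chrom."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            keep = bool(parts) and parts[0] == chrom
        if keep:
            yield line
```

Both functions take any iterable of lines, so an open file handle works directly, for example `dst.writelines(subset_gtf(src))` with `src` opened on the downloaded GTF and `dst` opened for writing.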

For the sequin concentration, we have recently added the concentration file that we used for the original manuscript: https://github.com/GoekeLab/sg-nex-data/blob/master/docs/RNAsequins_MixA.xlsx

Let me know if you still have issues related to this.
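
With the matching concentration file, the expected-versus-estimated check from the original question can be repeated by hand. Below is a minimal sketch, not Anaquin's method: it computes a Pearson correlation on log2-transformed values for the sequin IDs present in both tables. The function name `log_pearson`, the dictionary inputs keyed by sequin ID, and the pseudocount of 1 are all assumptions made for illustration; how the spreadsheet columns map to these dictionaries is left to the reader.

```python
import math
from typing import Mapping


def log_pearson(
    expected: Mapping[str, float],
    estimated: Mapping[str, float],
    pseudocount: float = 1.0,  # assumed value, avoids log2(0)
) -> float:
    """Pearson correlation of log2(value + pseudocount) over shared IDs."""
    ids = sorted(set(expected) & set(estimated))
    if len(ids) < 2:
        raise ValueError("need at least two shared sequin IDs")

    x = [math.log2(expected[i] + pseudocount) for i in ids]
    y = [math.log2(estimated[i] + pseudocount) for i in ids]

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant values")
    return sxy / math.sqrt(sxx * syy)
```

Only IDs found in both inputs are compared, so a reference and a concentration table from different sequin versions would show up as very few shared IDs.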

Thank you
