Add option to skip transcriptome quantification #17

dcroote · 2018-06-26T01:32:47Z

For those interested in only the assembled sequences, this option can
substantially reduce runtime by skipping the Kallisto build
transcriptome index step. Note Kallisto is still used so as to not break
anything downstream, but the index is built from only the
assembled sequences rather than entire transcriptome and
assembled sequences.

For bracer test adding --no_transcriptome_quant decreased the
runtime from 311s to 60s (80% decrease) on my machine.

mstubb · 2018-06-26T09:56:13Z

Thanks for this. Can you show that this gives equivalent quantification for the assembled sequences when compared with making an index from the whole transcriptome? It was always my concern that having so few sequences in the index would violate assumptions in the Kallisto/Salmon quantification models and so would give spurious quants.

For an alternative way to solve this same problem, have a look at the -small_index option for TraCeR (https://github.com/Teichlab/tracer#options). Perhaps we should try to bring that into BraCeR and unify some functionality. Interested to hear your thoughts.

PS - apologies for not accepting your CI pull request. I've been travelling and haven't had chance to look at it. Hoping to do so this week.

dcroote · 2018-06-26T17:30:57Z

@mstubb - the quantification will be different. My thought is that this option would be for those who are performing mapping / quantification by another method e.g. STAR + htseq and are not interested in the pseudoalignment transcriptome counts. We could explicitly write TPM: N/A or similar via the get_summary core function to make the output explicit in stating that the TPM will not be reliable if --no_transcriptome_quant is used

mstubb · 2018-06-26T18:31:49Z

That makes sense although worth noting that the quantifications are used to filter the reconstructed sequences in cases where more than two recombinants are detected for a particular locus (eg 3 Igh sequences). The two most highly expressed are assumed to be the 'correct' ones. If we implemented this we'd need to show that the ranking by expression of the recombinants was the same even if the TPMs were wildly different. Or, only allow this mode to be used without attempting any filtering.

dcroote · 2018-06-27T21:30:26Z

I agree that filtering is useful and agree that we would want to demonstrate that ignoring the transcriptome doesn't affect recombinant selection.

To verify this I've created a test where I've added the following to the fastq reads of cell1 within test_data:

349387 paired end reads from a non-B cell in order to include effects of the transcriptome
15000 paired end reads from results/cell2/aligned_reads/cell2_BCR_L_<1,2>.fastq in order to test for correct recombinant selection of the true lambda chain for cell1 (IGLV1-40) despite the presence of a low abundance cell1 kappa chain as well as this additional cell2 lambda chain: IGLV2-14.

The TPM results from filtered_BCR_seqs/filtered_BCRs.txt:

assembly	IGHV3-30	IGKV2-28	IGLV1-40	IGLV2-14
with transcriptome	126425.0	33325.5	134683.0	77786.6
without transcriptome	290882.0	116700.0	387686.0	204732.0

The raw TPM numbers are much higher without the transcriptome as expected, but when normalized by their TPM sum they are quite comparable:

assembly	IGHV3-30	IGKV2-28	IGLV1-40	IGLV2-14
with transcriptome	0.34	0.09	0.36	0.21
without transcriptome	0.29	0.12	0.39	0.2

For those interested in only the assembled sequences, this option can substantially reduce runtime by skipping the Kallisto build transcriptome index step.

Add option to skip transcriptome quantification

cfc9d93

For those interested in only the assembled sequences, this option can substantially reduce runtime by skipping the Kallisto build transcriptome index step.

dcroote force-pushed the no-transcriptome-quant branch from 7f0b020 to cfc9d93 Compare November 15, 2018 00:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to skip transcriptome quantification #17

Add option to skip transcriptome quantification #17

dcroote commented Jun 26, 2018

mstubb commented Jun 26, 2018

dcroote commented Jun 26, 2018

mstubb commented Jun 26, 2018

dcroote commented Jun 27, 2018

Add option to skip transcriptome quantification #17

Are you sure you want to change the base?

Add option to skip transcriptome quantification #17

Conversation

dcroote commented Jun 26, 2018

mstubb commented Jun 26, 2018

dcroote commented Jun 26, 2018

mstubb commented Jun 26, 2018

dcroote commented Jun 27, 2018