You always start by processing your data with MiXCR. You can do that locally with the PlotManager, or you can create a job script for a server with mixcr_cl.py. Either way, the pipeline generates a file structure like the following:
📦my_experiments
┣ 📂your_experiment_name
┃ ┣ 📂alignment_reports
┃ ┣ 📂assembly_reports
┃ ┣ 📂clones_result
┃ ┣ 📂tables_mixcr
┃ ┗ 📜sequencing_report.csv
alignment_reports: Contains the alignment reports that MiXCR generates with mixcr align.
assembly_reports: Contains the assembly reports that MiXCR generates with mixcr assemble.
clones_result: Contains the .clns files generated by MiXCR. The pipeline offers basic support for creating MiXCR plots, but you can customize them further by taking a look here.
tables_mixcr: Contains the tables for all your samples in .tsv format. MiXCR generates these with mixcr exportClones. The pipeline uses these tables to create the sequencing_report.csv.
If you processed your data on a server, you need to upload it to the PlotManager. To do that, choose option 2 in the PlotManager and then select the experiment directory, 'your_experiment_name' in this case. You can recognize the target directory by the sequencing_report.csv located in its root. The pipeline then automatically creates a copy of that folder in its working directory, and you can start working with your data.
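Before uploading a folder that was processed on a server, it can help to check that it has the layout shown above. The following is a minimal sketch and not part of the pipeline: the helper name `missing_entries` is hypothetical, and it assumes only the four subfolders and the sequencing_report.csv described in this section.

```python
from pathlib import Path

# Names taken from the directory tree described in this section.
EXPECTED_DIRS = ("alignment_reports", "assembly_reports", "clones_result", "tables_mixcr")
EXPECTED_FILES = ("sequencing_report.csv",)


def missing_entries(experiment_dir):
    """Return the expected folders and files that are absent from experiment_dir.

    Hypothetical helper for illustration; it is not provided by the pipeline.
    """
    root = Path(experiment_dir)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

Calling `missing_entries("my_experiments/your_experiment_name")` returns an empty list when the layout is complete, and otherwise the names that still need to be copied from the server.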
The sequencing report contains the important information about your sequencing data. The file stored in the directory structure above has not been trimmed. The pipeline creates an instance from this file so that it can trim the data flexibly based on user input. After trimming and preparation for the plots, the sequencing report looks like the following:
| | Experiment | cloneId | readCount | nSeqCDR3 | aaSeqCDR3 | lengthOfCDR3 | cloneFraction |
|---|---|---|---|---|---|---|---|
| 0 | random1_R1_001 | 0 | 572 | TGTAACGCAGTCCACTCTAGGTGGCAAGCTATGACCCGCTGG | CNAVHSRWQAMTRW | 42 | 0.0530661 |
| 1 | random1_R1_001 | 1 | 345 | TGTATACAATCGGGAACGGATCGTAGG | CIQSGTDRR | 27 | 0.0320067 |
| 2 | random1_R1_001 | 2 | 320 | TGTAATGTAGATTTAACGGTGGTGGACGGTAGGCACCTTCCACGGGGCGACTACTGG | CNVDLTVVDGRHLPRGDYW | 57 | 0.0296874 |
| 3 | random1_R1_001 | 3 | 308 | TGTATACAATCGGGAACGAGTCGTAGG | CIQSGTSRR | 27 | 0.0285741 |
| 4 | random1_R1_001 | 4 | 297 | TGTCATGCCGACCTAAGAGTACGCGACGGGGTAAGGGGTGACTACTGG | CHADLRVRDGVRGDYW | 48 | 0.0275536 |
| 5 | random1_R1_001 | 5 | 292 | TGTAACGCAGTCCACTCTAGGTGGCAAGCTATGACCCACTGG | CNAVHSRWQAMTHW | 42 | 0.0270897 |
| 6 | ... | ... | ... | ... | ... | ... | ... |
Experiment: The name of the sample the clone belongs to
cloneId: The id of the clone. The clone with the highest cloneFraction has cloneId=0, as in the example above
readCount: The number of reads for the given clone
nSeqCDR3: The nucleotide sequence of the CDR3 region
aaSeqCDR3: The amino acid sequence of the CDR3 region
lengthOfCDR3: The length of the CDR3 nucleotide sequence
cloneFraction: The fraction of the clone in the sample
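The columns are related to each other: in the example rows above, lengthOfCDR3 equals the number of nucleotides in nSeqCDR3, which in turn is three times the number of amino acids in aaSeqCDR3. The sketch below checks this on the first two example rows. The function name `inconsistent_clones` is hypothetical and the rows are written as plain dictionaries; if the report is a pandas DataFrame (an assumption here), `to_dict("records")` yields rows of this shape.

```python
# First two rows of the example table above, as plain dictionaries.
rows = [
    {"cloneId": 0,
     "nSeqCDR3": "TGTAACGCAGTCCACTCTAGGTGGCAAGCTATGACCCGCTGG",
     "aaSeqCDR3": "CNAVHSRWQAMTRW",
     "lengthOfCDR3": 42},
    {"cloneId": 1,
     "nSeqCDR3": "TGTATACAATCGGGAACGGATCGTAGG",
     "aaSeqCDR3": "CIQSGTDRR",
     "lengthOfCDR3": 27},
]


def inconsistent_clones(rows):
    """Return the cloneIds whose length column disagrees with the sequences.

    Hypothetical helper for illustration; it is not part of the pipeline.
    """
    bad = []
    for row in rows:
        n_nucleotides = len(row["nSeqCDR3"])
        length_ok = n_nucleotides == row["lengthOfCDR3"]
        # Three nucleotides encode one amino acid in the example rows.
        frame_ok = n_nucleotides == 3 * len(row["aaSeqCDR3"])
        if not (length_ok and frame_ok):
            bad.append(row["cloneId"])
    return bad
```

For the two rows above, `inconsistent_clones(rows)` returns an empty list.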
This report contains all your samples, so if you open the report and scroll down, you will see your other samples follow in the Experiment column. You can save the report as a variable and continue to work with it:
trimmed_report = plot.sequencing_report
In contrast, the raw sequencing report contains more columns and more rows. For instance, if you sequenced multiple regions of the heavy chain, the data for these regions is stored in the raw sequencing report. You can switch to one of these regions in the PlotManager by typing
plot.change_preferred_region()
For example, if you choose the CDR1 region, the PlotManager pulls the corresponding columns from the raw sequencing report and uses them in place of nSeqCDR3 and aaSeqCDR3 in the layout described above.
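The column swap described above can be pictured with a small sketch. This is only an illustration of the idea, not the PlotManager's implementation: the function `switch_region` is hypothetical, rows are plain dictionaries, and it assumes the raw report names the region columns nSeq<region> and aaSeq<region> (for example nSeqCDR1 and aaSeqCDR1). How the length column is handled is not covered here.

```python
# Columns that stay the same regardless of the chosen region (from the layout above).
SHARED_COLUMNS = ("Experiment", "cloneId", "readCount", "cloneFraction")


def switch_region(raw_row, region):
    """Build a row that carries the chosen region's sequences instead of CDR3.

    Hypothetical illustration; the pipeline's own logic may differ.
    Assumes raw columns are named nSeq<region> and aaSeq<region>.
    """
    row = {name: raw_row[name] for name in SHARED_COLUMNS}
    for prefix in ("nSeq", "aaSeq"):
        column = prefix + region
        if column not in raw_row:
            raise KeyError(f"raw report has no column {column!r}")
        row[column] = raw_row[column]
    return row
```

Called with `region="CDR1"`, the returned row keeps the shared columns and carries nSeqCDR1 and aaSeqCDR1 where the trimmed layout had nSeqCDR3 and aaSeqCDR3. A region that was not sequenced raises a KeyError instead of silently producing empty columns.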
You can save the raw sequencing report as a variable with:
raw_sequencing_report = plot.Report.origin_seq_report