utilize_all_data.md

ExpoSeq's Data Structure

You always start by processing your data with MiXCR. You can do this locally with the PlotManager, or you can create a job script for a server with mixcr_cl.py. Either way, the pipeline generates a file structure like the following:

📦my_experiments
 ┣ 📂your_experiment_name
 ┃ ┣ 📂alignment_reports
 ┃ ┣ 📂assembly_reports
 ┃ ┣ 📂clones_result
 ┃ ┣ 📂tables_mixcr
 ┃ ┗ 📜sequencing_report.csv

alignment_reports: Contains the alignment reports that MiXCR generates with mixcr align.

assembly_reports: Contains the assembly reports that MiXCR generates with mixcr assemble.

clones_result: Contains the .clns files generated by MiXCR. The pipeline offers basic support for creating MiXCR plots from them, but you can customize them further by taking a look here.

tables_mixcr: Contains the tables for all your samples in .tsv format, which MiXCR generates with mixcr exportClones. The pipeline uses these tables to create the sequencing_report.csv.
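
To illustrate how per-sample tables can end up in a single report, here is a minimal sketch using only the Python standard library. It is not the pipeline's actual code: the function name `merge_clone_tables` is hypothetical, and it assumes that the sample name is the file name without the .tsv extension.

```python
import csv
from pathlib import Path


def merge_clone_tables(tables_dir):
    """Combine all .tsv tables in tables_dir into one list of rows.

    Each row gets an "Experiment" key with the sample name, assumed here
    to be the file name without its .tsv extension.
    """
    merged = []
    for tsv_path in sorted(Path(tables_dir).glob("*.tsv")):
        with tsv_path.open(newline="") as handle:
            # exportClones tables are tab-separated with a header row
            for row in csv.DictReader(handle, delimiter="\t"):
                merged.append({"Experiment": tsv_path.stem, **row})
    return merged
```

Calling `merge_clone_tables("my_experiments/your_experiment_name/tables_mixcr")` would return one row per clone, with the rows of each sample following one another.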

If you have processed your data on a server, you need to upload it to the PlotManager. To do so, choose option 2 in the PlotManager and then select the directory of your experiment, 'your_experiment_name' in this case. You can identify the correct directory by the sequencing_report.csv located in its root. The pipeline then automatically copies that folder into its working directory, and you can start to work with your data.
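
Before uploading, it can help to check which folder is the experiment directory. The following sketch looks for sequencing_report.csv one level below a root folder and reports which of the four subfolders are missing. The helper names are hypothetical and not part of ExpoSeq.

```python
from pathlib import Path

# Subfolders the pipeline is described as generating
EXPECTED_SUBDIRS = (
    "alignment_reports",
    "assembly_reports",
    "clones_result",
    "tables_mixcr",
)


def find_experiment_dirs(root):
    """Return the directories directly below root that contain sequencing_report.csv."""
    return sorted(p.parent for p in Path(root).glob("*/sequencing_report.csv"))


def missing_subdirs(experiment_dir):
    """Return the expected subfolders that do not exist in experiment_dir."""
    return [
        name
        for name in EXPECTED_SUBDIRS
        if not (Path(experiment_dir) / name).is_dir()
    ]
```

For the structure shown above, `find_experiment_dirs("my_experiments")` would return the path to your_experiment_name, and `missing_subdirs` would return an empty list for it.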

The Sequencing Report

The sequencing report contains the important information about your sequencing data. The file stored in the directory structure described above has not been trimmed. The pipeline loads this file and works on a copy of it, so that the data can be trimmed flexibly according to the user's input. After trimming and preparation for the plots, the sequencing report looks like the following:

|   | Experiment | cloneId | readCount | nSeqCDR3 | aaSeqCDR3 | lengthOfCDR3 | cloneFraction |
|---|---|---|---|---|---|---|---|
| 0 | random1_R1_001 | 0 | 572 | TGTAACGCAGTCCACTCTAGGTGGCAAGCTATGACCCGCTGG | CNAVHSRWQAMTRW | 42 | 0.0530661 |
| 1 | random1_R1_001 | 1 | 345 | TGTATACAATCGGGAACGGATCGTAGG | CIQSGTDRR | 27 | 0.0320067 |
| 2 | random1_R1_001 | 2 | 320 | TGTAATGTAGATTTAACGGTGGTGGACGGTAGGCACCTTCCACGGGGCGACTACTGG | CNVDLTVVDGRHLPRGDYW | 57 | 0.0296874 |
| 3 | random1_R1_001 | 3 | 308 | TGTATACAATCGGGAACGAGTCGTAGG | CIQSGTSRR | 27 | 0.0285741 |
| 4 | random1_R1_001 | 4 | 297 | TGTCATGCCGACCTAAGAGTACGCGACGGGGTAAGGGGTGACTACTGG | CHADLRVRDGVRGDYW | 48 | 0.0275536 |
| 5 | random1_R1_001 | 5 | 292 | TGTAACGCAGTCCACTCTAGGTGGCAAGCTATGACCCACTGG | CNAVHSRWQAMTHW | 42 | 0.0270897 |
| 6 | ... | ... | ... | ... | ... | ... | ... |

Experiment: The name of the sample the row belongs to.

cloneId: The id of the clone within its sample. In the example above, the clone with the highest cloneFraction has cloneId=0.

readCount: The number of reads for the given clone.

nSeqCDR3: The nucleotide sequence of the CDR3 region.

aaSeqCDR3: The amino acid sequence of the CDR3 region.

lengthOfCDR3: The length of the CDR3 nucleotide sequence, for example 42 nucleotides for a CDR3 of 14 amino acids.

cloneFraction: The fraction of the clone in the sample.
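
The relationship between the three CDR3 columns can be checked with a few lines of Python. The sketch below uses the first two rows of the example report. The function name is hypothetical, and the check assumes in-frame sequences, where each amino acid corresponds to exactly three nucleotides.

```python
def cdr3_columns_consistent(row):
    """Check that lengthOfCDR3 equals len(nSeqCDR3) and 3 * len(aaSeqCDR3)."""
    n_len = len(row["nSeqCDR3"])
    return n_len == row["lengthOfCDR3"] and n_len == 3 * len(row["aaSeqCDR3"])


# First two rows of the example report above
rows = [
    {
        "nSeqCDR3": "TGTAACGCAGTCCACTCTAGGTGGCAAGCTATGACCCGCTGG",
        "aaSeqCDR3": "CNAVHSRWQAMTRW",
        "lengthOfCDR3": 42,
    },
    {
        "nSeqCDR3": "TGTATACAATCGGGAACGGATCGTAGG",
        "aaSeqCDR3": "CIQSGTDRR",
        "lengthOfCDR3": 27,
    },
]

print(all(cdr3_columns_consistent(row) for row in rows))
```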

This report contains all your samples, so if you open the report and scroll down, you will see your other samples follow in the Experiment column. You can save the report as a variable and continue to work with it:

trimmed_report = plot.sequencing_report
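
Because all samples share one table, a common first step is to split the report by the Experiment column. The source does not state which type `plot.sequencing_report` has, so the sketch below works on plain rows (a list of dictionaries) and the function name is hypothetical.

```python
from collections import defaultdict


def split_by_experiment(rows):
    """Group report rows by the value of their "Experiment" column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Experiment"]].append(row)
    return dict(groups)
```

Each key of the returned dictionary is a sample name such as random1_R1_001, and each value holds the rows of that sample in their original order.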

In contrast, the raw sequencing report contains more columns and more rows. For instance, if you sequenced multiple regions of the heavy chain, the data for those regions is stored in the raw sequencing report. You can switch to one of these regions in the PlotManager by typing:

plot.change_preferred_region()

For example, if you choose the CDR1 region, the PlotManager pulls the corresponding columns from the raw sequencing report and uses them in place of nSeqCDR3 and aaSeqCDR3 in the layout described above.
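
The column substitution can be pictured as follows. This is only a sketch of the idea, not the PlotManager's implementation: the function `use_region` is hypothetical, the column names nSeqCDR1 and aaSeqCDR1 are assumed to follow the naming of the CDR3 columns, and the length column is left out because the source does not describe how it is handled.

```python
# Columns that stay the same regardless of the chosen region
SHARED_COLUMNS = ("Experiment", "cloneId", "readCount", "cloneFraction")


def use_region(raw_rows, region):
    """Build rows that hold the shared columns plus the columns of one region."""
    n_col, aa_col = f"nSeq{region}", f"aaSeq{region}"
    shaped = []
    for row in raw_rows:
        new_row = {name: row[name] for name in SHARED_COLUMNS}
        # Take the sequences of the chosen region instead of CDR3
        new_row[n_col] = row[n_col]
        new_row[aa_col] = row[aa_col]
        shaped.append(new_row)
    return shaped
```

With `use_region(raw_rows, "CDR1")`, every returned row carries nSeqCDR1 and aaSeqCDR1 where the trimmed layout shows nSeqCDR3 and aaSeqCDR3.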

You can save the raw sequencing report as a variable with:

raw_sequencing_report = plot.Report.origin_seq_report