ExpoSeq's Data Structure

You always start by processing your data with MiXCR. You can do this locally with the PlotManager, or you can create a job script for a server with mixcr_cl.py. Either way, the pipeline generates a file structure like the following:

📦my_experiments
 ┣ 📂your_experiment_name
 ┃ ┣ 📂alignment_reports
 ┃ ┣ 📂assembly_reports
 ┃ ┣ 📂clones_result
 ┃ ┣ 📂tables_mixcr
 ┃ ┗ 📜sequencing_report.csv

alignment_reports: Contains the alignment reports generated by mixcr with mixcr align.

assembly_reports: Contains the assembly reports generated by mixcr with mixcr assemble.

clones_result: Contains the .clns files generated by mixcr. The pipeline offers basic support for creating mixcr plots, but you can customize them further by taking a look here.

tables_mixcr: Contains the tables for all your samples in .tsv format. Mixcr generates these with mixcr exportClones. The pipeline uses these tables to create the sequencing_report.csv.

If you processed your data on a server, you need to upload it to the PlotManager. To do so, choose option 2 in the PlotManager and then select your experiment folder, 'your_experiment_name' in this case. You can recognize the correct target directory by the sequencing_report.csv located in the root of your experiment's directory. The pipeline then automatically creates a copy of that folder in its working directory. After that, you can start working with your data.
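Before uploading, it can help to check which folders actually look like experiment directories. The following is a minimal sketch, not part of ExpoSeq: the helper names `find_experiment_dirs` and `missing_subfolders` are hypothetical, and the only assumptions are the folder names shown in the tree above and the presence of sequencing_report.csv in the experiment root.

```python
from pathlib import Path

# Subfolder names taken from the directory tree shown above.
EXPECTED_SUBFOLDERS = [
    "alignment_reports",
    "assembly_reports",
    "clones_result",
    "tables_mixcr",
]


def find_experiment_dirs(root):
    """Return the subdirectories of `root` that contain a sequencing_report.csv."""
    root = Path(root)
    return sorted(
        child
        for child in root.iterdir()
        if child.is_dir() and (child / "sequencing_report.csv").is_file()
    )


def missing_subfolders(experiment_dir):
    """List the expected subfolders that are absent from an experiment directory."""
    experiment_dir = Path(experiment_dir)
    return [
        name for name in EXPECTED_SUBFOLDERS
        if not (experiment_dir / name).is_dir()
    ]
```

For example, `find_experiment_dirs("my_experiments")` would list every experiment folder under my_experiments, and `missing_subfolders(...)` on one of them would report any folder that was not copied back from the server.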

The Sequencing Report

The sequencing report contains the key information about your sequencing data. The corresponding file stored in the directory structure described above has not been trimmed. The pipeline creates an instance from this file so that it can trim the data flexibly based on different inputs from the user. After trimming and preparation for the plots, the sequencing report looks like the following:

	Experiment	cloneId	readCount	nSeqCDR3	aaSeqCDR3	lengthOfCDR3	cloneFraction
0	random1_R1_001	0	572	TGTAACGCAGTCCACTCTAGGTGGCAAGCTATGACCCGCTGG	CNAVHSRWQAMTRW	42	0.0530661
1	random1_R1_001	1	345	TGTATACAATCGGGAACGGATCGTAGG	CIQSGTDRR	27	0.0320067
2	random1_R1_001	2	320	TGTAATGTAGATTTAACGGTGGTGGACGGTAGGCACCTTCCACGGGGCGACTACTGG	CNVDLTVVDGRHLPRGDYW	57	0.0296874
3	random1_R1_001	3	308	TGTATACAATCGGGAACGAGTCGTAGG	CIQSGTSRR	27	0.0285741
4	random1_R1_001	4	297	TGTCATGCCGACCTAAGAGTACGCGACGGGGTAAGGGGTGACTACTGG	CHADLRVRDGVRGDYW	48	0.0275536
5	random1_R1_001	5	292	TGTAACGCAGTCCACTCTAGGTGGCAAGCTATGACCCACTGG	CNAVHSRWQAMTHW	42	0.0270897
6	...	...	...	...	...	...	...

Experiment: Contains the name of the sample for the given data

cloneId: Is the id of the clone. The clone with the highest cloneFraction has cloneId=0, as in the example above

readCount: Is the number of reads for the given clone

nSeqCDR3: Is the nucleotide sequence of the CDR3 region

aaSeqCDR3: Is the amino acid sequence of the CDR3 region

lengthOfCDR3: Is the length of the CDR3 nucleotide sequence (nSeqCDR3)

cloneFraction: Is the fraction of the clone in the sample
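The relationships between these columns can be checked with a few lines of Python. This is only an illustrative sketch using three rows copied from the table above. The helper `check_row` is hypothetical, and the last check assumes that cloneFraction is readCount divided by the total number of reads in the sample.

```python
# Rows copied from the example table above:
# (readCount, nSeqCDR3, aaSeqCDR3, lengthOfCDR3, cloneFraction)
rows = [
    (572, "TGTAACGCAGTCCACTCTAGGTGGCAAGCTATGACCCGCTGG", "CNAVHSRWQAMTRW", 42, 0.0530661),
    (345, "TGTATACAATCGGGAACGGATCGTAGG", "CIQSGTDRR", 27, 0.0320067),
    (320, "TGTAATGTAGATTTAACGGTGGTGGACGGTAGGCACCTTCCACGGGGCGACTACTGG",
     "CNVDLTVVDGRHLPRGDYW", 57, 0.0296874),
]


def check_row(read_count, n_seq, aa_seq, length, fraction):
    """Check one report row for internal consistency."""
    return {
        # lengthOfCDR3 counts nucleotides, not amino acids.
        "length_matches_nt": length == len(n_seq),
        # Each amino acid is encoded by three nucleotides.
        "three_nt_per_aa": len(n_seq) == 3 * len(aa_seq),
        # Assumption: cloneFraction = readCount / total reads of the sample.
        "implied_total_reads": round(read_count / fraction),
    }


results = [check_row(*row) for row in rows]
for result in results:
    print(result)
```

Under the stated assumption, all three rows imply the same sample total, which is what you would expect for clones from the same sample.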

This report contains all your samples, so if you open the report and scroll down, you will see your other samples follow in the Experiment column. You can save the report as a variable and continue working with it:

trimmed_report = plot.sequencing_report

In contrast, the raw sequencing report contains more columns and more rows. For instance, if you sequenced multiple regions of the heavy chain, the data for these regions is stored in the raw sequencing report. You can switch to one of these regions in the PlotManager by typing

plot.change_preferred_region()

For example, if you choose the CDR1 region, the PlotManager pulls the corresponding columns from the raw sequencing report and uses them in place of nSeqCDR3 and aaSeqCDR3 in the layout described above.
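The idea behind switching regions can be sketched in plain Python. This is not ExpoSeq's implementation: the function `select_region` is hypothetical, the column names nSeqCDR1 and aaSeqCDR1 are assumed by analogy with nSeqCDR3 and aaSeqCDR3, and the CDR1 values in the sample row are made-up placeholders.

```python
def select_region(raw_rows, region="CDR3"):
    """Build the trimmed layout for one region from raw report rows.

    `raw_rows` is a list of dicts, one per clone, keyed by column name.
    Rows without a sequence for the requested region are skipped.
    """
    n_col = f"nSeq{region}"
    aa_col = f"aaSeq{region}"
    shared = ["Experiment", "cloneId", "readCount", "cloneFraction"]

    selected = []
    for row in raw_rows:
        if not row.get(n_col):
            continue  # this clone has no data for the requested region
        new_row = {key: row[key] for key in shared}
        new_row[n_col] = row[n_col]
        new_row[aa_col] = row[aa_col]
        selected.append(new_row)
    return selected


# One raw row: CDR3 values from the table above, CDR1 values are placeholders.
raw_rows = [
    {
        "Experiment": "random1_R1_001",
        "cloneId": 1,
        "readCount": 345,
        "cloneFraction": 0.0320067,
        "nSeqCDR1": "GGATTCACC",  # hypothetical placeholder
        "aaSeqCDR1": "GFT",        # hypothetical placeholder
        "nSeqCDR3": "TGTATACAATCGGGAACGGATCGTAGG",
        "aaSeqCDR3": "CIQSGTDRR",
    }
]

cdr1_rows = select_region(raw_rows, region="CDR1")
```

The result keeps the shared columns and carries only the sequence columns of the chosen region, which mirrors the replacement described above.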

You can save the raw sequencing report as a variable with:

raw_sequencing_report = plot.Report.origin_seq_report
