This GitHub repository archives the code and data generated by the following project:
Gao, Shang et al. (forthcoming) Constructing rice pan-NLRome and identifying how it was reshaped to response changing pathogenic stress during domestication.
This repository was uploaded by Shang Gao at the School of Life Sciences, Tsinghua University.
The code for figure generation can be found in the following file:
/Figures_code_v8_1May2022.ipynb
The R version and packages used for data manipulation and figure generation are:
R 3.6.3 with tidyverse, ggplot2, gapminder, socviz, ggsci, gridExtra, stringr, scales, ggrepel, cowplot, ggsignif, UpSetR, ggsankey, ggmap, sp, maptools, and maps.
Unzip 06.MS_v8_final_data.zip first to extract the data used for figure generation.
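The extraction step can also be scripted. The following is a minimal sketch in Python using only the standard library; the helper name `extract_archive` is hypothetical, and it assumes the archive sits in the current working directory and unpacks into a `06.MS_v8_final_data` directory, as the paths in this README suggest.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Assumed location: the archive is in the current working directory.
archive_path = Path("06.MS_v8_final_data.zip")
if archive_path.exists():
    names = extract_archive(archive_path)
    print(f"Extracted {len(names)} entries")
```

The `exists()` guard only keeps the snippet from failing when it is run outside the repository root.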
The major information table for all NLR genes is:
/06.MS_v8_final_data/SummaryTable_1May2022.csv
This table contains the following columns:
Column name | Description |
---|---|
gene_ID | The unique ID of each NLR gene in the rice pan-NLRome. |
Assembly_ID | The ID of every rice genome assembly. |
Accession_ID | The ID of every rice genotype. |
Contig_ID | The contig where the NLR gene is located. |
Position | The start position of the NLR gene. |
Status | The status of the rice genotype, i.e., cultivar, landrace or wild. |
Species | The subspecies information of the genotype. |
Clustered_Type | Paired (H2H, "head-to-head"), clustered (Concat), or singleton (-) NLR genes. T2T means the NLR genes are in a "tail-to-tail" arrangement with another NLR gene. |
Pair_partner_new | The partner gene of a paired NLR gene. |
raw_NLR_arch | The raw NLR architecture, i.e., the domain configuration of an NLR, as generated by Pfam_scan.pl. |
New2_NLR_domain_info | The domain information of final NLR architecture optimized with LRRpredictor. |
New2_NLR_arch | The final NLR architecture. |
New2_NLR_class | The final NLR class. |
LRR_number | The number of LRR motifs. |
with_lrr | Whether an NLR gene has an LRR domain. |
HOG | Ortholog group ID. |
HOG_class | The class of the HOG. |
HOG_size | The number of sequences in the HOG. |
pangenome_section | The pan-genome section that an HOG belongs to. |
Status2 | Domesticated or wild accession. |
Species2 | Another format of subspecies information of genotypes. |
Most figures in this project are generated from this table.
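As a quick way to inspect the summary table outside the notebook, the sketch below loads it with Python's standard `csv` module and tallies genes by a chosen column. It assumes the file is comma-separated with a header row whose names match the column table above; the helper names `load_rows` and `count_by_column` are hypothetical, and this is not the project's own R code.

```python
import csv
from collections import Counter
from pathlib import Path


def load_rows(path):
    """Read the CSV into a list of dicts keyed by the header row."""
    # utf-8-sig also handles a plain UTF-8 file and strips a BOM if present.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def count_by_column(rows, column):
    """Count how many rows (NLR genes) carry each value of the given column."""
    return Counter(row[column] for row in rows)


# Assumed path, relative to the repository root after unzipping.
table_path = Path("06.MS_v8_final_data/SummaryTable_1May2022.csv")
if table_path.exists():
    rows = load_rows(table_path)
    print(count_by_column(rows, "Status"))
    print(count_by_column(rows, "pangenome_section"))
```

Any other column from the table, such as `New2_NLR_class` or `Clustered_Type`, can be passed to `count_by_column` in the same way.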
Other data used in figure generation are also in the 06.MS_v8_final_data directory.
The following file includes all identified NLRs (without filtering) from the 67 rice accessions:
/Result_dataset/rice_panNLRome_AA_Sequence_dataset.tar.gz
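To see what the sequence archive contains before unpacking it, a minimal Python sketch such as the one below may help. The internal layout of the archive is not described in this README, so the code only lists regular files and makes no assumption about their names; `list_tar_members` is a hypothetical helper.

```python
import tarfile
from pathlib import Path


def list_tar_members(tar_path):
    """Return the names of all regular files inside a gzip-compressed tar archive."""
    with tarfile.open(tar_path, "r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


# Assumed path, relative to the repository root.
tar_path = Path("Result_dataset/rice_panNLRome_AA_Sequence_dataset.tar.gz")
if tar_path.exists():
    members = list_tar_members(tar_path)
    print(f"{len(members)} files in archive")
    for name in members[:5]:
        print(name)
```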
The following file contains the aligned consensus sequences of the NB-ARC domains for all 863 HOGs:
/Result_dataset/all_HOG.cons.outgroup_added.mafft.fa
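Because this file is a multiple sequence alignment, every record should have the same length once gaps are counted. The sketch below is a simple sanity check in plain Python, assuming standard FASTA formatting with unique headers; `read_fasta` and `alignment_length` are hypothetical helpers, not part of the project code.

```python
from pathlib import Path


def read_fasta(handle):
    """Parse FASTA text from an open handle into a {header: sequence} dict."""
    parts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()  # full header line is used as the key
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {key: "".join(chunks) for key, chunks in parts.items()}


def alignment_length(records):
    """Return the shared sequence length, or raise if the lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one alignment length, found {sorted(lengths)}")
    return lengths.pop()


# Assumed path, relative to the repository root.
fasta_path = Path("Result_dataset/all_HOG.cons.outgroup_added.mafft.fa")
if fasta_path.exists():
    with open(fasta_path, encoding="utf-8") as handle:
        records = read_fasta(handle)
    print(len(records), "sequences; alignment length", alignment_length(records))
```

Note that the file name indicates an outgroup was added, so the number of records may exceed the number of HOGs.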
The following file is the maximum-likelihood (ML) tree of the rice pan-NLRome generated by IQ-TREE:
/Result_dataset/rice_panNLRome_phylogeny_IQ-tree.contree
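For a quick look at which tips the tree contains, the sketch below pulls leaf labels out of a Newick string with a regular expression. It is only a rough check under the assumption that labels are unquoted and contain no whitespace, parentheses, commas, colons, or semicolons; a dedicated phylogenetics library is more robust for real analyses. `leaf_names` is a hypothetical helper.

```python
import re
from pathlib import Path

# A leaf label directly follows "(" or ","; internal node labels
# (such as support values) follow ")" and are therefore skipped.
LEAF_PATTERN = re.compile(r"[(,]\s*([^\s(),:;]+)")


def leaf_names(newick):
    """Return the leaf labels of a Newick string in order of appearance."""
    return LEAF_PATTERN.findall(newick)


# Assumed path, relative to the repository root.
tree_path = Path("Result_dataset/rice_panNLRome_phylogeny_IQ-tree.contree")
if tree_path.exists():
    leaves = leaf_names(tree_path.read_text(encoding="utf-8"))
    print(len(leaves), "leaves in tree")
```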