ShawnGao911101/Rice_pan-NLRome

Rice pan-NLRome

This GitHub repository archives the code and data generated by the following project:

Gao, Shang et al. (forthcoming) Constructing rice pan-NLRome and identifying how it was reshaped to response changing pathogenic stress during domestication.

This repository was uploaded by Shang Gao of the School of Life Sciences, Tsinghua University.

Visualization code

The code for figure generation is in the following file:

/Figures_code_v8_1May2022.ipynb

Data manipulation and figure generation use R 3.6.3 with the following packages:

tidyverse, ggplot2, gapminder, socviz, ggsci, gridExtra, stringr, scales, ggrepel, cowplot, ggsignif, UpSetR, ggsankey, ggmap, sp, maptools, maps

Data for visualization

Unzip 06.MS_v8_final_data.zip first to extract the data used for figure generation. The main information table for all NLR genes is:

/06.MS_v8_final_data/SummaryTable_1May2022.csv
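
As a minimal sketch of the step above, the archive can be unpacked and the summary table loaded with Python's standard library. Python is used here only for illustration (the project's own figure code is in R). The function names `extract_archive` and `read_summary_table` are hypothetical, and the table is assumed to be comma-delimited with a header row.

```python
import csv
import zipfile


def extract_archive(zip_path, dest_dir="."):
    """Unpack a zip archive into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


def read_summary_table(csv_path):
    """Read the summary table into a list of dicts keyed by column name."""
    # Assumption: comma-delimited text with a header row. The utf-8-sig
    # codec also tolerates a byte-order mark if one is present.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))
```

For example, calling `extract_archive("06.MS_v8_final_data.zip")` and then `read_summary_table("06.MS_v8_final_data/SummaryTable_1May2022.csv")` would return one dictionary per NLR gene, assuming the archive unpacks into the directory named above.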

This table contains the following columns:

| Column name | Description |
| --- | --- |
| gene_ID | The unique ID of each NLR gene in the rice pan-NLRome. |
| Assembly_ID | The ID of each rice genome assembly. |
| Accession_ID | The ID of each rice genotype. |
| Contig_ID | The contig on which the NLR gene is located. |
| Position | The start position of the NLR gene. |
| Status | The status of the rice genotype, i.e., cultivar, landrace or wild. |
| Species | The subspecies information of the genotype. |
| Clustered_Type | Paired (H2H, "head-to-head"), clustered (Concat), or singleton (-) NLR genes. T2T means the NLR gene is in a "tail-to-tail" arrangement with another NLR gene. |
| Pair_partner_new | The partner gene of a paired NLR gene. |
| raw_NLR_arch | The raw NLR architecture (i.e., domain configuration) of an NLR, as generated by Pfam_scan.pl. |
| New2_NLR_domain_info | The domain information of the final NLR architecture, optimized with LRRpredictor. |
| New2_NLR_arch | The final NLR architecture. |
| New2_NLR_class | The final NLR class. |
| LRR_number | The number of LRR motifs. |
| with_lrr | Whether the NLR gene has an LRR domain. |
| HOG | Ortholog group ID. |
| HOG_class | The class of an OG. |
| HOG_size | The number of sequences in an OG. |
| pangenome_section | The pan-genome section that an HOG belongs to. |
| Status2 | Domesticated or wild accession. |
| Species2 | Another format of the subspecies information of the genotype. |

Most figures in this project are generated from this table. The other data used for figure generation are also in the 06.MS_v8_final_data directory.
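
To illustrate how the columns of the summary table can be combined, the sketch below tallies rows by any set of columns, for instance NLR genes per `Status` and `Clustered_Type`. It is written in Python for illustration only and is not the project's R code. The helper name `count_by` is hypothetical, and the rows are assumed to be dictionaries keyed by the column names of the summary table.

```python
from collections import Counter


def count_by(rows, *columns):
    """Count rows for each combination of values in the given columns.

    rows is an iterable of dicts keyed by column name; the result maps a
    tuple of column values to the number of rows carrying that combination.
    """
    return Counter(tuple(row[col] for col in columns) for row in rows)
```

With the table loaded as a list of dictionaries, `count_by(rows, "Status", "Clustered_Type")` gives the number of genes in each status and arrangement category, and `count_by(rows, "Accession_ID")` gives the number of NLR genes per accession.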

Result datasets

The following file contains all identified NLRs (without filtering) from the 67 rice accessions:

/Result_dataset/rice_panNLRome_AA_Sequence_dataset.tar.gz
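
The internal layout of this archive is not described here, so a cautious first step is to list its members before extracting anything. The following Python sketch (illustrative only; the helper names `list_tar_members` and `read_tar_member` are hypothetical) does this with the standard `tarfile` module and assumes the archive is gzip-compressed, as the file extension suggests.

```python
import tarfile


def list_tar_members(tar_path):
    """Return the names of the regular files stored in a .tar.gz archive."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]


def read_tar_member(tar_path, member_name, encoding="utf-8"):
    """Read one member of the archive as text without unpacking the rest."""
    with tarfile.open(tar_path, "r:gz") as tar:
        handle = tar.extractfile(member_name)
        if handle is None:
            # extractfile returns None for members that are not regular files.
            raise ValueError(f"{member_name} is not a regular file")
        return handle.read().decode(encoding)
```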

The following file contains the aligned consensus sequences of the NB-ARC domains for all 863 HOGs:

/Result_dataset/all_HOG.cons.outgroup_added.mafft.fa
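
Because this file is an alignment, every sequence in it should have the same length once gaps are included. The Python sketch below (illustrative only; `read_fasta` and `alignment_length` are hypothetical helper names) reads a FASTA file and checks that property. It assumes standard FASTA formatting with `>` header lines.

```python
def read_fasta(path):
    """Parse a FASTA file into a dict mapping header text to sequence."""
    chunks = {}
    name = None
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].strip()
                chunks[name] = []
            elif name is not None:
                # Sequences may be wrapped over several lines.
                chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def alignment_length(records):
    """Return the shared sequence length, or raise if the lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one alignment length, found {sorted(lengths)}")
    return lengths.pop()
```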

The following file is the maximum-likelihood (ML) tree of the rice pan-NLRome generated with IQ-TREE:

/Result_dataset/rice_panNLRome_phylogeny_IQ-tree.contree
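
As a quick sanity check on the tree file, the tip labels can be pulled out of the Newick text. The Python sketch below is a simplification rather than a full Newick parser: it assumes the file holds a single Newick string with unquoted labels, and the function name `leaf_names` is hypothetical. A dedicated phylogenetics library is better suited to anything beyond listing tips.

```python
import re

# In Newick, a tip label directly follows "(" or ",". Labels that follow ")"
# belong to internal nodes (for example support values) and are skipped.
_LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;\s][^(),:;]*)")


def leaf_names(newick_text):
    """Return tip labels in the order they appear in a Newick string."""
    return [label.strip() for label in _LEAF_PATTERN.findall(newick_text)]
```

Reading the file contents into a string and passing it to `leaf_names` gives the list of tips, whose length can then be compared with the number of sequences expected in the tree.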

Citation

Gao, Shang et al. (forthcoming) Constructing rice pan-NLRome and identifying how it was reshaped to response changing pathogenic stress during domestication.
