Petunia axillaris X P. exserta hybrid necrosis

Code used in the study of Petunia axillaris X P. exserta hybrid necrosis, associated with publication.

Authors: Chaobin Li, Marta Binaghi, Vivien Pichon, Gina Cannarozzi, Loreta Brandao de Freitas, Mathieu Hanemian, Cris Kuhlemeier

Author of this page: Marta Binaghi

Genome sequence and annotation used in all the analyses: P. axillaris N version 4.03 (submitted to NCBI).

BSR-Seq

To identify the regions of the genome asssociated with the presence and abscence of the necrosis phenotype, we applied a BSR-Seq approach. This is similar to a BSA (bulk segregant analysis) but is performed on RNA-seq data. The approach is very similar to that used by Soyk et al. 2017 doi.org/10.1038/ng.3733.

We produced two F2 populations, coming from the crosses P. axillaris N X P. exserta and P. exserta X P. axillaris N. The plants from these populations were then phenotyped for the presence and absence of necrotic symptoms and were grouped accordingly. Note that in the article we only present the results from the cross P. axillaris N X P. exserta. The raw reads for the other cross are nonetheless available on SRA.

We performed RNA extraction (see article materials and methods for details), and pooled the necrotic individuals together in a sample, and the healthy individual in another sample.

Sequencing

Was performed by Novogene, with the Eukaryotic RNA-seq (polyA enriched library prep).

paired end reads
150 bp read length
250-300 bp insert cDNA library
Sanger Illumina 1.9 encoding

Raw reads

Are available on NCBI BioProject PRJNA708139, under SRA accessions:

SRR13907521: healthy individuals, P. exserta X P. axillaris N
SRR13907522: necrotic individuals, P. exserta X P. axillaris N
SRR13907523: healthy individuals, P. axillaris N X P. exserta
SRR13907524: necrotic individuals, P. axillaris N X P. exserta

Forward reads are numbered 1, reverse are numbered 2.

Read quality assessment performed with fastqc. Reads trimmed with trimmomatic parameters LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36. Script bsa01_quality_trim.sh.

Read numbers before and after cleaning listed in bsa_read_alignment_stats.csv.

Read alignment

Genome index with STAR: bsa02_STAR_index.sh.

Alignment with bsa03_STARmapping.sh.

--sjdbOverhang 149
--outFilterType BySJout
--outFilterMultimapNmax 20
--twopassMode Basic

Aligned reads number in bsa_read_alignment_stats.csv.

We then add the read group info in each bam file with bsa04_readGroup_index.sh.

And mark duplicated reads with bsa05_markdup.sh.

Variant calling

We called variants using GATK v4.0.4.0, using the haplotypeCaller and filtering out variants based on quality values because Petunia does not have a high quality variants database to use BQSR.

Genome dictionary created in a previous step with bsa04_readGroup_index.sh, same script to add read group info in the bam files.

We called variants with bsa06_SNPcalling.sh, with parameters

--dont-use-soft-clipped-bases
--pcr-indel-model NONE
--stand-call-conf 30
--native-pair-hmm-threads 4

We then plot the distribution of the quality values of the variants to know if the GATK suggested filters are fine. This is done with bsa07_SNPselect_qualityplot.sh which uses plot_vcfq_distribution.R to make the plots.

We then apply the filters for quality with bsa08_SNPfilter.sh. Filters:

-window 10 
-cluster 3 
--filter-expression "QD<2.0" 
--filter-name "QD" 
--filter-expression "FS>30.0" 
--filter-name "FS"

Then we select variants passing filters and only the biallelic SNPs. We then keep only variants where the read depth is at least 100 and then we thin the dataset to 100bp.

Number of variants in each dataset is listed in bsa_variant_numbers.csv.

The final set of variants is available in BSA_SNP_biallelic_gatkselected_minDP100_thin100.recode.vcf.

Bulk segregant analysis

Is performed in R, with script bsa09_analysis.R.

The plots shown in the manuscript are obtained with script bsa10_plots_manuscript.R.

Shallow whole genome sequencing for fine mapping

Low coverage sequencing performed to define the boundaries of the introgression region in the IL5 line (Hermann et al 2015, https://link.springer.com/article/10.1007/s00425-015-2251-2).

Sequencing

Library preparation and sequencing were performed by the Next Generation Sequencing platform of the University of Bern, with settings:

whole genome sequencing library
Paired end
150 bp long reads
Illumina HiSeq 3000

Raw reads

Have been uploaded to NCBI SRA under BioProject PRJNA705072.

SRR13809747: sequencing sample KMH1, recombinant line ID 1_A1 (Extended data table 3)
SRR13809746: sequencing sample KMH2, recombinant line ID 7_A1 (Extended data table 3)
SRR13809741: sequencing sample KMH3, recombinant line ID 18_E3 (Extended data table 3)
SRR13809740: sequencing sample KMH4, recombinant line ID 26_E1 (Extended data table 3)
SRR13809739: sequencing sample KMH5, recombinant line ID 1_E1 (Extended data table 3)
SRR13809738: sequencing sample KMH6, recombinant line ID 7_E1 (Extended data table 3)
SRR13809737: sequencing sample KMH7, recombinant line ID 24_E1 (Extended data table 3)
SRR13809736: sequencing sample KMH8, recombinant line ID 18_A1 (Extended data table 3)
SRR13809735: sequencing sample KMH9, recombinant line ID 10_A1 (Extended data table 3)
SRR13809734: sequencing sample KMH10, recombinant line ID 29_A1 (Extended data table 3)
SRR13809745: sequencing sample KMH11, recombinant line ID 26_A1 (Extended data table 3)
SRR13809744: sequencing sample KMH12, recombinant line ID 25_A1 (Extended data table 3)
SRR13809743: sequencing sample KMH13, recombinant line ID 39_A3 (Extended data table 3)
SRR13809742: sequencing sample KMH14, recombinant line ID 34_A1 (Extended data table 3)

Forward reads are numbered 1, reverse are numbered 2.

Reads renamed with il01_rename_raw_reads.sh.

Read quality assessment performed with fastqc. Reads trimmed with trimmomatic parameters LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:100. Script il02_fastqc_trim.sh. Summary of raw and trimmed reads obtained with fastq-stats, il03_stats_index_genome.sh.

Read numbers before and after cleaning listed in il_read_alignment_stats.csv.

Alignment

Genome index done in script il03_stats_index_genome.sh, alignment performed with BWA MEM, with default parameters in script il04_align.sh.

The proportion of reads aligned is shown in the table il_read_alignment_stats.csv.

Variant calling

Is performed with GATK 4.0.4.0, following GATK best practices for organisms without a database of high quality variants. Variants are called in ERC mode. Variant calling done with script il05_call_variants.sh, then the single samples are combined in a vcf file with il06_combineGvcfs.sh. The same script is also extracting the quality values to plot them and verify that the hard filters of GATK are suitable for the dataset. The plotting is done with an R script plot_vcfq_distribution.R. The plots are shown in il_snp_quality.pdf and il_indel_quality.pdf.

I then apply the hard filters to the variants in script il07_filter_variants.sh. In order to visualise more easily the boundaries of the introgression, we filter the resulting vcf file to keep only sites on chromosome 2, with a call rate of 100% (all samples have a called genotype at the site), and we extract the homozygous genotype positions.

RNAseq

To identify genes differentially expressed in the necrotic and healthy plants from the introgression lines we performed an RNAseq experiment. The samples used are plants constituting the progeny of a single selfed plant heterozygous for the introgression IL5 (see article materials and methods for details). In the progeny of this heterozygous plant we have some plants homozygous exserta, homozygous axillaris and heterozygous in the introgression. Three plants per genotype were selected, and one leaf per plant was collected. The tissue was collected from leaves of the homozygous axillaris IL when they just started displaying necrotic symptoms, and equivalent tissue was collected from the other genotypes.

The RNAseq data were used to perform a differential expression (DE) analysis.

Sequencing

Was performed at the Lausanne Genomics Technologies Facility.

Single end reads
125 bp long
Illumina HiSeq
TruSeq stranded RNA library prep

Raw reads

Have been uploaded to NCBI SRA under BioProject PRJNA705649.

SRR13861738: heterozygous introgression, replicate 3 (kmh9)
SRR13861739: heterozygous introgression, replicate 2 (kmh8)
SRR13861740: heterozygous introgression, replicate 1 (kmh7)
SRR13861741: homozygous exserta introgression, replicate 3 (kmh6)
SRR13861742: homozygous exserta introgression, replicate 2 (kmh5)
SRR13861743: homozygous exserta introgression, replicate 1 (kmh4)
SRR13861744: homozygous axillaris introgression, replicate 3 (kmh3)
SRR13861745: homozygous axillaris introgression, replicate 2 (kmh2)
SRR13861746: homozygous axillaris introgression, replicate 1 (kmh1)

Raw reads are renamed with script rs01_rename_raw_reads.sh.

Quality control and trimming in script rs02_fastqc_trim.sh. Parameters for trimming:

trimmomatic SE \
  -threads 4 \
  -phred33 \
  kmh${SLURM_ARRAY_TASK_ID}.fastq.gz \
  kmh${SLURM_ARRAY_TASK_ID}_trim.fastq.gz \
  LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 \
  ILLUMINACLIP:TruSeq3-SE.fa:2:30:3

The read numbers before and after trimming are reported in rs_read_alignment_stats.csv.

Alignment

Note that I build another STAR index for the genome because in the genomeGenerate step they require the read length and here it is 125 bp. So I use script rs03_STAR_genome_index.sh to redo the genome index. Parameters:

STAR --runMode genomeGenerate \
  --runThreadN 4 \
  --genomeDir ${scdir}/data/genomes/Pax_125bp \
  --genomeFastaFiles ${scdir}/data/genomes/Peax402INV.fasta \
  --sjdbGTFfile ${scdir}/data/genomes/peaxi162AQ_Peax402INV.cds.gff \
  --sjdbOverhang 124 \
  --sjdbGTFtagExonParentTranscript Parent \
  --genomeChrBinNbits 16 --limitGenomeGenerateRAM 24000000000

The alignment is then performed with rs04_STARmapping.sh. Aligned read numbers reported in rs_read_alignment_stats.csv. Alignment performed with parameters:

STAR --genomeDir data/genomes/Pax_125bp \
    --runThreadN 16 \
    --readFilesIn data/raw/rs_rawreads/kmh${SLURM_ARRAY_TASK_ID}_trim.fastq.gz \
    --outFilterType BySJout \
    --outFilterMultimapNmax 20 \
    --outSAMtype BAM SortedByCoordinate \
    --twopassMode Basic \
    --outFileNamePrefix data/raw/rs_aligned_reads/kmh${SLURM_ARRAY_TASK_ID}_trim_STAR_ \
    --readFilesCommand zcat \
    --genomeLoad NoSharedMemory

Read count

Is performed with subread featureCounts function, in script rs05_featureCounts.sh. Parameters are:

featureCounts -T 16 \
    -t gene \
    -s 2 \
    -g ID \
    -a ${scdir}/data/genomes/peaxi162AQ_Peax402INV.cds.gff \
    -o ${scdir}/data/raw/rs_counts/rs_counts.txt kmh1_trim_STAR_Aligned.sortedByCoord.out.bam kmh2_trim_STAR_Aligned.sortedByCoord.out.bam kmh3_trim_STAR_Aligned.sortedByCoord.out.bam kmh4_trim_STAR_Aligned.sortedByCoord.out.bam kmh5_trim_STAR_Aligned.sortedByCoord.out.bam kmh6_trim_STAR_Aligned.sortedByCoord.out.bam kmh7_trim_STAR_Aligned.sortedByCoord.out.bam kmh8_trim_STAR_Aligned.sortedByCoord.out.bam kmh9_trim_STAR_Aligned.sortedByCoord.out.bam

Differential expression analysis

The DE analysis is done with DESeq2 in the R markdown script rs06_DE_analysis.Rmd. The output is in rs06_DE_analysis.html.

Software versions

bwa/0.7.17
fastqc/0.11.7l
fastq-stats ea-utils/1.1.2
GenomeAnalysisTK/4.0.4.0
picard-tools/2.18.11 or 2.9.0, see scripts to know which version was used
R on the computing cluster: R/3.4.2
R on the local machine: R/3.3.3
samtools/1.8
SnpEff/4.3t
STAR/2.6.0c
subread/1.6.0
trimmomatic/0.36
vcftools/0.1.15

R libraries used

For the versions, see at the bottom of the R scripts in the code folder.

DESeq2 Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15, 550. doi: 10.1186/s13059-014-0550-8.
dplyr Wickham H, François R, Henry L, Müller K (2022). dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org, https://github.com/tidyverse/dplyr
ggplot2 Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org
optparse https://cran.r-project.org/package=optparse
pheatmap https://cran.r-project.org/package=pheatmap
PoiClaClu https://cran.r-project.org/package=PoiClaClu
RColorBrewer https://cran.r-project.org/package=RColorBrewer
tidyr Wickham H, Girlich M (2022). tidyr: Tidy Messy Data. https://tidyr.tidyverse.org, https://github.com/tidyverse/tidyr
vcfUtils custom package, available here vcfUtils.tar.gz

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Petunia axillaris X P. exserta hybrid necrosis

BSR-Seq

Sequencing

Raw reads

Read alignment

Variant calling

Bulk segregant analysis

Shallow whole genome sequencing for fine mapping

Sequencing

Raw reads

Alignment

Variant calling

RNAseq

Sequencing

Raw reads

Alignment

Read count

Differential expression analysis

Software versions

R libraries used

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 85 Commits
code		code
data		data
README.md		README.md

Kuhlemeier-lab/Petunia_hybrid_necrosis

Folders and files

Latest commit

History

Repository files navigation

Petunia axillaris X P. exserta hybrid necrosis

BSR-Seq

Sequencing

Raw reads

Read alignment

Variant calling

Bulk segregant analysis

Shallow whole genome sequencing for fine mapping

Sequencing

Raw reads

Alignment

Variant calling

RNAseq

Sequencing

Raw reads

Alignment

Read count

Differential expression analysis

Software versions

R libraries used

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages