-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
212 lines (140 loc) · 14.3 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# smokingMouse
<!-- badges: start -->
[![GitHub issues](https://img.shields.io/github/issues/LieberInstitute/smokingMouse)](https://github.com/LieberInstitute/smokingMouse/issues)
[![GitHub pulls](https://img.shields.io/github/issues-pr/LieberInstitute/smokingMouse)](https://github.com/LieberInstitute/smokingMouse/pulls)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![Bioc release status](http://www.bioconductor.org/shields/build/release/data-experiment/smokingMouse.svg)](https://bioconductor.org/checkResults/release/data-experiment-LATEST/smokingMouse)
[![Bioc devel status](http://www.bioconductor.org/shields/build/devel/data-experiment/smokingMouse.svg)](https://bioconductor.org/checkResults/devel/data-experiment-LATEST/smokingMouse)
[![Bioc downloads rank](https://bioconductor.org/shields/downloads/release/smokingMouse.svg)](http://bioconductor.org/packages/stats/bioc/smokingMouse/)
[![Bioc support](https://bioconductor.org/shields/posts/smokingMouse.svg)](https://support.bioconductor.org/tag/smokingMouse)
[![Bioc last commit](https://bioconductor.org/shields/lastcommit/devel/data-experiment/smokingMouse.svg)](http://bioconductor.org/checkResults/devel/data-experiment-LATEST/smokingMouse/)
[![Bioc dependencies](https://bioconductor.org/shields/dependencies/release/smokingMouse.svg)](https://bioconductor.org/packages/release/bioc/html/smokingMouse.html#since)
[![R-CMD-check-bioc](https://github.com/LieberInstitute/smokingMouse/actions/workflows/check-bioc.yml/badge.svg)](https://github.com/LieberInstitute/smokingMouse/actions/workflows/check-bioc.yml)
<!-- badges: end -->
Welcome to the `smokingMouse` project! Here you'll be able to access the mouse expression data used for the analysis of the smoking-nicotine-mouse LIBD project.
## Overview
This bulk RNA-sequencing project consisted of a differential expression analysis (DEA) involving 4 data types: genes, transcripts, exons, and exon-exon junctions. The main goal of this study was to explore the effects of prenatal exposure to smoking and nicotine on the developing mouse brain. As secondary objectives, this work evaluated: 1) the affected genes by each exposure in the adult female brain in order to compare offspring and adult results, and 2) the effects of smoking on adult blood and brain to search for overlapping biomarkers in both tissues. Finally, DEGs identified in mouse were compared against previously published results in human (Semick et al., 2020 and Toikumo et al., 2023).
## Study design
<figure>
<img src="man/figures/Study_design.png" align="center" width="800px" />
<figcaption style="color: gray; line-height: 0.88; text-align: justify"><font size="-1.5"><b>Experimental design of the study</b>. <b>A)</b> 21 pregnant mice were split into two experiments: in the first one prenatal nicotine exposure (PNE) was modeled administering nicotine (n=3) or vehicle (n=3) to the dams during gestation, and in the second maternal smoking during pregnancy (MSDP) was modeled exposing dams to cigarette smoke during gestation (n=8) or using them as controls (n=7). A total of 137 pups were born: 19 were born to nicotine-administered mice, 23 to vehicle-administered mice, 46 to smoking-exposed mice, and 49 to smoking control mice. 17 nonpregnant adult females were also nicotine-administered (n=9) or vehicle-administered (n=8) to model adult nicotine exposure, and 9 additional nonpregnant dams were smoking-exposed (n=4) or controls (n=5) to model adult smoking. Frontal cortex samples of all P0 pups (n=137: 42 for PNE and 95 for MSDP) and adults (n=47: 23 for the nicotine experiment and 24 for the smoking experiment) were obtained, as well as blood samples from the smoking-exposed and smoking control adults (n=24), totaling 208 samples. Number of donors and samples are indicated in the figure. <b>B)</b> RNA was extracted from such samples and bulk RNA-seq experiments were performed, obtaining expression counts for genes, transcripts, exons, and exon-exon junctions.
</font>
</figcaption>
</figure>
## Workflow
The next table summarizes the analyses done at each level.
<figure>
<img src="man/figures/Table_of_Analyses.png" align="center" width="800px" />
<figcaption style="color: gray; line-height: 0.94; text-align: justify"><font size="-1.5"><b> Summary of analysis steps across gene expression feature levels</b>:
<b>1. Data processing</b>: counts of genes, exons, and exon-exon junctions were normalized to CPM and log2-transformed; transcript expression values were only log2-transformed since they were already in TPM. Lowly-expressed features were removed using the indicated functions and samples were separated by tissue and age in order to create subsets of the data for downstream analyses.
<b>2. Exploratory Data Analysis (EDA)</b>: QC metrics of the samples were examined and used to filter the poor quality ones. Sample level effects were explored through dimensionality reduction methods and segregated samples in PCA plots were removed from the datasets. Gene level effects were evaluated with analyses of variance partition.
<b>3. Differential Expression Analysis (DEA)</b>: with the relevant variables identified in the previous steps, the DEA was performed at the gene level for nicotine and smoking exposure in adult and pup brain samples, and for smoking exposure in adult blood samples; DEA at the rest of the levels was performed for both exposures in pup brain only. DE signals of the genes in the different conditions, ages, tissues, and species (using human results from $^1$[Semick et al., 2020](https://www.nature.com/articles/s41380-018-0223-1)) were contrasted, as well as the DE signals of exons and transcripts vs those of their genes. Mean expression of DEGs and non-DEGs genes with and without DE features was also analyzed. Then, all resultant DEGs and DE features (and their genes) were compared by direction of regulation (up or down) between and within exposures (nicotine/smoking); mouse DEGs were also compared against human genes associated with TUD from $^2$[Toikumo et al., 2023](https://www.medrxiv.org/content/10.1101/2023.03.27.23287713v2).
<b>4. Functional Enrichment Analysis</b>: GO & KEGG terms significantly enriched in the clusters of DEGs and genes of DE transcripts and exons were obtained.
<b>5. DGE visualization</b>: the log2-normalized expression of DEGs was represented in heat maps in order to distinguish the groups of up- and down-regulated genes.
<b>6. Novel junction gene annotation</b>: for uncharacterized DE junctions with no annotated gene, their nearest, preceding, and following genes were determined.
</font>
<font size="0.8">Abbreviations: Jxn: junction; Tx(s): transcript(s); CPM: counts per million; TPM: transcripts per million; TMM: Trimmed Mean of M-Values; TMMwsp: TMM with singleton pairing; QC: quality control; PC: principal component; DEA: differential expression analysis; DE: differential expression/differentially expressed; FC: fold-change; FDR: false discovery rate; DEGs: differentially expressed genes; TUD: tobacco use disorder; DGE: differential gene expression. </font>
</figcaption>
</figure>
All `R` scripts created to perform such analyses can be found in [code on GitHub](https://github.com/LieberInstitute/smoking-nicotine-mouse/).
## smoking Mouse datasets
The mouse datasets contain the following data in a single `R` `RangedSummarizedExperiment`* object for each feature (genes, transcripts, exons, and exon-exon junctions):
* **Raw data**: raw read counts (for genes, exons, and junctions) or TPM (for transcripts), also including the original metadata of the expression features and samples.
* **Processed data**: normalized and log-scaled counts of the same features (log(CPM+0.5) for genes, exons, and junctions, or log(TPM+0.5) for transcripts). In addition to the feature and sample information, the datasets contain information of which ones were used in downstream analyses (the ones that passed filtering steps), and which features were differentially expressed in the different experiments.
Moreover, you can find human data generated in Semick et al., (2018) in Mol Psychiatry (DOI: https://doi.org/10.1038/s41380-018-0223-1) that contain the results of a DEA for cigarette smoke exposure in adult and prenatal human brain.
*For more details, check the documentation for [`RangedSummarizedExperiment`](https://www.bioconductor.org/packages/devel/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html) objects.
## Data specifics
* *'rse_gene_mouse_RNAseq_nic-smo.Rdata'*: (`rse_gene` object) the gene RSE object contains the raw and log-normalized expression data of 55,401 mouse genes across the 208 samples from brain and blood of control and nicotine/smoking-exposed pup and adult mice.
* *'rse_tx_mouse_RNAseq_nic-smo.Rdata'*: (`rse_tx` object) the tx RSE object contains the raw and log-scaled expression data of 142,604 mouse transcripts across the 208 samples from brain and blood of control and nicotine/smoking-exposed pup and adult mice.
* *'rse_exon_mouse_RNAseq_nic-smo.Rdata'*: (`rse_exon` object) the exon RSE object contains the raw and log-normalized expression data of 447,670 mouse exons across the 208 samples from brain and blood of control and nicotine/smoking-exposed pup and adult mice.
* *'rse_jx_mouse_RNAseq_nic-smo.Rdata'*: (`rse_jx` object) the jx RSE object contains the raw and log-normalized expression data of 1,436,068 mouse exon-exon junctions across the 208 samples from brain and blood of control and nicotine/smoking-exposed pup and adult mice.
All the above datasets contain the sample and feature metadata and additional data of the results obtained in the filtering steps and the DEA.
* *'de_genes_prenatal_human_brain_smoking.Rdata'*: (`de_genes_prenatal_human_brain_smoking` object) data frame with DE statistics of 18,067 human genes for cigarette smoke exposure in prenatal human cortical tissue.
* *'de_genes_adult_human_brain_smoking.Rdata'*: (`de_genes_adult_human_brain_smoking` object) data frame with DE statistics of 18,067 human genes for cigarette smoke exposure in adult human cortical tissue.
## Installation instructions
Get the latest stable `R` release from [CRAN](http://cran.r-project.org/). Then install `smokingMouse` from [Bioconductor](http://bioconductor.org/) using the following code:
```{r 'install', eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("smokingMouse")
```
And the development version from [GitHub](https://github.com/LieberInstitute/smokingMouse) with:
```{r 'install_dev', eval = FALSE}
BiocManager::install("LieberInstitute/smokingMouse")
```
## Example of how to access the data
Below there's example code on how to access the mouse and human gene data but can do the same for any of the datasets previously described. The datasets are retrieved from [Bioconductor](http://bioconductor.org/) `ExperimentHub`.
```{r 'experiment_hub'}
## Connect to ExperimentHub
library(ExperimentHub)
eh <- ExperimentHub::ExperimentHub()
```
```{r 'access_data', message=FALSE, fig.height = 8, fig.width = 9}
## Load the datasets of the package
myfiles <- query(eh, "smokingMouse")
########################
# Mouse data
########################
## Download the mouse gene data
rse_gene <- myfiles[['EH8313']]
## This is a RangedSummarizedExperiment object
rse_gene
## Check sample info
colData(rse_gene)[1:5, 1:5]
## Check gene info
rowData(rse_gene)[1:5, 1:5]
## Access the original counts
original_counts <- assays(rse_gene)$counts
## Access the log-normalized counts
logcounts <- assays(rse_gene)$logcounts
########################
# Human data
########################
## Download the human gene data
de_genes_prenatal_human_brain_smoking <- myfiles[['EH8317']]
## This is a data frame
de_genes_prenatal_human_brain_smoking[1:5, ]
## Access data of human genes as normally do with data frames
```
## Citation
Below is the citation output from using `citation('smokingMouse')` in R. Please run this yourself to check for any updates on how to cite __smokingMouse__.
```{r 'citation', eval = requireNamespace('smokingMouse')}
print(citation('smokingMouse'), bibtex = TRUE)
#>
#>
#> To cite the original smoking-nicotine mouse work please use:
#>
#> (TODO)
#>
#>
#> To cite the original work from which human data come please use the following citation:
#>
#> Semick, S. A., Collado-Torres, L., Markunas, C. A., Shin, J. H., Deep-Soboslay, A., Tao, R., ...
#> & Jaffe, A. E. (2020). Developmental effects of maternal smoking during pregnancy on the human
#> frontal cortex transcriptome. Molecular psychiatry, 25(12), 3267-3277.
#>
```
Please note that the `smokingMouse` package and the [study analyses](https://github.com/LieberInstitute/smoking-nicotine-mouse/) were only made possible thanks to many other `R` and bioinformatics software authors, which are cited either in the vignette and/or the paper describing this study.
## Code of Conduct
Please note that the `smokingMouse` project is released with a [Contributor Code of Conduct](http://bioconductor.org/about/code-of-conduct/). By contributing to this project, you agree to abide by its terms.
## Development tools
* Continuous code testing is possible thanks to [GitHub actions](https://www.tidyverse.org/blog/2020/04/usethis-1-6-0/) through `r BiocStyle::CRANpkg('usethis')`, `r BiocStyle::CRANpkg('remotes')`, and `r BiocStyle::CRANpkg('rcmdcheck')` customized to use [Bioconductor's docker containers](https://www.bioconductor.org/help/docker/) and `r BiocStyle::Biocpkg('BiocCheck')`.
* Code coverage assessment is possible thanks to [codecov](https://codecov.io/gh) and `r BiocStyle::CRANpkg('covr')`.
* The [documentation website](http://LieberInstitute.github.io/smokingMouse) is automatically updated thanks to `r BiocStyle::CRANpkg('pkgdown')`.
* The code is styled automatically thanks to `r BiocStyle::CRANpkg('styler')`.
* The documentation is formatted thanks to `r BiocStyle::CRANpkg('devtools')` and `r BiocStyle::CRANpkg('roxygen2')`.
For more details, check the `dev` directory.
This package was developed using `r BiocStyle::Biocpkg('biocthis')`.