This repository contains all the code for reproducing the analysis from the gastrulation flow paper Mittnenzweig et al. (2021). The analysis is done with the metacell R package, which also contains the code for generating the network flow model.
Metacell paper: Baran et al. 2019 Genome Biology
Raw FASTQ files and processed UMI tables are available under GEO accession GSE169210.
The analysis requires the following R packages:
- metacell
- lpsymphony
- pheatmap
- gridExtra
- Matrix
- tidyverse
- shape
- umap
- qlcMatrix
- ggrepel
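Before running anything, it can help to check which of the packages above are already installed. The sketch below only reports missing packages and does not install them. `find_missing_packages()` is a hypothetical helper, not part of this repository. lpsymphony is distributed through Bioconductor and can be installed with `BiocManager::install("lpsymphony")`. metacell is not on CRAN, so follow its own installation instructions.

```r
# Packages listed above as requirements for this repository
required_pkgs = c("metacell", "lpsymphony", "pheatmap", "gridExtra", "Matrix",
                  "tidyverse", "shape", "umap", "qlcMatrix", "ggrepel")

# Hypothetical helper: return the subset of `pkgs` whose namespace cannot be loaded
find_missing_packages = function(pkgs) {
  installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing = find_missing_packages(required_pkgs)
if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
}
```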
After cloning the GitHub repository, open an R session in the repository root directory and download and initialize the scRNA database (~4.7 GB):
```r
# Load code and download required data files
source("scripts/download_data.r")
```
The repository root directory should now contain the subfolders scripts/ (all R scripts), scrna_db/ (the metacell R objects), config/, data/ (additional data generated by the scripts) and figs/paper_figs/.
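A quick way to confirm that setup finished is to check for these subfolders from the R session. The sketch below uses a hypothetical helper, `check_repo_layout()`, which is not part of the repository. It lists any expected subfolders that are missing under a given root directory.

```r
# Subfolders that the repository root is expected to contain after setup
expected_dirs = c("scripts", "scrna_db", "config", "data", "figs/paper_figs")

# Hypothetical helper: return the expected subfolders that are missing under `root`
check_repo_layout = function(root = ".", dirs = expected_dirs) {
  present = dir.exists(file.path(root, dirs))
  dirs[!present]
}

missing_dirs = check_repo_layout()
if (length(missing_dirs) > 0) {
  message("Missing subfolders: ", paste(missing_dirs, collapse = ", "))
}
```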
All figures of the paper can be regenerated by running:
```r
# Load all scripts
source("scripts/initialize_scripts.r")
generate_all_figures()
```
Please note that this will take some time. If you are interested in regenerating a specific figure, see the paragraph below.
For each figure (Figures 1-7 and S1-S7), there is a corresponding script in scripts/generate_paper_figures/. Each script starts with a function gen_fig_xyz_plots(), which calls the subfunctions for the individual panels. The script also contains explanations related to the analysis of that figure. For example, to regenerate the plots of Figure 1, run:
```r
# Load the metacell package
library("metacell")
# Initialize the metacell scRNA database
scdb_init("scrna_db")
# Generate the plots of Figure 1
source("scripts/generate_paper_figures/fig_1.r")
gen_fig_1_plots()
```
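Every figure script follows the same naming pattern. For example, fig_1.r defines gen_fig_1_plots(). Because of this, a small wrapper can regenerate any figure from its id. This is only a sketch, and `regenerate_figure()` is hypothetical. It assumes that every script in scripts/generate_paper_figures/ defines a matching gen_fig_<id>_plots() function, which has been confirmed here only for Figure 1. It also assumes that metacell is loaded and scdb_init() has been called, as in the example above.

```r
# Hypothetical wrapper: source scripts/generate_paper_figures/fig_<id>.r and
# call the gen_fig_<id>_plots() function it is assumed to define.
regenerate_figure = function(fig_id, script_dir = "scripts/generate_paper_figures") {
  script = file.path(script_dir, paste0("fig_", fig_id, ".r"))
  if (!file.exists(script)) stop("No script found: ", script)
  # Source into a fresh environment so panel helpers of different figures do not clash
  env = new.env(parent = globalenv())
  source(script, local = env)
  plot_fun = get(paste0("gen_fig_", fig_id, "_plots"), envir = env, mode = "function")
  plot_fun()
}

# Usage (assumes library("metacell") and scdb_init("scrna_db") were run first):
# for (id in c("1", "2", "3")) regenerate_figure(id)
```

The fresh environment's parent is the global environment. This keeps the subfunctions of different figures apart while still giving them access to attached packages and to anything loaded globally.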
The content of gen_fig_1_plots() is as follows:
```r
gen_fig_1_plots = function() {
  if (!dir.exists("figs/paper_figs")) {
    dir.create("figs/paper_figs")
  }
  dir_name = "figs/paper_figs/fig1"
  if (!dir.exists(dir_name)) {
    dir.create(dir_name)
  }
  fig1_b()
  fig1_cde()
  fig1_f()
  fig1_g_mc_time_distributions()
  fig1_g_heatmap(plot_pdf = TRUE)
  fig1_h()
}
```
The plots are saved in figs/paper_figs/fig1/.
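The two nested existence checks at the top of gen_fig_1_plots() can be collapsed into a single call. With `recursive = TRUE`, `dir.create()` also creates figs/paper_figs when it is missing. With `showWarnings = FALSE`, it stays silent when the folder already exists. This is a minimal sketch, and `ensure_fig_dir()` is a hypothetical helper, not a function in the repository.

```r
# Hypothetical helper: create <base>/<fig_name> (and any missing parents)
# and return its path; safe to call repeatedly.
ensure_fig_dir = function(fig_name, base = "figs/paper_figs") {
  dir_name = file.path(base, fig_name)
  dir.create(dir_name, recursive = TRUE, showWarnings = FALSE)
  dir_name
}

# Equivalent to the directory setup in gen_fig_1_plots():
# dir_name = ensure_fig_dir("fig1")
```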
Standard metacell analysis is performed as described in Baran et al. 2019. To recompute the metacell object, run:
```r
source("scripts/generate_mc_mgraph_network/gen_mc.r")
```
This generates a metacell object with id sing_emb_wt10_bs500f. The bootstrap procedure used to compute the metacell cover is randomly seeded. As a result, the recomputed cover will deviate slightly from sing_emb_wt10_recolored, the cover used in the paper. Manifold graphs and 2D projections can be recomputed with:
```r
source("scripts/generate_mc_mgraph_network/gen_mgraph.r")
source("scripts/generate_mc_mgraph_network/gen_mgraph_umap.r")
generate_mgraph_wt10()
gen_mc2d_umap_wt10()
```
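One way to quantify how much a recomputed cover deviates from the one used in the paper is the adjusted Rand index (ARI). The ARI compares the two cell-to-metacell assignments and equals 1 for identical partitions, however the metacells are numbered. This is our choice of measure, not one used in the paper, and the function below is a plain base-R implementation. The commented usage lines assume that `scdb_mc()` returns an object whose `mc` slot holds metacell indices named by cell, and that both covers are available in scrna_db.

```r
# Adjusted Rand index between two partitions given as equal-length label vectors
adjusted_rand_index = function(a, b) {
  stopifnot(length(a) == length(b))
  tab = table(a, b)                      # contingency table of the two labelings
  pairs = function(x) sum(choose(x, 2))  # number of within-group pairs
  index = pairs(tab)
  sum_a = pairs(rowSums(tab))
  sum_b = pairs(colSums(tab))
  expected = sum_a * sum_b / choose(length(a), 2)
  max_index = (sum_a + sum_b) / 2
  (index - expected) / (max_index - expected)
}

# Usage sketch (requires the metacell package and both covers in scrna_db):
# mc_new   = scdb_mc("sing_emb_wt10_bs500f")
# mc_paper = scdb_mc("sing_emb_wt10_recolored")
# cells    = intersect(names(mc_new@mc), names(mc_paper@mc))
# adjusted_rand_index(mc_new@mc[cells], mc_paper@mc[cells])
```

Only cells assigned to a metacell in both covers are compared. Cells that are outliers in either cover therefore do not enter the score.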
The network flow model can be generated with:
```r
source("scripts/generate_mc_mgraph_network/gen_network.r")
build_sing_emb_wt10_network()
```
Metacells were clustered and annotated using the network flow model. The clustering can be recomputed with:
```r
source("scripts/generate_mc_mgraph_network/annot_mc_by_flows.r")
cluster_metacells_by_flow(mct_id = "sing_emb_wt10", K = 65)
```
To regenerate the single-embryo timing data underlying Figure 1, run:
```r
# Load the metacell package
library("metacell")
# Initialize the metacell scRNA database
scdb_init("scrna_db")
source("scripts/single_embryo_timing.r")
embryo_ranks = gen_single_embryo_timing()
# The subfunctions calculating intrinsic_rank and reference_rank of each embryo
# are contained in gen_single_embryo_timing()
```
The output data frame embryo_ranks was added to the single-cell metadata of the metacell matrix object. All subsequent functions that use single-embryo time information extract it from the cell_metadata slot of the WT single-cell matrix object sing_emb_wt10:
```r
# Load the metacell package
library("metacell")
# Initialize the metacell scRNA database
scdb_init("scrna_db")
mat = scdb_mat("sing_emb_wt10")
# Cell-level metadata, including the single-embryo time information
md = mat@cell_metadata
```
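The timing information is stored per cell, so a per-embryo table can be recovered by keeping one row per embryo. This is a sketch under assumptions. The default column names `embryo` and `intrinsic_rank` are hypothetical: the second is borrowed from the rank computed in gen_single_embryo_timing(). Check `colnames(md)` for the actual columns before using it.

```r
# Hypothetical helper: collapse cell-level metadata to one row per embryo,
# ordered by a time-rank column. Default column names are assumptions.
embryo_time_table = function(md, embryo_col = "embryo", rank_col = "intrinsic_rank") {
  stopifnot(all(c(embryo_col, rank_col) %in% colnames(md)))
  per_embryo = unique(md[, c(embryo_col, rank_col)])
  # Each embryo should carry a single rank; a duplicate means inconsistent metadata
  if (anyDuplicated(per_embryo[[embryo_col]])) stop("embryo with more than one rank")
  per_embryo = per_embryo[order(per_embryo[[rank_col]]), ]
  rownames(per_embryo) = NULL
  per_embryo
}

# Usage (after running the code above): embryo_time_table(md)
```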
The parameter stability analysis of network flows underlying Figure S2A can be regenerated with:
```r
# Regenerate the data (this might take some time)
source("scripts/parameter_stability_analysis.r")
gen_parameter_stability_analysis()
# Replot Figure S2A
source("scripts/generate_paper_figures/fig_s2.r")
fig_s2a()
```
To generate specific plots of Figures 6, S6 and S7, run the corresponding functions from fig_6.r, fig_s6.r or fig_s7.r in scripts/generate_paper_figures/. Users interested in recomputing parts of the Foxc12 chimera and tetraploid embryo analysis (not needed for regenerating the plots) should run the following functions:
```r
library("metacell")
scdb_init("scrna_db/")
source("scripts/foxc12/generate_chimera_tetraploid_data_analysis.r")
# Chimera embryos injected with Foxc12 DKO cells
foxc_chimera_generate_time_and_cell_type_annotation()
# Chimera embryos injected with control cells
control_chimera_generate_time_and_cell_type_annotation()
# Tetraploid embryos injected with Foxc12 DKO cells
foxc_tetraploid_generate_time_and_cell_type_annotation()
# Tetraploid embryos injected with control cells
control_tetraploid_generate_time_and_cell_type_annotation()
```
These functions transfer cell-type and time annotations from the WT atlas to the chimera and tetraploid embryos. Output is saved in data/chimera_tetraploid_analysis/. The scripts used to preprocess the plates from the chimera and tetraploid embryo analysis are in scripts/foxc12/preprocessing/. Preprocessing includes:
- Gating of single cells using the FACS GFP channel
- Removing cells from extraembryonic ectoderm and parietal endoderm
- Merging each single-cell matrix with the WT single-cell matrix and creating a joint single-cell graph (metacell cgraph object)
See summary_preprocessing.r and the corresponding scripts for more details.
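The first preprocessing step, gating cells on the FACS GFP channel, amounts to labeling each sorted cell by whether its GFP intensity passes a threshold. The sketch below is a generic illustration only. The function name, the log10 transform and the threshold are hypothetical placeholders. The actual gating criteria are the ones implemented in scripts/foxc12/preprocessing/.

```r
# Hypothetical gate: label cells "GFP+" or "GFP-" by a threshold on log10 intensity.
gate_gfp = function(gfp, log10_threshold) {
  # Clip negative (compensated) values to 0 and add a pseudocount of 1 so log10 is defined
  log_gfp = log10(pmax(gfp, 0) + 1)
  ifelse(log_gfp >= log10_threshold, "GFP+", "GFP-")
}

# Usage with made-up intensities: table(gate_gfp(c(0, 50, 5000, -10), 3))
```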
To generate all figures of the paper using the Docker image, run the following commands:
```shell
docker pull tanaylab/embflow:latest
mkdir figs
docker run -ti --user $(id -u):$(id -g) -v $(pwd)/figs:/embflow/figs tanaylab/embflow:latest
```
Then, within the R session, run:
```r
source("scripts/initialize_scripts.r")
generate_all_figures()
```
The figures will then be generated in the mounted directory figs/.
For help, please contact [email protected]