
Visualizing Somatic Alterations of 10X Spatial Transcriptomics Data

License: MIT (LICENSE, LICENSE.md)
LICENSE.md

limin321/stmut


stmut: Somatic Mutation Investigation of Spatial Transcriptomics Data


Characterizing gene expression profiles throughout tissue space provides key insights into biological processes and disease development, including cancer. Bioinformatic tools for exploring and interpreting spatial transcriptomics data are in great need, especially approaches to visualize point mutations, allelic imbalance, and copy number variations (CNVs). CNVkit is a popular toolkit for investigating copy number alterations in both DNA-seq and RNA-seq data. Building on CNVkit-RNA and SAMtools, we provide stmut, an R package hosted on this GitHub page. The stmut package includes a series of functions to visualize CNVs, point mutations, and allelic imbalance in spatial transcriptomics data. We also provide the scripts that produce the figures in the manuscript; they also serve as a user guide for the package. The package is applicable to 10x single-cell data as well.

The functions in the stmut package are organized into 3 parts: CNVs, point mutations, and allelic imbalance.

This package was tested using R version 4.1.1 on macOS Monterey (Apple M1, 16 GB memory). Because spatial transcriptomics data usually contain hundreds to thousands of spots, we recommend using a high-performance computing cluster to obtain point mutations and allelic imbalance for each spot.

Installation


You can install the development version of stmut from GitHub with:

# install.packages("devtools")
devtools::install_github("limin321/stmut")
library(stmut)

Notes

  • Bash commands wrapped in echo statements are shown for reference; adapt them when running your own data.
  • This package relies on previously sequenced DNA-seq data, for example exome data. That is, you need your bulk CNVs, germline SNPs, and somatic mutation lists ready before using this package.
  • Prepare the following 5 files from the spaceranger pipeline output and the Loupe Browser:
  1. filtered_feature_bc.csv
  2. Graph-Based.csv, exported from the 10X Loupe Browser
  3. possorted_genome_bam.bam
  4. spatial/tissue_positions_list.csv
  5. raw_feature_bc_matrix/barcodes.tsv.gz
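Before starting, it can help to confirm that all five inputs are in place. The sketch below is a hypothetical pre-flight check, not part of stmut; it assumes the five files have been gathered under one directory using the relative paths listed above (with Graph-Based.csv copied next to the spaceranger outputs).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of stmut): check that the five inputs exist and are non-empty.
# ASSUMPTION: all files live under one directory, using the relative paths from the list above.
check_stmut_inputs() {
  local dir=${1:-.} missing=0 f
  for f in filtered_feature_bc.csv \
           Graph-Based.csv \
           possorted_genome_bam.bam \
           spatial/tissue_positions_list.csv \
           raw_feature_bc_matrix/barcodes.tsv.gz; do
    if [[ ! -s "$dir/$f" ]]; then
      echo "missing or empty: $dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  # exit status = number of missing files (0 means everything is in place)
  return "$missing"
}
```

For example, `check_stmut_inputs ./Rep1/inputs && echo "inputs OK"`.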

I. Point Mutation Detection


  • spotIndex generation: run splitSpot() to generate an individual barcode file and gene expression file for each spot. Files are numbered sequentially: the first spot is spot000.txt, the next is spot001.txt, and so forth.
file <- read.csv("./Rep1/Data/SpacerangerOutput/CloupeFilesManualAlignment/filtered_feature_bc.csv")
splitSpot(file = file)

Output of splitSpot(): the spotIndex directory contains one barcode txt file per spot, and the txt directory contains the gene expression profile of each spot.

echo "subset-bam_linux --bam possorted_genome_bam.bam --cell-barcodes spot000.txt --out-bam spot000.bam"
echo "samtools index spot000.bam"
#> subset-bam_linux --bam possorted_genome_bam.bam --cell-barcodes spot000.txt --out-bam spot000.bam
#> samtools index spot000.bam
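With hundreds to thousands of spots, the two commands above are usually generated in a loop. The sketch below is a hypothetical helper that only prints (does not run) the commands for every spot*.txt file in the spotIndex directory, in the same echo style as above. It assumes each spot gets its own directory (spot000/spot000.bam), matching the BAM path used in the Mpileup_RNA.pl example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print subset-bam + samtools commands for every spot barcode file.
# ASSUMPTION: one output directory per spot, e.g. spot000/spot000.bam.
print_spot_bam_cmds() {
  local index_dir=$1 bam=$2 f spot
  for f in "$index_dir"/spot*.txt; do
    [[ -e $f ]] || continue            # no matches: the glob is left unexpanded
    spot=$(basename "$f" .txt)         # spotIndex/spot000.txt -> spot000
    echo "mkdir -p $spot"
    echo "subset-bam_linux --bam $bam --cell-barcodes $f --out-bam $spot/$spot.bam"
    echo "samtools index $spot/$spot.bam"
  done
}
```

For example, `print_spot_bam_cmds spotIndex possorted_genome_bam.bam > make_spot_bams.sh` writes a script you can review and then run or split into cluster jobs.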
  • Count point mutations for each spot: we count the number of reference and mutant reads using the Mpileup_RNA.pl script found here. The script takes 3 inputs, as shown in the following example: the somatic mutation list, the spot BAM file, and the reference FASTA file, which should be the same reference used by SpaceRanger or CellRanger. Make sure samtools is installed before running:
echo "perl Mpileup_RNA.pl Patient4SomaticSNPs.txt spot000/spot000.bam ./refdata-gex-GRCh38-2020-A/fasta/genome.fa"
#> perl Mpileup_RNA.pl Patient4SomaticSNPs.txt spot000/spot000.bam ./refdata-gex-GRCh38-2020-A/fasta/genome.fa
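On a cluster, a common pattern is one spot per array task. The sketch below is hypothetical: it maps a 0-based task index (for example SLURM's SLURM_ARRAY_TASK_ID) to the spot000, spot001, ... naming described for splitSpot() and prints the matching Mpileup_RNA.pl command. If the script writes its output to the working directory, running each task in its own directory would avoid collisions; check this against the script itself.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: one Mpileup_RNA.pl command per array task.
# ASSUMPTION: spots are numbered from 0 with three digits (spot000, spot001, ...).
spot_name() {
  printf 'spot%03d\n' "$1"
}

mpileup_cmd_for_task() {
  local task=$1 mutations=$2 fasta=$3 spot
  spot=$(spot_name "$task")
  echo "perl Mpileup_RNA.pl $mutations $spot/$spot.bam $fasta"
}
```

Inside an array job script, `mpileup_cmd_for_task "$SLURM_ARRAY_TASK_ID" Patient4SomaticSNPs.txt ./refdata-gex-GRCh38-2020-A/fasta/genome.fa | bash` runs the command for that task's spot.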
  • spaPointMutation creates a folder in your working directory containing 8 files related to spot point mutation exploration. The AllSptTumPropsed.csv file contains a list of point mutations for visualization in the 10X Loupe Browser, where the color scheme can be customized. The figures generated should be similar to Figure 1 in our manuscript. Make sure the format of your input files matches the examples provided with the package so that the code runs smoothly.

II. Copy Number Variation Detection


To call copy number variations from 10X spatial or single-cell data, we published 2 Docker images: one built on Ubuntu 20.04 (amd64), the other on a macOS Apple M1 chip (arm64).

Usage: /usr/local/bin/stmutcnv.sh cnv \
    --filteredFeatureCSV <value> \
    --clusterCSV <value> \
    --positionCSV <value> \
    --TotalReads <value> \
    --numSpots <value> \
    --group <value> \
    --annotate <value> \
    --arms <value> \
    --gainLoss <value> \
    --pmtimes <value> \
    --clean <value>

Details of each argument

Argument             Default   Description
filteredFeatureCSV             “filtered_feature_bc.csv” file
clusterCSV                     “Graph-Based.csv” file
positionCSV                    “tissue_positions_list.csv” file
numSpots             8         [opt] Number of spots used for grouping
TotalReads           1000      [opt] Number of reads or genes in a new spot after grouping
group                gene      [opt] One of ‘gene’, ‘read’, ‘none’
annotate                       A two-column csv file annotating each cluster as normal or tumor
arms                           [opt] A comma-separated list of arms, e.g. 3p,3q,6p,6q
gainLoss                       [opt] A comma-separated list of 1 or -1, e.g. 1,-1,1,1
pmtimes              100       [opt] Number of permutations when bulk DNA CNVs are provided
clean                true      [opt] Clean up intermediate files

opt = optional
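The examples pair --arms and --gainLoss element by element (3p,6q,9q with -1,-1,-1), so a mismatch is easy to introduce. The sketch below is a hypothetical pre-flight check, not part of stmutcnv.sh; it assumes one gainLoss value per arm, values restricted to 1 or -1, and arm names such as 3p or Xq with no spaces after the commas.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of stmutcnv.sh) for the --arms / --gainLoss pair.
# ASSUMPTIONS: one gainLoss value per arm; values are 1 or -1; arms look like 3p, 17q, Xq.
check_arms_gainloss() {
  local arms=$1 gainloss=$2 v arm
  local -a a g
  IFS=, read -r -a a <<< "$arms"
  IFS=, read -r -a g <<< "$gainloss"
  if (( ${#a[@]} != ${#g[@]} )); then
    echo "error: ${#a[@]} arms but ${#g[@]} gainLoss values" >&2
    return 1
  fi
  for v in "${g[@]}"; do
    if [[ $v != 1 && $v != -1 ]]; then
      echo "error: gainLoss value '$v' is not 1 or -1" >&2
      return 1
    fi
  done
  for arm in "${a[@]}"; do
    if [[ ! $arm =~ ^[0-9XY]+[pq]$ ]]; then
      echo "error: '$arm' does not look like a chromosome arm" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_arms_gainloss 3p,6q,9q -1,-1,-1` succeeds, while `check_arms_gainloss "3p, 3q" 1,1` fails because of the space.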

Pick your case type and refer to the corresponding example to prepare your command.

Case   group   Bulk tumor DNA-CNV data   Example
1      gene    YES                       (1)
2      gene    NO                        (2)
3      read    YES                       (1)
4      read    NO                        (2)
5      none    YES                       (3)
6      none    NO                        (4)
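In the examples, the six cases differ only in the --group value and in whether the bulk-CNV arguments (--arms, --gainLoss) are passed. The sketch below is a hypothetical helper that prints the corresponding stmutcnv.sh cnv command; it assumes the inputs are in ./inputs under the current directory, which is mounted at /home/stmut, and leaves --pmtimes and --clean at their defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a stmutcnv.sh "cnv" command for one of the six cases.
# ASSUMPTION: inputs in ./inputs, current directory mounted at /home/stmut.
build_cnv_cmd() {
  local image=$1 group=$2 arms=${3:-} gainloss=${4:-}
  case $group in
    gene|read|none) ;;
    *) echo "error: --group must be gene, read or none" >&2; return 1 ;;
  esac
  local -a cmd=(docker run --rm -v "$PWD":/home/stmut "$image"
    bash /usr/local/bin/stmutcnv.sh cnv
    --filteredFeatureCSV ./inputs/filtered_feature_bc.csv
    --clusterCSV ./inputs/Graph-Based.csv
    --positionCSV ./inputs/tissue_positions_list.csv
    --TotalReads 1000 --numSpots 8
    --group "$group"
    --annotate ./inputs/annotate.csv)
  # cases 1, 3 and 5: bulk tumor DNA-CNV data supplied as arms + gain/loss calls
  if [[ -n $arms ]]; then
    cmd+=(--arms "$arms" --gainLoss "$gainloss")
  fi
  printf '%s\n' "${cmd[*]}"
}
```

For example, `build_cnv_cmd limin321/stmutcnv_amd64:0.0.1 gene 3p,6q,9q -1,-1,-1` prints a case 1 command, and `build_cnv_cmd limin321/stmutcnv_amd64:0.0.1 none` prints a case 6 command.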

Ready to run your data.

Step 1, download Docker or Singularity image.

Please pull the docker image from Docker Hub here
Ubuntu (amd64):
docker pull limin321/stmutcnv_amd64:0.0.1
singularity pull stmut.sig docker://limin321/stmutcnv_amd64:0.0.1

Mac (Apple M1, arm64):
docker pull limin321/stmutcnv_arm64:0.0.1
singularity pull stmut.sig docker://limin321/stmutcnv_arm64:0.0.1
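Since the two images differ by CPU architecture (amd64 vs arm64), the tag can be chosen from `uname -m`. This is a hypothetical helper; treating any aarch64/arm64 host as a match for the arm64 image is an assumption, since that image was built on an Apple M1.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick the stmutcnv image tag for this machine's CPU architecture.
# ASSUMPTION: arm64/aarch64 hosts can use the arm64 image built on Apple M1.
stmut_image() {
  local arch=${1:-$(uname -m)}
  case $arch in
    x86_64|amd64)  echo "limin321/stmutcnv_amd64:0.0.1" ;;
    arm64|aarch64) echo "limin321/stmutcnv_arm64:0.0.1" ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}
```

Usage: `docker pull "$(stmut_image)"` or `singularity pull stmut.sig "docker://$(stmut_image)"`.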

Step 2, prepare the following 4 input files, ideally in the same working directory:

    1) filtered_feature_bc.csv,
    2) Graph-Based.csv,
    3) tissue_positions_list.csv,
    4) annotate.csv, which should look like this:

cluster,annotate
Cluster1,tumor
Cluster2,normal
Cluster3,tumor
Cluster4,normal

The first (1) and third (3) inputs are standard Spaceranger outputs. The second (2) is exported from the Loupe Browser. The fourth (4) is a two-column csv file in which you annotate each cluster. Note that the annotations “normal” and “tumor” must be lowercase.
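Because the labels must be exactly “tumor” or “normal”, a quick check can catch typos such as “Tumor”. The sketch below is hypothetical; it assumes annotate.csv is comma-separated with a header row followed by one cluster,label row per cluster.

```bash
#!/usr/bin/env bash
# Hypothetical check: every cluster label in annotate.csv must be "tumor" or "normal".
# ASSUMPTION: comma-separated file, header row first, label in column 2.
check_annotate() {
  awk -F, '
    NR == 1 { next }                 # skip the header row
    NF == 0 { next }                 # ignore blank lines
    {
      label = $2
      sub(/\r$/, "", label)          # tolerate Windows line endings
      if (label != "tumor" && label != "normal") {
        printf "line %d: cluster %s has label \"%s\"\n", NR, $1, label > "/dev/stderr"
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

For example, `check_annotate inputs/annotate.csv || echo "fix the labels first"`.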

Step 3, infer CNVs from 10X spatial data (10X Platform)

Example 1: group by ‘gene’ or ‘read’, with bulk tumor DNA CNV data available. This assumes the four input files are in the ‘inputs’ folder inside ‘your_local_dir’.

# Group by gene counts; change --group to 'read' to group by read counts.
# On amd64 (Ubuntu) machines, use limin321/stmutcnv_amd64:0.0.1 instead.
docker run --rm -v <your_local_dir>:/home/stmut limin321/stmutcnv_arm64:0.0.1 bash /usr/local/bin/stmutcnv.sh cnv \
    --filteredFeatureCSV ./inputs/filtered_feature_bc.csv \
    --clusterCSV ./inputs/Graph-Based.csv \
    --positionCSV ./inputs/tissue_positions_list.csv \
    --TotalReads 1000 \
    --numSpots 8 \
    --group gene \
    --annotate ./inputs/annotate.csv \
    --arms 3p,6q,9q \
    --gainLoss -1,-1,-1 \
    --pmtimes 20

Expected output:

analysis
└── grouped_spots
    ├── BarcodeLegend.csv
    ├── cdt
    │   ├── CNVs_OrganizedByGEcluster_UMIcount.cdt
    │   ├── CNVs_OrganizedByGEcluster_UMIcount.pdf
    │   ├── CNVs_RankedBySimilarityToDNA.cdt
    │   ├── CNVs_RankedBySimilarityToDNA_CNVscoreHistogram.csv
    │   ├── CNVs_RankedBySimilarityToDNA_CNVscoreHistogram.pdf
    │   ├── CNVs_RankedBySimilarityToDNA_QQplot.pdf
    │   ├── CNVs_RankedbySimilaritytoDNA_Quintiles4Loupe.csv
    │   ├── CNVs_clustered.Rdata
    │   ├── CNVs_clustered_heatmap.pdf
    │   └── permutCNV_summ.csv
    └── histogram_genes_per_spot.png

3 directories, 12 files
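To confirm that a run finished, the files in the tree above can be checked. This hypothetical helper checks the six files produced in every case and, when bulk data was supplied, the six additional files; the file names are taken from the trees in Examples 1 and 2, and the output directory name is a parameter because it may differ between runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check the CNV outputs listed in the expected-output trees.
# Usage: check_cnv_outputs <analysis_dir> <yes|no: bulk DNA CNVs supplied>
check_cnv_outputs() {
  local out=${1:-analysis}/grouped_spots with_bulk=${2:-no} f missing=0
  local -a files=(BarcodeLegend.csv histogram_genes_per_spot.png
    cdt/CNVs_OrganizedByGEcluster_UMIcount.cdt
    cdt/CNVs_OrganizedByGEcluster_UMIcount.pdf
    cdt/CNVs_clustered.Rdata
    cdt/CNVs_clustered_heatmap.pdf)
  # the "RankedBySimilarityToDNA" set and permutation summary need bulk tumor CNVs
  if [[ $with_bulk == yes ]]; then
    files+=(cdt/CNVs_RankedBySimilarityToDNA.cdt
      cdt/CNVs_RankedBySimilarityToDNA_CNVscoreHistogram.csv
      cdt/CNVs_RankedBySimilarityToDNA_CNVscoreHistogram.pdf
      cdt/CNVs_RankedBySimilarityToDNA_QQplot.pdf
      cdt/CNVs_RankedbySimilaritytoDNA_Quintiles4Loupe.csv
      cdt/permutCNV_summ.csv)
  fi
  for f in "${files[@]}"; do
    [[ -e $out/$f ]] || { echo "missing: $out/$f" >&2; missing=1; }
  done
  return "$missing"
}
```

For example, `check_cnv_outputs analysis yes` after Example 1, or `check_cnv_outputs analysis_grp_read_NObulk no` after Example 2.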

We provide three sets of outputs:
CNVs_OrganizedByGEcluster_UMIcount: the files with this prefix are used to generate Fig. 4 in our paper.
CNVs_RankedBySimilarityToDNA: this set is optional and is generated only when you provide bulk tumor CNV data.
CNVs_clustered: a dendrogram of the CNV profiles. The details are saved in the .Rdata file, which you can extract by running the following R code.

# load the htp object saved by the pipeline
load("./analysis/grouped_spots/cdt/CNVs_clustered.Rdata")
# the ranked .cdt file (bulk data required) supplies the first two columns
df1 <- read.table("./analysis/grouped_spots/cdt/CNVs_RankedBySimilarityToDNA.cdt", header = TRUE, sep = "\t")
# htp$carpet holds the clustered CNV matrix
data <- htp$carpet
data1 <- cbind(df1[, 1:2], data)
write.table(data1, file = "./analysis/grouped_spots/cdt/CNVs_cluster.cdt", sep = "\t", row.names = FALSE, quote = FALSE)

Example 2: group by ‘gene’ or ‘read’ without bulk tumor DNA data. This assumes the four input files are in the ‘inputs’ folder inside ‘your_local_dir’.

# Group by read counts; change --group to 'gene' to group by gene counts.
docker run --rm -v <your_local_dir>:/home/stmut limin321/stmutcnv_arm64:0.0.1 bash /usr/local/bin/stmutcnv.sh cnv \
    --filteredFeatureCSV ./inputs/filtered_feature_bc.csv \
    --clusterCSV ./inputs/Graph-Based.csv \
    --positionCSV ./inputs/tissue_positions_list.csv \
    --TotalReads 1000 \
    --numSpots 8 \
    --group read \
    --annotate ./inputs/annotate.csv

Expected outputs:

analysis_grp_read_NObulk
└── grouped_spots
    ├── BarcodeLegend.csv
    ├── cdt
    │   ├── CNVs_OrganizedByGEcluster_UMIcount.cdt
    │   ├── CNVs_OrganizedByGEcluster_UMIcount.pdf
    │   ├── CNVs_clustered.Rdata
    │   └── CNVs_clustered_heatmap.pdf
    └── histogram_genes_per_spot.png

3 directories, 6 files

Example 3: no spot grouping is performed, and bulk tumor CNV data is provided. This is useful for single-cell data.

docker run --rm -v <your_local_dir>:/home/stmut limin321/stmutcnv_arm64:0.0.1 bash /usr/local/bin/stmutcnv.sh cnv \
    --filteredFeatureCSV ./inputs/filtered_feature_bc.csv \
    --clusterCSV ./inputs/Graph-Based.csv \
    --positionCSV ./inputs/tissue_positions_list.csv \
    --TotalReads 1000 \
    --numSpots 8 \
    --group none \
    --annotate ./inputs/annotate.csv \
    --arms 3p,6q,9q \
    --gainLoss -1,-1,-1

Expected outputs are the same as in Example 1.

Example 4: no spot grouping is performed, and no bulk tumor CNV data is provided.

docker run --rm -v <your_local_dir>:/home/stmut limin321/stmutcnv_arm64:0.0.1 bash /usr/local/bin/stmutcnv.sh cnv \
    --filteredFeatureCSV ./inputs/filtered_feature_bc.csv \
    --clusterCSV ./inputs/Graph-Based.csv \
    --positionCSV ./inputs/tissue_positions_list.csv \
    --TotalReads 1000 \
    --numSpots 8 \
    --group none \
    --annotate ./inputs/annotate.csv

Expected outputs are the same as in Example 2.

An example of running with Singularity:

singularity exec --bind <your_dir_to_mount>:/home/stmut --pwd /home/stmut <path/to>/stmut.sig bash stmutcnv.sh cnv \
    --filteredFeatureCSV ./inputs/filtered_feature_bc.csv \
    --clusterCSV ./inputs/Graph-Based.csv \
    --positionCSV ./inputs/tissue_positions_list.csv \
    --TotalReads 1000 --numSpots 8 \
    --group gene \
    --annotate ./inputs/annotate.csv

StereoSeq Platform

If your data is from StereoSeq, a few extra steps are needed. Here are the step-by-step instructions.

First, run the following command to generate input files equivalent to those from the 10X platform.
# --gemfile: the output from the Stereo SAW pipeline
# --binsize: bin200 is assumed to be similar in size to a 10X Visium spot
# --outpath: directory in which to save the outputs
docker run --rm  -v ./stmutCNVtest/scripts/:/home/stmut/ stmutcnv:latest bash /usr/local/bin/stmutcnv.sh gemconvert \
    --gemfile ./stereo/<chipID>.tissue.gem.gz \
    --binsize 200 \
    --outpath ./stereo/

It takes 3 arguments: the tissue.gem.gz file, the bin size, and the directory in which to store the outputs.

It outputs 4 files:
    1) filtered_feature_bc.csv
    2) graph_based.csv
    3) tissue_positions_list.csv
    4) bin200_seurat.RDS
The placeholder barcodes in each file mimic those from the 10X platform so that the data format stays consistent for analysis. The real corresponding coordinates are stored in the metadata of the RDS file. Using this RDS, you also need to perform a clustering analysis and annotate each cluster to create an annotate.csv file for the CNV analysis in the next step.
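To illustrate what --binsize 200 means, the sketch below sums counts per 200 x 200 bin of a GEM file. It is not the gemconvert implementation. It assumes a GEM layout of tab-separated geneID, x, y, MIDCount columns, with '#' metadata lines and a geneID header row, and it bins coordinates by integer division; check these assumptions against your SAW output.

```bash
#!/usr/bin/env bash
# Illustrative sketch (not gemconvert): total MID counts per bin of a gzipped GEM file.
# ASSUMPTION: columns are geneID, x, y, MIDCount; '#' lines are metadata.
bin_gem_counts() {
  local gem=$1 binsize=${2:-200}
  gzip -dc "$gem" | awk -F'\t' -v bs="$binsize" '
    /^#/ { next }                    # metadata lines
    $1 == "geneID" { next }          # column header row
    {
      key = int($2 / bs) "\t" int($3 / bs)   # bin index along x and y
      total[key] += $4
    }
    END { for (k in total) print k "\t" total[k] }
  ' | sort -k1,1n -k2,2n
}
```

Each output line is bin_x, bin_y and the total count in that bin; for example, `bin_gem_counts ./stereo/<chipID>.tissue.gem.gz 200 | head`.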

Second, infer CNVs using the same commands shown in the 10X Platform section above.

III. Allelic Imbalance


The accumStartPos() and bulkLOHplot() functions generate bulk DNA-seq allelic imbalance plots.

  • Generate the ‘samtools mpileup’ input for counting major- and minor-allele reads per variant in each spot.
# Tumor SNPs list
data1 <- read.table(file = "/Volumes/Bastian/Limin/Ji_data/Patient6/BulkDNASeq/LOH/MpileupOutput_TumorConverted.txt", sep = "\t",quote = "", header = TRUE)

# generate "samtools mpileup" input for counting major and minor alleles per mutant of each Spot
lohMpileupInput(data1 = data1) # LOHmpileupInput.txt is written to your working dir

In our case, the patient4_hg38_SNPs.txt and patient6_hg38_SNPs.txt files, which can be found here, were used to count the major- and minor-allele reads of each spot in patient 4 and patient 6.

  • Count the major-allele and minor-allele reads per variant in each spot. The script Mpileup_RNA_alleImbalance.pl can be downloaded here.
echo "perl ./Mpileup_RNA_alleImbalance.pl ./LOHmpileupInput.txt spot000/spot000.bam"

#> perl ./Mpileup_RNA_alleImbalance.pl ./LOHmpileupInput.txt spot000/spot000.bam
  • Generate a summary table of the major/minor allele counts of all spots.
library(stringr)

files <- c("/Volumes/Bastian/Limin/Ji_data/Patient6/SpatialTranscriptomic/Rep1/LOH/allelicImbalance2/mpileupOutput/spot0001/MpileupOutput_RNA.txt","/Volumes/Bastian/Limin/Ji_data/Patient6/SpatialTranscriptomic/Rep1/LOH/allelicImbalance2/mpileupOutput/spot0002/MpileupOutput_RNA.txt")

# y = position of the spot folder among the "/"-separated path components
x <- files[1]
y <- match("spot0001", str_split_fixed(x, "/", 15)) # 12

lohMajorAlleleCt(files = files, y = y)

The output is 2 csv files: SNPallMajorAlleleCount.csv and SNPMajorAlleleCount.csv. The latter is used to generate the figures in the manuscript.
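The y argument appears to tell lohMajorAlleleCt() which path component holds the spot name, so it changes if your directory depth differs. The hypothetical shell helper below computes the same index as `match("spot0001", str_split_fixed(x, "/", 15))`: awk also counts the empty field before a leading "/" as field 1.

```bash
#!/usr/bin/env bash
# Hypothetical helper: 1-based index of the path component equal to the spot name,
# counted the same way as match(spot, str_split_fixed(path, "/", n)) in R.
spot_field_index() {
  local path=$1 spot=$2
  awk -F/ -v s="$spot" '{ for (i = 1; i <= NF; i++) if ($i == s) { print i; exit } }' <<< "$path"
}
```

For the example path above, `spot_field_index "$path" spot0001` prints 12, matching the value passed as y.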

  • Scripts generating the allelic imbalance figures (Figure 4 and Figure S6) in the manuscript can be found here.
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.29   lifecycle_1.0.3 magrittr_2.0.3  evaluate_0.16  
#>  [5] rlang_1.1.1     stringi_1.7.8   cli_3.4.1       rstudioapi_0.14
#>  [9] vctrs_0.6.2     rmarkdown_2.16  tools_4.1.1     stringr_1.5.0  
#> [13] glue_1.6.2      xfun_0.39       yaml_2.3.5      fastmap_1.1.0  
#> [17] compiler_4.1.1  htmltools_0.5.3 knitr_1.40
