Answer key - Clustering workflow

Identification of all markers for each cluster

Find all markers for the seurat_stim data using the FindAllMarkers() function with arguments to return only the positive markers and those markers with log2 fold-change greater than 0.25. Do not run the code below.

## DO NOT RUN THIS CODE ##
# Find markers for every cluster compared to all remaining cells, report only the positive ones
markers_stim <- FindAllMarkers(object = seurat_stim, 
                          only.pos = TRUE,
                          logfc.threshold = 0.25)

In your own analysis you can run the code above, but it can take quite long to run. As such we have created the markers data frame for you to download and load into your R session.

markers_stim <- read.csv("results/stim_all_markers.csv")

Annotate the markers with gene descriptions

The order of the columns doesn't seem the most intuitive, so we will reorder the columns with the cluster first followed by the gene.

# Combine markers with gene descriptions 
ann_markers_stim <- inner_join(x = markers_stim, 
                          y = annotations[, c("gene_name", "description")],
                          by = c("gene" = "gene_name")) %>%
        unique()

Reorder the columns to be in the order shown below

# Rearrange the columns to be more intuitive
ann_markers_stim <- ann_markers_stim[ , c(6, 7, 2:4, 1, 5,8)]

Arrange rows by cluster, then by p-adjusted values

# Order the rows by p-adjusted values
ann_markers_stim <- ann_markers_stim %>%
        dplyr::arrange(cluster, p_val_adj)

# Take a look a the new data frame
View(ann_markers_stim)

Save our results to file

Save our rearranged marker analysis results to a file called stim_all_markers_annotated.csv in the results folder.

# Save markers to file
write.csv(ann_markers_stim, 
          file = "results/stim_all_markers_annotated.csv", 
          quote = FALSE, 
          row.names = FALSE)

Extract the top 5 markers by log2 fold change for each cluster.

# Extract top 5 markers per cluster
top5_stim <- ann_markers_stim %>% 
        group_by(cluster) %>% 
        top_n(n = 5, 
              wt = avg_logFC)

# Visualize top 5 markers per cluster
View(top5_stim)

Visualize top 5 markers for cluster 0

To get a better idea of cell type identity we can explore the expression of different identified markers by cluster using the FeaturePlot() function. For example, we can look at the cluster 0 markers:

# Plot top 5 markers for cluster 0
FeaturePlot(object = seurat_stim, 
            features = top5_stim[top5_stim$cluster == 0, "gene"] %>%
                    pull(gene))

Violin plot

# Vln plot - cluster 0
VlnPlot(object = seurat_stim, 
        features = top5_stim[top5_stim$cluster == 0, "gene"] %>%
                    pull(gene))

This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

A portion of these materials and hands-on activities were adapted from the Satija Lab's Seurat - Guided Clustering Tutorial

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sc_exercises_marker_analysis.md

sc_exercises_marker_analysis.md

Answer key - Clustering workflow

Identification of all markers for each cluster

Annotate the markers with gene descriptions

Reorder the columns to be in the order shown below

Arrange rows by cluster, then by p-adjusted values

Save our results to file

Extract the top 5 markers by log2 fold change for each cluster.

Visualize top 5 markers for cluster 0

Violin plot

Files

sc_exercises_marker_analysis.md

Latest commit

History

sc_exercises_marker_analysis.md

File metadata and controls

Answer key - Clustering workflow

Identification of all markers for each cluster

Annotate the markers with gene descriptions

Reorder the columns to be in the order shown below

Arrange rows by cluster, then by p-adjusted values

Save our results to file

Extract the top 5 markers by log2 fold change for each cluster.

Visualize top 5 markers for cluster 0

Violin plot