From cc1349a2f33a16a63d83de242c39de7b50044d5a Mon Sep 17 00:00:00 2001
From: Nima Rafati
Date: Wed, 13 Mar 2024 08:33:12 +0100
Subject: [PATCH] Polish the slides

---
 slide_clustering.rmd    | 12 ++++++------
 slide_preprocessing.Rmd |  7 ++++---
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/slide_clustering.rmd b/slide_clustering.rmd
index 180d02ad..2bce9b27 100644
--- a/slide_clustering.rmd
+++ b/slide_clustering.rmd
@@ -428,7 +428,7 @@ name: hclust
 - Well suited for hierarchical data (e.g. taxonomies).
 - Final output is a dendrogram representing the order decisions at each merge/division of clusters.
 - Two approaches:
-  - Agglomerative (Bottom-up): All data points are treated as clusters and the joins similar ones.
+  - Agglomerative (Bottom-up): All data points are treated as clusters and then joins similar ones.
   - Divisive (Top-down): All data points are in one large clusters and recursively splits the most heterogeneous clusters.
 - Number of clusters are decided after generating the tree.
 ---
@@ -488,24 +488,24 @@ knitr::include_graphics('data/Linkages.png')
 name: linear-clustering-summary
 ## Summary
-- For bulk RNASeq you can perform clusteirng on raw or Z-Score scaled data.
+- For bulk RNASeq you can perform clustering on raw, Z-Score scaled data or on top PC coordinates.
-- For the sample size is large (>10,000) you can perform clustering on PC. For instance in scRNASeq data.
+- For the sample large size (>10,000) you can perform clustering on PC. For instance in scRNASeq data.
 - You always need to tune some parameters.
 - K-means performs poorly on unbalanced data.
-- On hierarchical clustering, some distance metrics need to be used with a certain
+- In hierarchical clustering, some distance metrics need to be used with a certain linkage method.
 - Checking clustering Robustness (a.k.a Ensemble perturbations):
   - Most clustering techniques will cluster random noise.
   - One way of testing this is by clustering on parts of the data (clustering bootstrapping)
-  - Read more in [Ronan et al (2016) Science Signaling](https://www.science.org/doi/10.1126/scisignal.aad1932?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)).
+  - Read more in [Ronan et al (2016) Science Signaling](https://www.science.org/doi/10.1126/scisignal.aad1932?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed).
 ---
 name: Know more
-## Do you want to know more
+## Do you want to know more?
 Please check the following links:
 - [Avoiding common pitfalls when clustering biological data](https://www.science.org/doi/10.1126/scisignal.aad1932?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)
 - [Clustering with Scikit with GIFs](https://dashee87.github.io/data%20science/general/Clustering-with-Scikit-with-GIFs/) (Note, this is based on python but provide nice illustration).
diff --git a/slide_preprocessing.Rmd b/slide_preprocessing.Rmd
index 2755b9a3..3bf1dc74 100644
--- a/slide_preprocessing.Rmd
+++ b/slide_preprocessing.Rmd
@@ -72,8 +72,8 @@ name: pp
 - Remove genes and samples with low counts
 ```{r,echo=TRUE}
-cf1 <- cr[rowSums(cr>0) >= 3, ] # Keep rows/genes that have at least one read
-cf2 <- cr[rowSums(cr>2) >= 3, ] # Keep rows/genes that have at least three reads
+cf1 <- cr[rowSums(cr>0) >= 3, ] # Keep rows/genes that have at least one read in +3 samples
+cf2 <- cr[rowSums(cr>3) >= 3, ] # Keep rows/genes that have at least three reads in +3 samples
 cf3 <- cr[rowSums(edgeR::cpm(cr)>5) >= 3, ] # need at least three samples to have cpm > 5.
 ```
 _count/read per million (cpm/rpm): a normalized value for sequencing depth._
@@ -146,6 +146,7 @@ name: norm-1
 .pull-left-50[
 - Removing technical biases in sequencing data (e.g. sequencing depth and gene length)
+- Make counts comparable across features (genes).
 - Make counts comparable across samples
@@ -202,7 +203,7 @@ name: norm-2
 ## Normalisation
-- Make counts comparable across features (genes). It can be useful for gene to gene comparisons.
+- Controlling for gene length: It can be useful for gene to gene comparisons.
 .size-60[![](data/normalization_methods_length.png)]
 ```{r,echo=FALSE}