From cc1349a2f33a16a63d83de242c39de7b50044d5a Mon Sep 17 00:00:00 2001
From: Nima Rafati <nimarafati@gmail.com>
Date: Wed, 13 Mar 2024 08:33:12 +0100
Subject: [PATCH] Polish the slides

---
 slide_clustering.rmd    | 12 ++++++------
 slide_preprocessing.Rmd |  7 ++++---
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/slide_clustering.rmd b/slide_clustering.rmd
index 180d02ad..2bce9b27 100644
--- a/slide_clustering.rmd
+++ b/slide_clustering.rmd
@@ -428,7 +428,7 @@ name: hclust
 - Well suited for hierarchical data (e.g. taxonomies). 
 - Final output is a dendrogram representing the order decisions at each merge/division of clusters.  
 - Two approaches:
-  - Agglomerative (Bottom-up): All data points are treated as clusters and the joins similar ones. 
+  - Agglomerative (Bottom-up): Each data point starts as its own cluster, and the most similar clusters are joined step by step. 
   - Divisive (Top-down): All data points are in one large clusters and recursively splits the most heterogeneous clusters.  
 - Number of clusters are decided after generating the tree.  
 ---
@@ -488,24 +488,24 @@ knitr::include_graphics('data/Linkages.png')
 name: linear-clustering-summary
 ## Summary 
 
-- For bulk RNASeq you can perform clusteirng on raw or Z-Score scaled data. 
+- For bulk RNASeq, you can perform clustering on raw data, Z-score scaled data, or the coordinates of the top principal components (PCs). 
 
-- For the sample size is large (>10,000) you can perform clustering on PC. For instance in scRNASeq data. 
+- When the number of samples is large (>10,000), for instance cells in scRNASeq data, you can perform clustering on the top PCs. 
 
 - You always need to tune some parameters.  
 
 - K-means performs poorly on unbalanced data. 
 
-- On hierarchical clustering, some distance metrics need to be used with a certain
+- In hierarchical clustering, some distance metrics must be paired with a specific
 linkage method.  
 
 - Checking clustering Robustness (a.k.a  Ensemble perturbations):
     - Most clustering techniques will cluster random noise.  
     - One way of testing this is by clustering on parts of the data (clustering bootstrapping)
-    - Read more in [Ronan et al (2016) Science Signaling](https://www.science.org/doi/10.1126/scisignal.aad1932?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)).  
+    - Read more in [Ronan et al. (2016) Science Signaling](https://www.science.org/doi/10.1126/scisignal.aad1932?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed).  
 ---
 name: Know more
-## Do you want to know more
+## Do you want to know more?  
 Please check the following links:
 - [Avoiding common pitfalls when clustering biological data](https://www.science.org/doi/10.1126/scisignal.aad1932?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)
 - [Clustering with Scikit with GIFs](https://dashee87.github.io/data%20science/general/Clustering-with-Scikit-with-GIFs/) (Note, this is based on python but provide nice illustration).  
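A minimal base-R sketch of the clustering summary above, on simulated data (the 60 × 100 matrix, the three groups, the use of 2 PCs, k = 3 and the 80% gene subsampling are all illustrative assumptions, not values from the slides). It clusters the top PC coordinates with hierarchical clustering (Ward linkage paired with Euclidean distance) and with k-means. It then checks robustness by re-clustering random subsets of genes and recording how often each pair of samples ends up together.

```r
set.seed(42)

# Simulated data (illustrative only): 60 samples x 100 genes in three groups of 20.
# Group 2 is shifted up on genes 1-25, group 3 on genes 26-50; genes 51-100 are noise.
n_per_group <- 20
n_genes <- 100
group <- rep(1:3, each = n_per_group)
x <- matrix(rnorm(length(group) * n_genes), nrow = length(group), ncol = n_genes)
x[group == 2, 1:25] <- x[group == 2, 1:25] + 4
x[group == 3, 26:50] <- x[group == 3, 26:50] + 4

# Z-score each gene (scale. = TRUE) and keep the top PC coordinates
top_pcs <- function(m, n_pc = 2) prcomp(m, center = TRUE, scale. = TRUE)$x[, seq_len(n_pc)]
pcs <- top_pcs(x)

# Hierarchical clustering: Ward linkage assumes Euclidean distances, so the two are paired
hc <- hclust(dist(pcs, method = "euclidean"), method = "ward.D2")
hc_clusters <- cutree(hc, k = 3)  # number of clusters is chosen after building the tree

# K-means on the same coordinates; several random starts (nstart) reduce
# sensitivity to the initial centres
km_clusters <- kmeans(pcs, centers = 3, nstart = 25)$cluster

# Robustness check by subsampling: re-cluster on random 80% subsets of genes and
# record how often each pair of samples lands in the same cluster
n_rep <- 50
co <- matrix(0, nrow(x), nrow(x))
for (r in seq_len(n_rep)) {
  genes <- sample(n_genes, size = 0.8 * n_genes)
  cl_r <- cutree(hclust(dist(top_pcs(x[, genes])), method = "ward.D2"), k = 3)
  co <- co + outer(cl_r, cl_r, "==")
}
co <- co / n_rep  # co-clustering frequency in [0, 1]

print(table(hc_clusters, group))
print(table(km_clusters, group))
```

Pairs of samples with co-clustering frequencies near 1 or 0 indicate stable assignments. Values in between point to clusters that depend on which genes happen to be included. The number of PCs is kept small here because three groups span only two dimensions. With real data it would typically be chosen from a scree (elbow) plot.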
diff --git a/slide_preprocessing.Rmd b/slide_preprocessing.Rmd
index 2755b9a3..3bf1dc74 100644
--- a/slide_preprocessing.Rmd
+++ b/slide_preprocessing.Rmd
@@ -72,8 +72,8 @@ name: pp
 - Remove genes and samples with low counts
 
 ```{r,echo=TRUE}
-cf1 <- cr[rowSums(cr>0) >= 3, ] # Keep rows/genes that have at least one read 
-cf2 <- cr[rowSums(cr>2) >= 3, ] # Keep rows/genes that have at least three reads 
+cf1 <- cr[rowSums(cr>0) >= 3, ] # Keep rows/genes that have at least one read in at least 3 samples 
+cf2 <- cr[rowSums(cr>=3) >= 3, ] # Keep rows/genes that have at least three reads in at least 3 samples 
 cf3 <- cr[rowSums(edgeR::cpm(cr)>5) >= 3, ] # need at least  three samples to have cpm > 5. 
 ```
 _count/read per million (cpm/rpm): a normalized value for sequencing depth._  
@@ -146,6 +146,7 @@ name: norm-1
 
 .pull-left-50[
 - Removing technical biases in sequencing data (e.g. sequencing depth and gene length)
+- Make counts comparable across features (genes)
 - Make counts comparable across samples
 
 <!-- Control for sequencing depth -->
@@ -202,7 +203,7 @@ name: norm-2
 
 ## Normalisation
 
-- Make counts comparable across features (genes). It can be useful for gene to gene comparisons.
+- Controlling for gene length: useful for gene-to-gene comparisons, since a longer gene yields more reads than a shorter one at the same expression level.
 .size-60[![](data/normalization_methods_length.png)]
 
 ```{r,echo=FALSE}
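A small base-R sketch of the filtering and normalization steps above, on a made-up 6 × 4 count matrix `cr` with hypothetical gene lengths (neither comes from the slides). It shows why "at least three reads" is `cr >= 3` (equivalently `cr > 2`) for integer counts, whereas `cr > 3` would demand four. It also computes CPM by hand, which should match `edgeR::cpm(cr)` when no normalization factors are set. TPM is used here as one common way of controlling for gene length; it is our choice of example, and the slide's figure may cover other methods.

```r
# Hypothetical raw count matrix `cr`: 6 genes x 4 samples (toy values)
cr <- matrix(c(  0,   0,   1,   0,
                 1,   2,   0,   3,
                 3,   4,   3,   5,
                10,  12,   9,  11,
               200, 180, 220, 210,
                50,  40,  60,  55),
             nrow = 6, byrow = TRUE,
             dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))

# Low-count filters. Counts are integers, so "at least three reads" is cr >= 3
# (same as cr > 2); cr > 3 would require at least four reads.
cf1 <- cr[rowSums(cr > 0) >= 3, ]   # >= 1 read in at least 3 samples
cf2 <- cr[rowSums(cr >= 3) >= 3, ]  # >= 3 reads in at least 3 samples

# Counts per million: divide each column by its library size (column sum), times 1e6
lib_size <- colSums(cr)
cpm <- t(t(cr) / lib_size) * 1e6
cf3 <- cr[rowSums(cpm > 5) >= 3, ]  # CPM > 5 in at least 3 samples

# Controlling for gene length, sketched with TPM:
# reads per kilobase first, then rescale each sample to sum to one million
gene_length_kb <- c(1, 2, 0.5, 4, 3, 1.5)  # hypothetical gene lengths in kb
rpk <- cr / gene_length_kb                 # recycles down columns: row i / length i
tpm <- t(t(rpk) / colSums(rpk)) * 1e6

rownames(cf2)  # "gene3" "gene4" "gene5" "gene6"
round(colSums(tpm))
```

With `cr > 3` instead of `cr >= 3`, `gene3` would be dropped even though it has at least three reads in every sample. CPM only corrects for sequencing depth. TPM additionally divides by gene length, which is what makes gene-to-gene comparisons within a sample meaningful.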