From 562cb1dbd0968bde54dcaa4bc0bbbb7dc4b98a14 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Mon, 26 Feb 2024 15:38:41 +0000
Subject: [PATCH 01/22] Create pipeline_clustering_yml.md

---
 docs/yaml_docs/pipeline_clustering_yml.md | 28 +++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 docs/yaml_docs/pipeline_clustering_yml.md
diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
new file mode 100644
index 00000000..6a2bf8d7
--- /dev/null
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -0,0 +1,28 @@
+<style>
+  .parameter {
+    border-top: 4px solid lightblue;
+    background-color: rgba(173, 216, 230, 0.2);
+    padding: 4px;
+    display: inline-block;
+    font-weight: bold;
+  }
+</style>
+
+# Clustering YAML 
+
+In this documentation, the parameters of the `clustering` configuration yaml file are explained.
+This file is generated running `panpipes clustering config`. <br>
+The individual steps run by the pipeline are described in [clustering workflow](https://panpipes-pipelines.readthedocs.io/en/latest/workflows/clustering.html)
+
+When running the clustering workflow, panpipes provides a basic `pipeline.yml` file.
+To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data.
+
+However, we do provide pre-filled versions of the `pipeline.yml` file for individual [tutorials](https://panpipes-pipelines.readthedocs.io/en/latest/tutorials/index.html).
+For more information on functionalities implemented in `panpipes` to read the configuration files, such as reading blocks of parameters and reusing blocks with  `&anchors` and `*scalars`, please check [our documentation](./useful_info_on_yml.md)
+
+You can download the different clustering pipeline.yml files here:
+- Basic `pipeline.yml` file (not prefilled) that is generated when calling `panpipes clustering config`: [Download here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/panpipes/pipeline_clustering/pipeline.yml)
+- `pipeline.yml` for [Clustering Tutorial](https://panpipes-tutorials.readthedocs.io/en/latest/_downloads/3895aa0ba60017b15ee1aa6531dc8c25/pipeline.ym)
+
+## Compute resources options
+

From 8ff37a385f9929e041cc69b910df8ed8382efb46 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Tue, 27 Feb 2024 13:57:42 +0000
Subject: [PATCH 02/22] Update pipeline_clustering_yml.md

---
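Reviewer note: a minimal sketch of how the compute-resource parameters documented in this patch might look in `pipeline.yml`. The values are the defaults listed in the doc. The nesting is an assumption: the doc lists `fewer_jobs` and `condaenv` beneath `resources`, but they are shown here as top-level keys, so compare against the file generated by `panpipes clustering config`.

```yaml
# Sketch only: key nesting is assumed, not confirmed by this patch.
resources:
  threads_high: 2     # high intensity tasks: load all input files, create the MuData object
  threads_medium: 2   # medium intensity tasks: load the mudata, computationally light work
  threads_low: 2      # low intensity tasks: load text files, plotting
fewer_jobs: True
condaenv:             # blank: running natively, or the cluster inherits the login node environment
```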
 docs/yaml_docs/pipeline_clustering_yml.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 6a2bf8d7..28fc5298 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -26,3 +26,22 @@ You can download the different clustering pipeline.yml files here:
 
 ## Compute resources options
 
+- <span class="parameter">resources</span><br>
+Computing resources to use, specifically the number of threads used for parallel jobs.
+Specified by the following three parameters:
+  - <span class="parameter">threads_high</span> `Integer`, Default: 2<br>
+        Number of threads used for high intensity computing tasks. 
+        For each thread, there must be enough memory to load all your input files at once and create the MuData object.
+
+  - <span class="parameter">threads_medium</span> `Integer`, Default: 2<br>
+        Number of threads used for medium intensity computing tasks.
+        For each thread, there must be enough memory to load your mudata and do computationally light tasks.
+
+  - <span class="parameter">threads_low</span> `Integer`, Default: 2<br>
+        Number of threads used for low intensity computing tasks.
+        For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two.
+  - <span class="parameter">fewer_jobs</span> `Boolean`, Default: True<br>
+  
+  - <span class="parameter">condaenv</span> `String` (Path)<br>
+    Path to conda environment that should be used to run panpipes.
+    Leave blank if running native or your cluster automatically inherits the login node environment

From 66339f19100f5c6c15c9a128ca8f4901de5db002 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Tue, 27 Feb 2024 14:07:21 +0000
Subject: [PATCH 03/22] Update pipeline_clustering_yml.md

---
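Reviewer note: a hedged sketch of the data-loading block described in this patch, using the documented defaults and the example path from the doc. The flat, top-level layout of these keys is an assumption; the generated `pipeline.yml` is the reference.

```yaml
# Sketch only: layout assumed, values taken from the documented defaults.
sample_prefix: mdata
scaled_obj: ../preprocess/mdata_scaled.h5mu   # output file from preprocessing
full_obj:                                     # leave blank if scaled_obj already contains all the genes
modalities:
  rna: True
  prot: True
  atac: False
  spatial: False
```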
 docs/yaml_docs/pipeline_clustering_yml.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 28fc5298..569ac974 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -45,3 +45,22 @@ Specified by the following three parameters:
   - <span class="parameter">condaenv</span> `String` (Path)<br>
     Path to conda environment that should be used to run panpipes.
     Leave blank if running native or your cluster automatically inherits the login node environment
+
+## Loading and merging data options
+### Data format
+
+- <span class="parameter">sample_prefix</span> `String`, Mandatory parameter, Default: mdata<br>
+Prefix for the sample that comes out of the filtering/preprocessing steps of the workflow.
+
+
+- <span class="parameter">scaled_obj</span> `String`, Mandatory parameter, Default: mdata_scaled.h5mu<br>
+ Path to the output file from preprocessing (e.g. `../preprocess/mdata_scaled.h5mu`).
+ Ensure that the submission file must be in the right format and that the right path is provided. In this case panpipes will use the full object to calculate rank_gene_groups and for plotting those genes. If your scaled_obj contains all the genes then leave full_obj blank
+
+- <span class="parameter">full_obj</span> `String`, Default: <br>
+
+- <span class="parameter">modalities</span><br>
+  - <span class="parameter">rna</span> `Boolean`, Default: True<br>
+  - <span class="parameter">prot</span> `Boolean`, Default: True<br>
+  - <span class="parameter">atac</span> `Boolean`, Default: False<br>
+  - <span class="parameter">spatial</span> `Boolean`, Default: False<br>

From 848b309be248ba39931533fb69a296dfaba1f126 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Tue, 27 Feb 2024 14:22:41 +0000
Subject: [PATCH 04/22] Update pipeline_clustering_yml.md

---
 docs/yaml_docs/pipeline_clustering_yml.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 569ac974..3234542b 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -64,3 +64,11 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
   - <span class="parameter">prot</span> `Boolean`, Default: True<br>
   - <span class="parameter">atac</span> `Boolean`, Default: False<br>
   - <span class="parameter">spatial</span> `Boolean`, Default: False<br>
+  Run clustering on each individual modality.
+
+- <span class="parameter">multimodal</span><br>
+  - <span class="parameter">rna_clustering</span> `Boolean`, Default: True<br>
+  - <span class="parameter">integration_method</span> `String`, Default: WNN<br>
+  Options here include WNN, moda, and totalVI, and it tells us where to look for.
+
+### Parameters for finding neighbours 

From d9f6b91c1c00e7f078d5d8775aac8c4699adc659 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Tue, 27 Feb 2024 15:29:50 +0000
Subject: [PATCH 05/22] Update pipeline_clustering_yml.md

---
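Reviewer note: a minimal sketch of one modality block from the neighbours section added in this patch, filled with the documented `rna` defaults. It assumes the top-level key is spelled `neighbors` and that the `prot`, `atac` and `spatial` blocks follow the same shape with their own defaults.

```yaml
# Sketch only: top-level key name and nesting are assumptions.
neighbors:
  rna:
    use_existing: True
    dim_red: X_pca      # representation in .obsm used for nearest neighbors
    n_dim_red: 30       # number of components to use
    k: 30               # number of neighbours
    metric: euclidean   # euclidean or cosine
    method: scanpy      # scanpy or hnsw
```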
 docs/yaml_docs/pipeline_clustering_yml.md | 81 ++++++++++++++++++++++-
 1 file changed, 78 insertions(+), 3 deletions(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 3234542b..e9a4d1cd 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -46,7 +46,7 @@ Specified by the following three parameters:
     Path to conda environment that should be used to run panpipes.
     Leave blank if running native or your cluster automatically inherits the login node environment
 
-## Loading and merging data options
+## Loading data 
 ### Data format
 
 - <span class="parameter">sample_prefix</span> `String`, Mandatory parameter, Default: mdata<br>
@@ -55,7 +55,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
 
 - <span class="parameter">scaled_obj</span> `String`, Mandatory parameter, Default: mdata_scaled.h5mu<br>
  Path to the output file from preprocessing (e.g. `../preprocess/mdata_scaled.h5mu`).
- Ensure that the submission file must be in the right format and that the right path is provided. In this case panpipes will use the full object to calculate rank_gene_groups and for plotting those genes. If your scaled_obj contains all the genes then leave full_obj blank
+ Ensure that the submission file must be in the right format and that the right path is provided. In this case, panpipes will use the full object to calculate rank_gene_groups and for plotting those genes. If your scaled_obj contains all the genes then leave full_obj blank
 
 - <span class="parameter">full_obj</span> `String`, Default: <br>
 
@@ -71,4 +71,79 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
   - <span class="parameter">integration_method</span> `String`, Default: WNN<br>
   Options here include WNN, moda, and totalVI, and it tells us where to look for.
 
-### Parameters for finding neighbours 
+## Parameters for finding neighbours 
+
+- <span class="parameter">neighbors:</span>
+ Sets the number of neighbors to use when calculating the graph for clustering and UMAP.
+  - <span class="parameter">rna:</span> 
+
+     - <span class="parameter">use_existing </span> `Boolean`, Default: True<br>
+     - <span class="parameter">dim_red </span> `String`, Default: X_pca<br>
+       Defines which representation in .obsm to use for nearest neighbors
+     - <span class="parameter">n_dim_red</span> `Integer`, Default: 30<br>
+       Number of components to use for clustering
+     - <span class="parameter">k</span> `Integer`, Default: 30<br>
+       Number of neighbours
+     - <span class="parameter">metric</span> `String`, Default: euclidean<br>
+       Options here include euclidean and cosine
+     - <span class="parameter">method</span> `String`, Default: scanpy<br>
+       Options include scanpy and hnsw (from scvelo)
+      
+     
+  - <span class="parameter">prot:</span> 
+
+     - <span class="parameter">use_existing </span> `Boolean`, Default: True<br>
+     - <span class="parameter">dim_red </span> `String`, Default: X_pca<br>
+       Defines which representation in .obsm to use for nearest neighbors
+     - <span class="parameter">n_dim_red</span> `Integer`, Default: 30<br>
+       Number of components to use for clustering
+     - <span class="parameter">k</span> `Integer`, Default: 30<br>
+       Number of neighbours
+     - <span class="parameter">metric</span> `String`, Default: euclidean<br>
+       Options here include euclidean and cosine
+     - <span class="parameter">method</span> `String`, Default: scanpy<br>
+       Options include scanpy and hnsw (from scvelo)
+
+
+  - <span class="parameter">atac:</span> 
+
+     - <span class="parameter">use_existing </span> `Boolean`, Default: True<br>
+     - <span class="parameter">dim_red </span> `String`, Default: X_lsi<br>
+       Defines which representation in .obsm to use for nearest neighbors
+     - <span class="parameter">n_dim_red</span> `Integer`, Default: 1<br>
+       Number of components to use for clustering
+     - <span class="parameter">k</span> `Integer`, Default: 30<br>
+       Number of neighbours
+     - <span class="parameter">metric</span> `String`, Default: euclidean<br>
+       Options here include euclidean and cosine
+     - <span class="parameter">method</span> `String`, Default: scanpy<br>
+       Options include scanpy and hnsw (from scvelo)
+  
+
+
+  - <span class="parameter">spatial:</span> 
+
+     - <span class="parameter">use_existing </span> `Boolean`, Default: False<br>
+     - <span class="parameter">dim_red </span> `String`, Default: X_pca<br>
+       Defines which representation in .obsm to use for nearest neighbors
+     - <span class="parameter">n_dim_red</span> `Integer`, Default: 30<br>
+       Number of components to use for clustering
+     - <span class="parameter">k</span> `Integer`, Default: 30<br>
+       Number of neighbours
+     - <span class="parameter">metric</span> `String`, Default: euclidean<br>
+       Options here include euclidean and cosine
+     - <span class="parameter">method</span> `String`, Default: scanpy<br>
+       Options include scanpy and hnsw (from scvelo)
+  
+## Parameters for UMAP calculation
+
+
+  - <span class="parameter">umap:</span> 
+
+     - <span class="parameter">run </span> `Boolean`, Default: True<br>
+     - <span class="parameter">rna:</span>
+         - <span class="parameter">mindist </span> `Float`, Default: 0.25  0.5<br> (AAAAAAAAAASSSSSSKKKKKKK) 
+         
+
+
+

From 42de436fa0dd1d72451ba9374d8c1e67ea41cfad Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Tue, 27 Feb 2024 15:53:34 +0000
Subject: [PATCH 06/22] Update pipeline_clustering_yml.md

---
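Reviewer note: a hedged sketch of how several `mindist` values might be written for the `umap` block documented in this patch. Writing the values as a YAML list is an assumption; the doc only lists them side by side, and whether a bare scalar is also accepted is not confirmed here.

```yaml
# Sketch only: list syntax for multiple mindist values is assumed.
umap:
  run: True
  rna:
    mindist:
      - 0.25
      - 0.5
  prot:
    mindist:
      - 0.1
```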
 docs/yaml_docs/pipeline_clustering_yml.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index e9a4d1cd..49fcc827 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -142,8 +142,18 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
 
      - <span class="parameter">run </span> `Boolean`, Default: True<br>
      - <span class="parameter">rna:</span>
-         - <span class="parameter">mindist </span> `Float`, Default: 0.25  0.5<br> (AAAAAAAAAASSSSSSKKKKKKK) 
-         
+         - <span class="parameter">mindist </span> `Float`, Default: 0.25  0.5<br>
+           Use both values as defaults. 
+      - <span class="parameter">prot:</span>
+         - <span class="parameter">mindist </span> `Float`, Default: 0.1<br>
+      - <span class="parameter">atac:</span>
+         - <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
+      - <span class="parameter">multimodal:</span>
+         - <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
+      - <span class="parameter">rna:</span>
+         - <span class="parameter">mindist </span> `Float`, Default: 0.25  0.5<br>
+            Use both values as defaults. 
+
 
 
 

From 0053fc28dd50e6efd2bd876e474e8cc2ab89ce93 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Tue, 27 Feb 2024 16:02:07 +0000
Subject: [PATCH 07/22] Update pipeline_clustering_yml.md

---
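Reviewer note: a minimal sketch of the `clusterspecs` block described in this patch, using the documented defaults for `rna` and the multimodal block. The YAML list form for `resolutions` is an assumption, and the key is written `multimodal` here on the assumption that this is the intended spelling.

```yaml
# Sketch only: list syntax and key spelling are assumptions.
clusterspecs:
  rna:
    resolutions:
      - 0.2
      - 0.6
      - 1
    algorithm: leiden   # louvain or leiden
  multimodal:
    resolutions:
      - 0.5
      - 0.7
    algorithm: leiden
```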
 docs/yaml_docs/pipeline_clustering_yml.md | 32 +++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 49fcc827..0c46b81b 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -142,7 +142,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
 
      - <span class="parameter">run </span> `Boolean`, Default: True<br>
      - <span class="parameter">rna:</span>
-         - <span class="parameter">mindist </span> `Float`, Default: 0.25  0.5<br>
+         - <span class="parameter">mindist </span> `Float`, Default: 0.25,  0.5<br>
            Use both values as defaults. 
       - <span class="parameter">prot:</span>
          - <span class="parameter">mindist </span> `Float`, Default: 0.1<br>
@@ -151,9 +151,37 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
       - <span class="parameter">multimodal:</span>
          - <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
       - <span class="parameter">rna:</span>
-         - <span class="parameter">mindist </span> `Float`, Default: 0.25  0.5<br>
+         - <span class="parameter">mindist </span> `Float`, Default: 0.25,  0.5<br>
             Use both values as defaults. 
 
+## Parameters for clustering 
 
+  - <span class="parameter">clusterspecs:</span>
+      - <span class="parameter">rna:</span>
+          - <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
+           Use all values as defaults.
+          - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
+            Options include louvain or leiden. 
+      - <span class="parameter">prot:</span>
+          - <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
+           Use all values as defaults.
+          - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
+            Options include louvain or leiden.
 
+      - <span class="parameter">atac:</span>
+          - <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
+           Use all values as defaults.
+          - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
+            Options include louvain or leiden. 
+      - <span class="parameter">multimmodal:</span>
+          - <span class="parameter">resolutions </span> `Float`, Default: 0.5, 0.7<br>
+           Use all values as defaults.
+          - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
+            Options include louvain or leiden.
+
+      - <span class="parameter">spatial:</span>
+          - <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
+           Use all values as defaults.
+          - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
+            Options include louvain or leiden. 
 

From 9100c9b7685c8b6cf4099874722f5ddd4d731d41 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Tue, 27 Feb 2024 16:39:58 +0000
Subject: [PATCH 08/22] Update pipeline_clustering_yml.md

---
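Reviewer note: a hedged sketch of one `markerspecs` modality block as documented in this patch, filled with the `rna` defaults. The minimum-cells parameter is left out because its name and default are unclear in the doc text; the nesting is assumed from the list structure.

```yaml
# Sketch only: nesting assumed; the minimum-cells parameter is intentionally omitted.
markerspecs:
  rna:
    run: True
    layer: logged_counts
    method: t-test_overestim_var   # logreg, t-test, wilcoxon or t-test_overestim_var
    pseudo_seurat: False
    minpct: 0.1      # only matters if pseudo_seurat is True
    threshuse: 0.25  # only matters if pseudo_seurat is True
```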
 docs/yaml_docs/pipeline_clustering_yml.md | 81 +++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 0c46b81b..318f039c 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -185,3 +185,84 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
           - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
             Options include louvain or leiden. 
 
+## Parameters for finding marker genes 
+If `pseudo_seurat` is set to False, marker genes are found with scanpy's [rank_genes_groups](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html).
+When `pseudo_seurat` is set to True, a Python implementation of Seurat's marker detection (Seurat::FindMarkers, written by CRG) is run instead.
+
+  - <span class="parameter">markerspecs:</span> <br>
+     - <span class="parameter">rna:</span><br>
+       - <span class="parameter">run </span> `Boolean`, Default: True<br>
+       - <span class="parameter">layer </span> `String`, Default: logged_counts<br>
+       - <span class="parameter">method </span> `String`, Default: t-test_overestim_var<br>
+       Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
+       - <span class="parameter">mincells </span> `Integer`, Default: 10<br>
+       If a cluster contains fewer cells than this number, marker analysis is not run for that cluster.
+       - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
+       - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
+       This parameter only matters if pseudo_seurat is set to True 
+       - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
+       This parameter only matters if pseudo_seurat is set to True 
+
+ - <span class="parameter">prot:</span><br>
+   - <span class="parameter">run </span> `Boolean`, Default: True<br>
+   - <span class="parameter">layer </span> `String`, Default: clr<br>
+       Options include clr and dsb. 
+   - <span class="parameter">mincels </span> `Integer`, Default: t-10<br>
+       If the number of clusters contains less than the number of cells maker analysis is not necessary.
+   - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
+   - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
+   - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
+       This parameter only matters if pseudo_seurat is set to True 
+   - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
+       This parameter only matters if pseudo_seurat is set to True 
+
+ - <span class="parameter">atac:</span><br>
+    - <span class="parameter">run </span> `Boolean`, Default: False<br>
+    - <span class="parameter">layer </span> `String`, Default: logged_counts<br>
+       Options include logged_counts, signac_norm, logTF_norm, and logIDF_norm
+    - <span class="parameter">mincells </span> `Integer`, Default: 10<br>
+       If a cluster contains fewer cells than this number, marker analysis is not run for that cluster.
+    - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
+        Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
+    - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
+    - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
+       This parameter only matters if pseudo_seurat is set to True 
+    - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
+       This parameter only matters if pseudo_seurat is set to True 
+
+
+ - <span class="parameter">multimodal:</span><br>
+   - <span class="parameter">mincells </span> `Integer`, Default: 10<br>
+       If a cluster contains fewer cells than this number, marker analysis is not run for that cluster.
+   - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
+        Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
+   - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
+   - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
+       This parameter only matters if pseudo_seurat is set to True
+   - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
+       This parameter only matters if pseudo_seurat is set to True
+
+
+ - <span class="parameter">spatial:</span><br>
+   - <span class="parameter">run </span> `Boolean`, Default: True<br>
+   - <span class="parameter">layer </span> `String`, Default: norm_pearson_resid<br>
+       Options include logged_counts, signac_norm, logTF_norm, and logIDF_norm
+   - <span class="parameter">method </span> `String`, Default: t-test_overestim_var<br>
+        Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
+   - <span class="parameter">mincells </span> `Integer`, Default: 10<br>
+       If a cluster contains fewer cells than this number, marker analysis is not run for that cluster.
+   - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
+   - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
+       This parameter only matters if pseudo_seurat is set to True 
+   - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
+       This parameter only matters if pseudo_seurat is set to True
+
+## Plot specifications
+Used to define which metadata columns are used in the visualisations 
+ - <span class="parameter">plotspecs:</span><br>
+   - <span class="parameter">layers: </span><br>
+     - <span class="parameter">rna </span> `String`, Default: logged_counts<br>
+     - <span class="parameter">prot </span> `String`, Default: clr<br>
+     - <span class="parameter">atac </span> `String`, Default: signac_norm<br>
+     - <span class="parameter">spacial </span> `?`, Default: ?<br> (CHEEECKKKKK)
+  - <span class="parameter">top_n_markers </span> `Integer`, Default: 10<br>

From 6ad1bf022e267231db1b9cb8346ebdddc26185db Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:41:03 +0000
Subject: [PATCH 09/22] Update pipeline_clustering_yml.md

---
 docs/yaml_docs/pipeline_clustering_yml.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 318f039c..79252630 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -8,7 +8,7 @@
   }
 </style>
 
-# Clustering YAML 
+# Clustering YAML  
 
 In this documentation, the parameters of the `clustering` configuration yaml file are explained.
 This file is generated running `panpipes clustering config`. <br>
@@ -264,5 +264,5 @@ Used to define which metadata columns are used in the visualisations
      - <span class="parameter">rna </span> `String`, Default: logged_counts<br>
      - <span class="parameter">prot </span> `String`, Default: clr<br>
      - <span class="parameter">atac </span> `String`, Default: signac_norm<br>
-     - <span class="parameter">spacial </span> `?`, Default: ?<br> (CHEEECKKKKK)
+     - <span class="parameter">spatial </span> `?`, Default: ?<br> 
   - <span class="parameter">top_n_markers </span> `Integer`, Default: 10<br>

From 46ce807678cff246962f908d2f5c2cf5dd0f0653 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Wed, 28 Feb 2024 13:22:33 +0100
Subject: [PATCH 10/22] added to index to review

---
 docs/yaml_docs/index.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/yaml_docs/index.rst b/docs/yaml_docs/index.rst
index ad028376..3fd86bc8 100644
--- a/docs/yaml_docs/index.rst
+++ b/docs/yaml_docs/index.rst
@@ -7,6 +7,7 @@ Workflows configuration files
     useful_info_on_yml
     pipeline_ingestion_yml
     pipeline_integration_yml
+    pipeline_clustering_yml
     spatial_qc
     spatial_preprocess
     spatial_deconvolution
\ No newline at end of file

From 94659f869209b216506e4ef1faa926b467a92fc6 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Wed, 28 Feb 2024 14:17:39 +0100
Subject: [PATCH 11/22] fixes

---
 docs/yaml_docs/pipeline_clustering_yml.md | 37 +++++++++++++----------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 79252630..9bf3a656 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -18,11 +18,12 @@ When running the clustering workflow, panpipes provides a basic `pipeline.yml` f
 To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data.
 
 However, we do provide pre-filled versions of the `pipeline.yml` file for individual [tutorials](https://panpipes-pipelines.readthedocs.io/en/latest/tutorials/index.html).
+
 For more information on functionalities implemented in `panpipes` to read the configuration files, such as reading blocks of parameters and reusing blocks with  `&anchors` and `*scalars`, please check [our documentation](./useful_info_on_yml.md)
 
 You can download the different clustering pipeline.yml files here:
 - Basic `pipeline.yml` file (not prefilled) that is generated when calling `panpipes clustering config`: [Download here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/panpipes/pipeline_clustering/pipeline.yml)
-- `pipeline.yml` for [Clustering Tutorial](https://panpipes-tutorials.readthedocs.io/en/latest/_downloads/3895aa0ba60017b15ee1aa6531dc8c25/pipeline.ym)
+- `pipeline.yml` for [Clustering Tutorial](https://panpipes-tutorials.readthedocs.io/en/latest/_downloads/3895aa0ba60017b15ee1aa6531dc8c25/pipeline.yml)
 
 ## Compute resources options
 
@@ -55,10 +56,11 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
 
 - <span class="parameter">scaled_obj</span> `String`, Mandatory parameter, Default: mdata_scaled.h5mu<br>
  Path to the output file from preprocessing (e.g. `../preprocess/mdata_scaled.h5mu`).
- Ensure that the submission file must be in the right format and that the right path is provided. In this case, panpipes will use the full object to calculate rank_gene_groups and for plotting those genes. If your scaled_obj contains all the genes then leave full_obj blank
+ Ensure that the path to the file is correct.  
 
 - <span class="parameter">full_obj</span> `String`, Default: <br>
-
+  Specify the full object if your scaled_obj contains only highly variable genes (HVG). If your scaled_obj contains all the genes, leave full_obj blank.
+  panpipes will use the full object for marker gene analysis (rank_genes_groups) and for plotting those genes.
 - <span class="parameter">modalities</span><br>
   - <span class="parameter">rna</span> `Boolean`, Default: True<br>
   - <span class="parameter">prot</span> `Boolean`, Default: True<br>
@@ -142,46 +144,49 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
 
      - <span class="parameter">run </span> `Boolean`, Default: True<br>
      - <span class="parameter">rna:</span>
-         - <span class="parameter">mindist </span> `Float`, Default: 0.25,  0.5<br>
-           Use both values as defaults. 
+         - <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
+           Can specify an array: 0.25,0.5
       - <span class="parameter">prot:</span>
-         - <span class="parameter">mindist </span> `Float`, Default: 0.1<br>
+         - <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
+           Can specify an array: 0.25,0.5,0.8
       - <span class="parameter">atac:</span>
          - <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
+           Can specify an array: 0.25,0.5,0.8
       - <span class="parameter">multimodal:</span>
          - <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
+           Can specify an array: 0.25,0.5,0.8
       - <span class="parameter">rna:</span>
-         - <span class="parameter">mindist </span> `Float`, Default: 0.25,  0.5<br>
-            Use both values as defaults. 
+         - <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
+            Can specify an array: 0.25,0.5,0.8
 
 ## Parameters for clustering 
 
   - <span class="parameter">clusterspecs:</span>
       - <span class="parameter">rna:</span>
           - <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
-           Use all values as defaults.
+           Can specify an array: 0.2,0.6,1
           - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
             Options include louvain or leiden. 
       - <span class="parameter">prot:</span>
           - <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
-           Use all values as defaults.
+           Can specify an array: 0.2,0.6,1
           - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
             Options include louvain or leiden.
 
       - <span class="parameter">atac:</span>
           - <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
-           Use all values as defaults.
+           Can specify an array to compute in parallel: 0.2,0.6,1
           - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
             Options include louvain or leiden. 
       - <span class="parameter">multimmodal:</span>
           - <span class="parameter">resolutions </span> `Float`, Default: 0.5, 0.7<br>
-           Use all values as defaults.
+           Can specify an array to compute in parallel: 0.2,0.6,1 
           - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
             Options include louvain or leiden.
 
       - <span class="parameter">spatial:</span>
           - <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
-           Use all values as defaults.
+           Can specify an array to compute in parallel: 0.2,0.6,1 
           - <span class="parameter">algorithm</span> `String`, Default: leiden<br>
             Options include louvain or leiden. 
 
@@ -206,15 +211,15 @@ When pseudo_seurat is set to true then a python implementation of Suerat runs (S
  - <span class="parameter">prot:</span><br>
    - <span class="parameter">run </span> `Boolean`, Default: True<br>
    - <span class="parameter">layer </span> `String`, Default: clr<br>
-       Options include clr and dsb. 
+       Can specify an array to compute in parallel: clr, dsb
    - <span class="parameter">mincels </span> `Integer`, Default: t-10<br>
        If the number of clusters contains less than the number of cells maker analysis is not necessary.
    - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
    - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
    - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
-       This parameter only matters if pseudo_seurat is set to True 
+       This parameter is mandatory if pseudo_seurat is set to True 
    - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
-       This parameter only matters if pseudo_seurat is set to True 
+       This parameter is mandatory if pseudo_seurat is set to True 
 
  - <span class="parameter">atac:</span><br>
     - <span class="parameter">run </span> `Boolean`, Default: False<br>

From 07df95220dc71b2ffe4750dd9825251c620c49a9 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Wed, 28 Feb 2024 14:18:45 +0100
Subject: [PATCH 12/22] typo

---
 docs/yaml_docs/pipeline_clustering_yml.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 9bf3a656..d388d0c5 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -269,5 +269,5 @@ Used to define which metadata columns are used in the visualisations
      - <span class="parameter">rna </span> `String`, Default: logged_counts<br>
      - <span class="parameter">prot </span> `String`, Default: clr<br>
      - <span class="parameter">atac </span> `String`, Default: signac_norm<br>
-     - <span class="parameter">spacial </span> `?`, Default: ?<br> 
+     - <span class="parameter">spatial </span> `?`, Default: ?<br> 
   - <span class="parameter">top_n_markers </span> `Integer`, Default: 10<br>

From 2ded3937602ec33bacf1309c59b3e2e30ab19bf4 Mon Sep 17 00:00:00 2001
From: bio-la <fabiola.curion@gmail.com>
Date: Wed, 28 Feb 2024 14:25:52 +0100
Subject: [PATCH 13/22] fixes

---
 docs/yaml_docs/pipeline_clustering_yml.md | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index d388d0c5..00d6df47 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -191,13 +191,16 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
             Options include louvain or leiden. 
 
 ## Parameters for finding marker genes 
-If pseudo_suerat is set to false then we run [scanpy](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html). 
-When pseudo_seurat is set to true then a python implementation of Suerat runs (Seurat::FindMarkers written by CRG)
+
+This section defines the parameters used to run the marker analysis. 
+By default, pseudo_seurat is set to False, and we run [scanpy.tl.rank_genes_groups](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html). 
+When pseudo_seurat is set to True, a [python implementation](https://github.com/DendrouLab/panpipes/blob/main/panpipes/python_scripts/run_find_markers_multi.py) of `Seurat::FindMarkers` is run instead.
 
   - <span class="parameter">markerspecs:</span> <br>
      - <span class="parameter">rna:</span><br>
        - <span class="parameter">run </span> `Boolean`, Default: True<br>
        - <span class="parameter">layer </span> `String`, Default: logged_counts<br>
+         Which layer stores counts for differential expression test.
        - <span class="parameter">method </span> `String`, Default: t-test_overestim_var<br>
        Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
        - <span class="parameter">mincels </span> `Integer`, Default: t-10<br>
@@ -211,7 +214,7 @@ When pseudo_seurat is set to true then a python implementation of Suerat runs (S
  - <span class="parameter">prot:</span><br>
    - <span class="parameter">run </span> `Boolean`, Default: True<br>
    - <span class="parameter">layer </span> `String`, Default: clr<br>
-       Can specify an array to compute in parallel: clr, dsb
+       Which layer stores counts for differential expression test.
    - <span class="parameter">mincels </span> `Integer`, Default: t-10<br>
        If the number of clusters contains less than the number of cells maker analysis is not necessary.
    - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
@@ -224,6 +227,7 @@ When pseudo_seurat is set to true then a python implementation of Suerat runs (S
  - <span class="parameter">atac:</span><br>
     - <span class="parameter">run </span> `Boolean`, Default: False<br>
     - <span class="parameter">layer </span> `String`, Default: logged_counts<br>
+      Which layer stores counts for differential expression test. 
        Options include logged_counts, signac_norm , and logTF_norm,logIDF_norm
     - <span class="parameter">mincels </span> `Integer`, Default: t-10<br>
        If the number of clusters contains less than the number of cells maker analysis is not necessary.
@@ -231,9 +235,9 @@ When pseudo_seurat is set to true then a python implementation of Suerat runs (S
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
     - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
     - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
-       This parameter only matters if pseudo_seurat is set to True 
+       This parameter is mandatory if pseudo_seurat is set to True 
     - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
-       This parameter only matters if pseudo_seurat is set to True 
+       This parameter is mandatory if pseudo_seurat is set to True 
 
 
  - <span class="parameter">multimodal:</span><br>
@@ -243,9 +247,9 @@ When pseudo_seurat is set to true then a python implementation of Suerat runs (S
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
     - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
     - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
-       This parameter only matters if pseudo_seurat is set to True 
+       This parameter is mandatory if pseudo_seurat is set to True 
     - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
-       This parameter only matters if pseudo_seurat is set to True
+       This parameter is mandatory if pseudo_seurat is set to True
 
 
  - <span class="parameter">spatial:</span><br>

From 268cb755b58b4e09ee7769858740459416a9fc46 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Wed, 28 Feb 2024 16:26:21 +0000
Subject: [PATCH 14/22] Update pipeline_clustering_yml.md

---
 docs/yaml_docs/pipeline_clustering_yml.md | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 00d6df47..9b56938f 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -207,10 +207,9 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
        If the number of clusters contains less than the number of cells maker analysis is not necessary.
        - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
        - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
-       This parameter only matters if pseudo_seurat is set to True 
+       This parameter is mandatory if pseudo_seurat is set to True 
        - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
-       This parameter only matters if pseudo_seurat is set to True 
-
+       TThis parameter is mandatory if pseudo_seurat is set to True 
  - <span class="parameter">prot:</span><br>
    - <span class="parameter">run </span> `Boolean`, Default: True<br>
    - <span class="parameter">layer </span> `String`, Default: clr<br>
@@ -258,14 +257,13 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
        Options include logged_counts, signac_norm , and logTF_norm,logIDF_norm
    - <span class="parameter">method </span> `String`, Default: t-test_overestim_var<br>
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
-   - <span class="parameter">mincels </span> `Integer`, Default: t-10<br>
+   - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
        If the number of clusters contains less than the number of cells maker analysis is not necessary.
    - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
    - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
-       This parameter only matters if pseudo_seurat is set to True 
+      This parameter is mandatory if pseudo_seurat is set to True 
    - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
-       This parameter only matters if pseudo_seurat is set to True
-
+       This parameter is mandatory if pseudo_seurat is set to True 
 ## Plot specifications
 Used to define which metadata columns are used in the visualisations 
  - <span class="parameter">plotspecs:</span><br>

From 8965d4b8af8516270f909e2a4a82c54a5c35cb82 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Mon, 4 Mar 2024 13:22:32 +0000
Subject: [PATCH 15/22] Update pipeline_clustering_yml.md

---
 docs/yaml_docs/pipeline_clustering_yml.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 9b56938f..79d6d879 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -203,7 +203,7 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
          Which layer stores counts for differential expression test.
        - <span class="parameter">method </span> `String`, Default: t-test_overestim_var<br>
        Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
-       - <span class="parameter">mincels </span> `Integer`, Default: t-10<br>
+       - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
        If the number of clusters contains less than the number of cells maker analysis is not necessary.
        - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
        - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>

From 48fbcd293f20bfb1c39d5f86ccc04f8d173fd526 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Mon, 4 Mar 2024 13:35:19 +0000
Subject: [PATCH 16/22] Update pipeline_clustering_yml.md

---
 docs/yaml_docs/pipeline_clustering_yml.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 79d6d879..ac0820ed 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -5,7 +5,7 @@
     padding: 4px;
     display: inline-block;
     font-weight: bold;
-  }
+  } 
 </style>
 
 # Clustering YAML  

From 27d0bfa292031dbd0fe7a90c374ad4857f9856c6 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Tue, 5 Mar 2024 11:31:51 +0000
Subject: [PATCH 17/22] Update pipeline_clustering_yml.md

---
 docs/yaml_docs/pipeline_clustering_yml.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index ac0820ed..76fd61a5 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -214,7 +214,7 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
    - <span class="parameter">run </span> `Boolean`, Default: True<br>
    - <span class="parameter">layer </span> `String`, Default: clr<br>
        Which layer stores counts for differential expression test.
-   - <span class="parameter">mincels </span> `Integer`, Default: t-10<br>
+   - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
        If the number of clusters contains less than the number of cells maker analysis is not necessary.
    - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
    - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>

From 1cd393cdd6bce10eab1ef12715feac00aa98c048 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <giulia.garcia@dtc.ox.ac.uk>
Date: Tue, 5 Mar 2024 11:42:38 +0000
Subject: [PATCH 18/22] changes to clustering yaml

---
 docs/yaml_docs/pipeline_clustering_yml.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 76fd61a5..b34981fb 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -228,7 +228,7 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
     - <span class="parameter">layer </span> `String`, Default: logged_counts<br>
       Which layer stores counts for differential expression test. 
        Options include logged_counts, signac_norm , and logTF_norm,logIDF_norm
-    - <span class="parameter">mincels </span> `Integer`, Default: t-10<br>
+    - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
        If the number of clusters contains less than the number of cells maker analysis is not necessary.
     - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
@@ -240,7 +240,7 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
 
 
  - <span class="parameter">multimodal:</span><br>
-   - <span class="parameter">mincels </span> `Integer`, Default: t-10<br>
+   - <span class="parameter">mincels </span> `Integer`, Default:10<br>
        If the number of clusters contains less than the number of cells maker analysis is not necessary.
     - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’

From 3336992c5782c1fbd60a5e62cc9f807693b5aec9 Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <giulia.garcia@dtc.ox.ac.uk>
Date: Tue, 12 Mar 2024 14:09:22 +0000
Subject: [PATCH 19/22] clustering_yml changes

---
 docs/yaml_docs/pipeline_clustering_yml.md | 26 +++++++++++------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index b34981fb..91d4d9e9 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -12,7 +12,7 @@
 
 In this documentation, the parameters of the `clustering` configuration yaml file are explained.
 This file is generated running `panpipes clustering config`. <br>
-The individual steps run by the pipeline are described in [clustering worlfow](https://panpipes-pipelines.readthedocs.io/en/latest/workflows/clustering.html)
+The individual steps run by the pipeline are described in [clustering workflow](https://panpipes-pipelines.readthedocs.io/en/latest/workflows/clustering.html)
 
 When running the clustering workflow, panpipes provides a basic `pipeline.yml` file.
 To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data.
@@ -55,11 +55,11 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
 
 
 - <span class="parameter">scaled_obj</span> `String`, Mandatory parameter, Default: mdata_scaled.h5mu<br>
- Path to the output file from preprocessing (e.g. `../preprocess/mdata_scaled.h5mu`).
+ Path to the output file from preprocessing (e.g. `../preprocessed/mdata_scaled.h5mu`).
  Ensure that the path to the file is correct.  
 
 - <span class="parameter">full_obj</span> `String`, Default: <br>
-  Speficy the full object if your scaled_obj contains only HVG.  If your scaled_obj contains all the genes then leave full_obj blank. 
+  Specify the full object if your scaled_obj contains only HVG.  If your scaled_obj contains all the genes then leave full_obj blank. 
   panpipes will use the full object to do marker genes analysis (rank_gene_groups) and for plotting those genes. 
 - <span class="parameter">modalities</span><br>
   - <span class="parameter">rna</span> `Boolean`, Default: True<br>
@@ -68,14 +68,14 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
   - <span class="parameter">spatial</span> `Boolean`, Default: False<br>
   Run clustering on each individual modality.
 
-- <span class="parameter">moltimodal</span><br>
+- <span class="parameter">multimodal</span><br>
   - <span class="parameter">rna_clustering</span> `Boolean`, Default: True<br>
   - <span class="parameter">integration_method</span> `String`, Default: WNN<br>
-  Options here include WNN, moda, and totalVI, and it tells us where to look for.
+  Options include WNN, mofa, and totalVI. This setting tells panpipes where to look for the integrated (multimodal) representation.
 
 ## Parameters for finding neighbours 
 
-- <span class="parameter">neighors:</span> 
+- <span class="parameter">neighbors:</span> 
  Sets the number of neighbors to use when calculating the graph for clustering and umap.
   - <span class="parameter">rna:</span> 
 
@@ -204,18 +204,18 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
        - <span class="parameter">method </span> `String`, Default: t-test_overestim_var<br>
        Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
        - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
-       If the number of clusters contains less than the number of cells maker analysis is not necessary.
+       Minimal number of cells in a cluster. If the cluster contains less than this number of cells, the marker analysis won't be run.
        - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
        - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
        This parameter is mandatory if pseudo_seurat is set to True 
        - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
-       TThis parameter is mandatory if pseudo_seurat is set to True 
+       This parameter is mandatory if pseudo_seurat is set to True 
  - <span class="parameter">prot:</span><br>
    - <span class="parameter">run </span> `Boolean`, Default: True<br>
    - <span class="parameter">layer </span> `String`, Default: clr<br>
        Which layer stores counts for differential expression test.
    - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
-       If the number of clusters contains less than the number of cells maker analysis is not necessary.
+       Minimal number of cells in a cluster. If the cluster contains less than this number of cells, the marker analysis won't be run.
    - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
    - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
    - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
@@ -229,7 +229,7 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
       Which layer stores counts for differential expression test. 
        Options include logged_counts, signac_norm , and logTF_norm,logIDF_norm
     - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
-       If the number of clusters contains less than the number of cells maker analysis is not necessary.
+       Minimal number of cells in a cluster. If the cluster contains less than this number of cells, the marker analysis won't be run.
     - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
     - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
@@ -241,7 +241,7 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
 
  - <span class="parameter">multimodal:</span><br>
    - <span class="parameter">mincels </span> `Integer`, Default:10<br>
-       If the number of clusters contains less than the number of cells maker analysis is not necessary.
+       If the cluster contains less than this number of cells, the marker analysis won't be run.
     - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
     - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
@@ -258,14 +258,14 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
    - <span class="parameter">method </span> `String`, Default: t-test_overestim_var<br>
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
    - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
-       If the number of clusters contains less than the number of cells maker analysis is not necessary.
+       Minimal number of cells in a cluster. If the cluster contains less than this number of cells, the marker analysis won't be run.
    - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
    - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
       This parameter is mandatory if pseudo_seurat is set to True 
    - <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
        This parameter is mandatory if pseudo_seurat is set to True 
 ## Plot specifications
-Used to define which metadata columns are used in the visualisations 
+Used to define which metadata columns are used in the visualizations 
  - <span class="parameter">plotspecs:</span><br>
    - <span class="parameter">layers: </span><br>
      - <span class="parameter">rna </span> `String`, Default: logged_counts<br>

From 2fe7016e20a0c8954e0b355155350b80e69433ba Mon Sep 17 00:00:00 2001
From: giuliaelgarcia <giulia.garcia@dtc.ox.ac.uk>
Date: Mon, 18 Mar 2024 15:06:19 +0000
Subject: [PATCH 20/22] changes to clustering yaml

---
 docs/yaml_docs/pipeline_clustering_yml.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 91d4d9e9..55977e4c 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -271,5 +271,6 @@ Used to define which metadata columns are used in the visualizations
      - <span class="parameter">rna </span> `String`, Default: logged_counts<br>
      - <span class="parameter">prot </span> `String`, Default: clr<br>
      - <span class="parameter">atac </span> `String`, Default: signac_norm<br>
-     - <span class="parameter">spatial </span> `?`, Default: ?<br> 
+     - <span class="parameter">spatial </span> `String`, Default: None<br> 
+     Options include lognorm and norm_pearson_resid, depending on what was selected during preprocessing. 
   - <span class="parameter">top_n_markers </span> `Integer`, Default: 10<br>

From 7e1ddf33de1192c402636cb764e7a9d4151860a2 Mon Sep 17 00:00:00 2001
From: Giulia Garcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Tue, 16 Apr 2024 15:29:56 +0100
Subject: [PATCH 21/22] Update pipeline_clustering_yml.md

---
 docs/yaml_docs/pipeline_clustering_yml.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md
index 55977e4c..bc5a22dd 100644
--- a/docs/yaml_docs/pipeline_clustering_yml.md
+++ b/docs/yaml_docs/pipeline_clustering_yml.md
@@ -203,8 +203,8 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
          Which layer stores counts for differential expression test.
        - <span class="parameter">method </span> `String`, Default: t-test_overestim_var<br>
        Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
-       - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
-       Minimal number of cells in a cluster. If the cluster contains less than this number of cells, the marker analysis won't be run.
+       - <span class="parameter">mincells </span> `Integer`, Default: 10<br>
+       Marker analysis is run only for clusters with at least `mincells` cells; clusters with fewer cells are excluded from marker analysis.
        - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
        - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
        This parameter is mandatory if pseudo_seurat is set to True 
@@ -214,8 +214,8 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
    - <span class="parameter">run </span> `Boolean`, Default: True<br>
    - <span class="parameter">layer </span> `String`, Default: clr<br>
        Which layer stores counts for differential expression test.
-   - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
-       Minimal number of cells in a cluster. If the cluster contains less than this number of cells, the marker analysis won't be run.
+   - <span class="parameter">mincells </span> `Integer`, Default: 10<br>
+       Marker analysis is run only for clusters with at least `mincells` cells; clusters with fewer cells are excluded from marker analysis.
    - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
    - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
    - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
@@ -228,8 +228,8 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
     - <span class="parameter">layer </span> `String`, Default: logged_counts<br>
       Which layer stores counts for differential expression test. 
        Options include logged_counts, signac_norm , and logTF_norm,logIDF_norm
-    - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
-       Minimal number of cells in a cluster. If the cluster contains less than this number of cells, the marker analysis won't be run.
+    - <span class="parameter">mincells </span> `Integer`, Default: 10<br>
+       Marker analysis is run only for clusters with at least `mincells` cells; clusters with fewer cells are excluded from marker analysis.
     - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
     - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
@@ -240,7 +240,7 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
 
 
  - <span class="parameter">multimodal:</span><br>
-   - <span class="parameter">mincels </span> `Integer`, Default:10<br>
+   - <span class="parameter">mincells </span> `Integer`, Default: 10<br>
        If the cluster contains less than this number of cells, the marker analysis won't be run.
     - <span class="parameter">method </span> `String`, Default: wilcoxon<br>
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
@@ -257,8 +257,8 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
        Options include logged_counts, signac_norm , and logTF_norm,logIDF_norm
    - <span class="parameter">method </span> `String`, Default: t-test_overestim_var<br>
         Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
-   - <span class="parameter">mincels </span> `Integer`, Default: 10<br>
-       Minimal number of cells in a cluster. If the cluster contains less than this number of cells, the marker analysis won't be run.
+   - <span class="parameter">mincells </span> `Integer`, Default: 10<br>
+       Marker analysis is run only for clusters with at least `mincells` cells; clusters with fewer cells are excluded from marker analysis.
    - <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
    - <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
       This parameter is mandatory if pseudo_seurat is set to True 

From 0935a7e307dc58eb1efaec4f37a349a7eb98abc5 Mon Sep 17 00:00:00 2001
From: Giulia Garcia <147185635+giuliaelgarcia@users.noreply.github.com>
Date: Tue, 16 Apr 2024 15:44:43 +0100
Subject: [PATCH 22/22] Update gene_list_format.md

---
 docs/usage/gene_list_format.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/docs/usage/gene_list_format.md b/docs/usage/gene_list_format.md
index e1df3016..790734bf 100644
--- a/docs/usage/gene_list_format.md
+++ b/docs/usage/gene_list_format.md
@@ -151,6 +151,33 @@ minimal:
 Generally in the visualization pipeline all gene groups in the input are plotted. In heatmaps and dotplots, one dotplot per group is plotted. For UMAPs, one plot per gene is
 plotted, and a new file is saved per group.
 
+
+## Plot Markers in the Visualization workflow 
+
+The custom marker CSV files (full and minimal) must contain three columns and follow this structure: 
+  | mod  | feature  | group        |
+  |------|----------|--------------|
+  | prot | prot_CD8 | Tcellmarkers |
+  | rna  | CD8A     | Tcellmarkers |
+
+The full list will be plotted in dot plots and matrix plots, with one plot per group. 
+
+The shorter (minimal) list will be plotted on UMAPs as well as other plot types, with one plot per group. 
+
+ | feature_1 | feature_2 | colour         |
+ |-----------|-----------|----------------|
+ | CD8A      | prot_CD8  |                |
+ | CD4       | CD8A      | doublet_scores |
+
+
+
+## Plot metadata variables 
+The scatter_features.csv file should have the following format:
+
+ | feature_1 | feature_2 | colour         |
+ |-----------|-----------|----------------|
+ | rna:total_counts | prot:total_counts | doublet_scores |
+
 ## Final notes
 
 Be deliberate and informative with the choice of group names for any gene set use, since the `.obs` column generated as output will be named based on the group of the gene list input file.