From 04ed1dbb46914553cb19337de0a13d0da2dbc297 Mon Sep 17 00:00:00 2001 From: Giulia Garcia <147185635+giuliaelgarcia@users.noreply.github.com> Date: Tue, 23 Apr 2024 10:42:24 +0100 Subject: [PATCH 1/6] Update index.rst --- docs/yaml_docs/index.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/yaml_docs/index.rst b/docs/yaml_docs/index.rst index 3fd86bc8..dd945cdf 100644 --- a/docs/yaml_docs/index.rst +++ b/docs/yaml_docs/index.rst @@ -10,4 +10,5 @@ Workflows configuration files pipeline_clustering_yml spatial_qc spatial_preprocess - spatial_deconvolution \ No newline at end of file + spatial_deconvolution + pipeline_clustering_yml.md From 9870c0fc72954d95e451a01e4a23ba2c7bc8be80 Mon Sep 17 00:00:00 2001 From: bio-la Date: Wed, 24 Apr 2024 16:52:26 +0200 Subject: [PATCH 2/6] fixed wrong params --- docs/yaml_docs/pipeline_clustering_yml.md | 27 ++++++++++++++--------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md index bc5a22dd..2f783d77 100644 --- a/docs/yaml_docs/pipeline_clustering_yml.md +++ b/docs/yaml_docs/pipeline_clustering_yml.md @@ -62,16 +62,21 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th Specify the full object if your scaled_obj contains only HVG. If your scaled_obj contains all the genes then leave full_obj blank. panpipes will use the full object to do marker genes analysis (rank_gene_groups) and for plotting those genes. - modalities
- - rna `Boolean`, Default: True
+ Which modalities to run clustering on. + - rna `Boolean`, Default: True
If set to `True`, the workflow will stop if it doesn't find a modality named 'rna' - prot `Boolean`, Default: True
+ If set to `True`, the workflow will stop if it doesn't find a modality named 'prot' - atac `Boolean`, Default: False
+ If set to `True`, the workflow will stop if it doesn't find a modality named 'atac' + - spatial `Boolean`, Default: False
- Run clustering on each individual modality. + If set to `True`, the workflow will stop if it doesn't find a modality named 'spatial' + - multimodal
- - rna_clustering `Boolean`, Default: True
- - integration_method `String`, Default: WNN
- Options here include WNN, mofa, and totalVI, and it tells us where to look for. + - rna_clustering `Boolean`, Default: False
If set to True, runs clustering on multimodal embedding + - integration_method `String`, Default: None
+ Specify the name of the multimodal embedding. Options here include WNN, mofa, totalvi, multivi. In case you have run WNN, the neigbhours calculation will be skipped since WNN provides its own. ## Parameters for finding neighbours @@ -79,7 +84,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th Sets the number of neighbors to use when calculating the graph for clustering and umap. - rna: - - use_existing `Boolean`, Default: True
+ - use_existing `Boolean`, Default: True
Use existing neighbours in .uns calculated in the `integration` workflow. If `False`, it will recalculate using the following parameters - dim_red `String`, Default: X_pca
Defines which representation in .obsm to use for nearest neighbors - n_dim_red `Integer`, Default: 30
@@ -94,7 +99,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th - prot: - - use_existing `Boolean`, Default: True
+ - use_existing `Boolean`, Default: True
Use existing neighbours in .uns calculated in the `integration` workflow. If `False`, it will recalculate using the following parameters - dim_red `String`, Default: X_pca
Defines which representation in .obsm to use for nearest neighbors - n_dim_red `Integer`, Default: 30
@@ -109,7 +114,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th - atac: - - use_existing `Boolean`, Default: True
+ - use_existing `Boolean`, Default: True
Use existing neighbours in .uns calculated in the `integration` workflow. If `False`, it will recalculate using the following parameters - dim_red `String`, Default: X_lsi
Defines which representation in .obsm to use for nearest neighbors - n_dim_red `Integer`, Default: 1
@@ -125,7 +130,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th - spatial: - - use_existing `Boolean`, Default: False
+ - use_existing `Boolean`, Default: False
Use existing neighbours in .uns calculated in the `integration` workflow. If `False`, it will recalculate using the following parameters - dim_red `String`, Default: X_pca
Defines which representation in .obsm to use for nearest neighbors - n_dim_red `Integer`, Default: 30
@@ -142,7 +147,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th - umap: - - run `Boolean`, Default: True
+ - run `Boolean`, Default: True
Set to `True` runs the umap calculation and plotting. - rna: - mindist `Float`, Default: 0.5
Can specify an array: 0.25,0.5 @@ -265,7 +270,7 @@ When pseudo_seurat is set to True then a [python implementation](https://github. - threshuse `Float`, Default: 0.25
This parameter is mandatory if pseudo_seurat is set to True ## Plot specifications -Used to define which metadata columns are used in the visualizations +Used to define layers are used in the markers visualizations - plotspecs:
- layers:
- rna `String`, Default: logged_counts
From 03b2c2892c40fc7c542f7ea2f17b4ba78a349db8 Mon Sep 17 00:00:00 2001 From: bio-la Date: Wed, 24 Apr 2024 16:53:27 +0200 Subject: [PATCH 3/6] typo --- docs/yaml_docs/pipeline_clustering_yml.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md index 2f783d77..cdd1ccd6 100644 --- a/docs/yaml_docs/pipeline_clustering_yml.md +++ b/docs/yaml_docs/pipeline_clustering_yml.md @@ -270,7 +270,7 @@ When pseudo_seurat is set to True then a [python implementation](https://github. - threshuse `Float`, Default: 0.25
This parameter is mandatory if pseudo_seurat is set to True ## Plot specifications -Used to define layers are used in the markers visualizations +Define which layers are used in the markers visualization - plotspecs:
- layers:
- rna `String`, Default: logged_counts
From 529613000641cf142987bd8d3d249f145dd425f8 Mon Sep 17 00:00:00 2001 From: bio-la Date: Wed, 24 Apr 2024 17:12:26 +0200 Subject: [PATCH 4/6] fixes --- panpipes/panpipes/pipeline_clustering/pipeline.yml | 12 ++++++++---- panpipes/python_scripts/run_umap.py | 2 +- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/panpipes/panpipes/pipeline_clustering/pipeline.yml b/panpipes/panpipes/pipeline_clustering/pipeline.yml index 7bf2db11..dc34e725 100644 --- a/panpipes/panpipes/pipeline_clustering/pipeline.yml +++ b/panpipes/panpipes/pipeline_clustering/pipeline.yml @@ -38,10 +38,10 @@ modalities: atac: False spatial: False -# if True, will look for WNN, or totalVI output +# if True, will look for WNN, mofa, multivi, totalVI embeddings multimodal: - run_clustering: True - #WNN, mofa, totalVI # this will tell us where to look for + run_clustering: False + #WNN, mofa, multivi, totalVI embeddings integration_method: @@ -50,9 +50,10 @@ multimodal: # --------------------------------------- # # ----------------------------- -# number of neighbors to use when calculating the graph for clustering and umap. +# number of neighbors to use when calculating the knn graph for clustering and umap. neighbors: rna: + #use the knn calculated in the integration workflow. If False it will recalculate use_existing: True # which representation in .obsm to use for nearest neighbors # if dim_red=X_pca and X_pca not in .obsm, will be computed with default parameters @@ -66,6 +67,7 @@ neighbors: # scanpy | hnsw (from scvelo) method: scanpy prot: + #use the knn calculated in the integration workflow. If False it will recalculate use_existing: True # which representation in .obsm to use for nearest neighbors # if dim_red=X_pca and X_pca not in .obsm, will be computed with default parameters @@ -79,6 +81,7 @@ neighbors: # scanpy | hnsw (from scvelo) method: scanpy atac: + #use the knn calculated in the integration workflow. 
If False it will recalculate use_existing: True # which representation in .obsm to use for nearest neighbors # if dim_red=X_lsi/X_pca and X_lsi/X_pca not in .obsm, will be computed with default parameters @@ -94,6 +97,7 @@ neighbors: # scanpy | hnsw (from scvelo) method: scanpy spatial: + #use the knn calculated in the integration workflow. If False it will recalculate use_existing: False # which representation in .obsm to use for nearest neighbors # if dim_red=X_pca and X_pca not in .obsm, will be computed with default parameters diff --git a/panpipes/python_scripts/run_umap.py b/panpipes/python_scripts/run_umap.py index 6a5b957b..e4fe42b0 100644 --- a/panpipes/python_scripts/run_umap.py +++ b/panpipes/python_scripts/run_umap.py @@ -33,7 +33,7 @@ default=0.1, help="no. neighbours parameters for sc.pp.neighbors()") parser.add_argument("--neighbors_key", - default="neighbors", help="algortihm choice from louvain and leiden") + default="neighbors", help="name of the saved knn neighbors") args, opt = parser.parse_known_args() L.info(args) From 9771db6a566550ea569523aca537f3d1b272c8c7 Mon Sep 17 00:00:00 2001 From: bio-la Date: Fri, 26 Apr 2024 11:25:00 +0200 Subject: [PATCH 5/6] small changes --- docs/yaml_docs/pipeline_clustering_yml.md | 8 ++++++-- panpipes/panpipes/pipeline_clustering.py | 3 ++- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md index cdd1ccd6..e190c55d 100644 --- a/docs/yaml_docs/pipeline_clustering_yml.md +++ b/docs/yaml_docs/pipeline_clustering_yml.md @@ -14,7 +14,10 @@ In this documentation, the parameters of the `clustering` configuration yaml fil This file is generated running `panpipes clustering config`.
The individual steps run by the pipeline are described in [clustering workflow](https://panpipes-pipelines.readthedocs.io/en/latest/workflows/clustering.html) -When running the clustering workflow, panpipes provides a basic `pipeline.yml` file. +The `clustering` workflow works with outputs generated by the `integration` workflow, and expects a `MuData` object with +`neighbors` saved in the `.uns` of the global layer to run clustering on the multimodal embedding. If `neighbors` are calculated on each modality layer, these can be reused or recalculated on the fly. + +When running the clustering workflow, panpipes provides a basic `pipeline.yml` file to customize with parameters. To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data. However, we do provide pre-filled versions of the `pipeline.yml` file for individual [tutorials](https://panpipes-pipelines.readthedocs.io/en/latest/tutorials/index.html). @@ -76,7 +79,8 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th - multimodal
- rna_clustering `Boolean`, Default: False
If set to True, runs clustering on multimodal embedding - integration_method `String`, Default: None
- Specify the name of the multimodal embedding. Options here include WNN, mofa, totalvi, multivi. In case you have run WNN, the neigbhours calculation will be skipped since WNN provides its own. + In case you have run WNN and want to run clustering on the wnn embedding, specify "WNN" here. The neighbours are saved with a different `--neighbors_key` param only for wnn; for every other method (totalvi, multivi, mofa), leave this parameter blank. + ## Parameters for finding neighbours diff --git a/panpipes/panpipes/pipeline_clustering.py b/panpipes/panpipes/pipeline_clustering.py index 99837875..a3caad38 100644 --- a/panpipes/panpipes/pipeline_clustering.py +++ b/panpipes/panpipes/pipeline_clustering.py @@ -43,9 +43,10 @@ def set_up_dirs(log_file): ## Single modality scripts ## ------------------------------------ -# -----------------------------------= +# -------------------------------------- # neighbors # -------------------------------------- +# TO DO create task to re-run neighbours on multimodal outer representations (this script can only read in each mod layer) @follows(set_up_dirs) @originate(PARAMS['mudata_with_knn']) def run_neighbors(outfile): From 34e5dd924ceef57f8b6eee5a50b4c8eddc958bc4 Mon Sep 17 00:00:00 2001 From: bio-la Date: Fri, 26 Apr 2024 11:37:33 +0200 Subject: [PATCH 6/6] floats and arrays --- docs/yaml_docs/pipeline_clustering_yml.md | 33 ++++++++++++++--------- 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/docs/yaml_docs/pipeline_clustering_yml.md b/docs/yaml_docs/pipeline_clustering_yml.md index e190c55d..7f476833 100644 --- a/docs/yaml_docs/pipeline_clustering_yml.md +++ b/docs/yaml_docs/pipeline_clustering_yml.md @@ -154,48 +154,48 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th - run `Boolean`, Default: True
Set to `True` runs the umap calculation and plotting. - rna: - mindist `Float`, Default: 0.5
- Can specify an array: 0.25,0.5 + Can specify a single float or an array: 0.25,0.5 - prot: - mindist `Float`, Default: 0.5
- Can specify an array: 0.25,0.5,0.8 + Can specify a single float or an array: 0.25,0.5,0.8 - atac: - mindist `Float`, Default: 0.5
- Can specify an array: 0.25,0.5,0.8 + Can specify a single float or an array: 0.25,0.5,0.8 - multimodal: - mindist `Float`, Default: 0.5
- Can specify an array: 0.25,0.5,0.8 + Can specify a single float or an array: 0.25,0.5,0.8 - rna: - mindist `Float`, Default: 0.5
- Can specify an array: 0.25,0.5,0.8 + Can specify a single float or an array: 0.25,0.5,0.8 ## Parameters for clustering - clusterspecs: - rna: - resolutions `Float`, Default: 0.2, 0.6, 1
- Can specify an array: 0.2,0.6,1 + Can specify a single float or an array: 0.2,0.6,1 - algorithm `String`, Default: leiden
Options include louvain or leiden. - prot: - resolutions `Float`, Default: 0.2, 0.6, 1
- Can specify an array: 0.2,0.6,1 + Can specify a single float or an array: 0.2,0.6,1 - algorithm `String`, Default: leiden
Options include louvain or leiden. - atac: - resolutions `Float`, Default: 0.2, 0.6, 1
- Can specify an array to compute in parallel: 0.2,0.6,1 + Can specify a single float or an array to compute in parallel: 0.2,0.6,1 - algorithm `String`, Default: leiden
Options include louvain or leiden. - multimmodal: - resolutions `Float`, Default: 0.5, 0.7
- Can specify an array to compute in parallel: 0.2,0.6,1 + Can specify a single float or an array to compute in parallel: 0.2,0.6,1 - algorithm `String`, Default: leiden
Options include louvain or leiden. - spatial: - resolutions `Float`, Default: 0.2, 0.6, 1
- Can specify an array to compute in parallel: 0.2,0.6,1 + Can specify a single float or an array to compute in parallel: 0.2,0.6,1 - algorithm `String`, Default: leiden
Options include louvain or leiden. @@ -216,8 +216,10 @@ When pseudo_seurat is set to True then a [python implementation](https://github. Marker analysis is run for clusters >= mincells. If a cluster ncells < mincells , then the cluster is excluded from marker analysis - pseudo_seurat `Boolean`, Default: False
- minpct `Float`, Default: 0.1
+ Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. This parameter is mandatory if pseudo_seurat is set to True - threshuse `Float`, Default: 0.25
+ Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells. This parameter is mandatory if pseudo_seurat is set to True - prot:
- run `Boolean`, Default: True
@@ -228,8 +230,10 @@ When pseudo_seurat is set to True then a [python implementation](https://github. - method `String`, Default: wilcoxon
- pseudo_seurat `Boolean`, Default: False
- minpct `Float`, Default: 0.1
+ Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. This parameter is mandatory if pseudo_seurat is set to True - threshuse `Float`, Default: 0.25
+ Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells. This parameter is mandatory if pseudo_seurat is set to True - atac:
@@ -243,8 +247,10 @@ When pseudo_seurat is set to True then a [python implementation](https://github. Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’ - pseudo_seurat `Boolean`, Default: False
- minpct `Float`, Default: 0.1
+ Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. This parameter is mandatory if pseudo_seurat is set to True - threshuse `Float`, Default: 0.25
+ Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells. This parameter is mandatory if pseudo_seurat is set to True @@ -255,9 +261,9 @@ When pseudo_seurat is set to True then a [python implementation](https://github. Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’ - pseudo_seurat `Boolean`, Default: False
- minpct `Float`, Default: 0.1
- This parameter is mandatory if pseudo_seurat is set to True + Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. This parameter is mandatory if pseudo_seurat is set to True - threshuse `Float`, Default: 0.25
- This parameter is mandatory if pseudo_seurat is set to True + Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells. This parameter is mandatory if pseudo_seurat is set to True - spatial:
@@ -270,8 +276,9 @@ When pseudo_seurat is set to True then a [python implementation](https://github. Marker analysis is run for clusters >= mincells. If a cluster ncells < mincells , then the cluster is excluded from marker analysis - pseudo_seurat `Boolean`, Default: False
- minpct `Float`, Default: 0.1
- This parameter is mandatory if pseudo_seurat is set to True + Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. This parameter is mandatory if pseudo_seurat is set to True - threshuse `Float`, Default: 0.25
+ Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells. This parameter is mandatory if pseudo_seurat is set to True ## Plot specifications Define which layers are used in the markers visualization