diff --git a/docs/usage/gene_list_format.md b/docs/usage/gene_list_format.md
index 9c6a02ae..1c5b6aaf 100644
--- a/docs/usage/gene_list_format.md
+++ b/docs/usage/gene_list_format.md
@@ -43,7 +43,7 @@ For a typical usecase, we provide example lists on our [github page](https://git
 The human-only cellcycle genes used in [scanpy.score_genes_cell_cycle](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.score_genes_cell_cycle.html) are stored in [resources/cell_cycle_genes.csv](https://github.com/DendrouLab/panpipes/blob/main/panpipes/resources/cell_cycle_genes.tsv)
-However, if the data is mouse only then the cellcycle gene list can be found in [resources/mouse_cell_cycle_genes.tsv](https://github.com/DendrouLab/panpipes/blob/mouse_cell_cycle/panpipes/resources/mouse_cell_cycle_genes.tsv)
+However, if you are working with mouse data, we supply an alternative cellcycle gene list with murine genes, which can be found in [resources/mouse_cell_cycle_genes.tsv](https://github.com/DendrouLab/panpipes/blob/mouse_cell_cycle/panpipes/resources/mouse_cell_cycle_genes.tsv)
 Differently from the other custom gene file, the cell cycle file should be a **tab separated file with two columns**:
@@ -105,6 +105,7 @@ However, if the input is from mouse data then, the custom genelist file can be s
 ```yaml
 exclude_file: resources/qc_gene_list_mouse.csv
+ ```
 ### Explaining custom gene lists actions
 1. **Ingest workflow** (pipeline_ingest.py)
diff --git a/docs/yaml_docs/pipeline_ingestion_yml.md b/docs/yaml_docs/pipeline_ingestion_yml.md
index 20dfb5de..5fe5d41e 100644
--- a/docs/yaml_docs/pipeline_ingestion_yml.md
+++ b/docs/yaml_docs/pipeline_ingestion_yml.md
@@ -201,21 +201,23 @@ In the ingestion workflow we compute cell and genes QC metrics (such as % of mit
 Feel free to leave options blank to run with default parameters.
 #### Providing a gene list
-To calculate RNA QC metrics, we need to define a gene list providing additional information on the genes in the data.
+To calculate RNA QC metrics based on custom genes annotations, we need to use a gene list providing additional information on the genes expressed in the data. Additionally, we can specify what actions we want to apply to the genes, such as what metrics to calculate.
-custom_genes_file`String`, Default: resources/qc_genelist_1.0.csv
+Please visit our documentation section on [creating and using custom genes lists](../usage/gene_list_format.md) to perform quality control and visualization.
+custom_genes_file`String`, Mandatory parameter, Default: resources/qc_genelist_1.0.csv
 Path to the file containing the entire human gene list. Panpipes provides such a file with standard genes, and the path to this file is set as default.
-However, if the input is from mouse data then the user must provide the mouse gene list as shown here:
+##### Working with different species than human
+*If working with a different species, the user must provide the appropriate gene list. For example, we offer a precompiled version of the qc gene list for mouse, the user can supply the list by specifying the path to the file as shown here:*
- custom_genes_file`String`, Default: qc_gene_list_mouse.csv
+ `custom_genes_file: qc_gene_list_mouse.csv`
-This mouse gene list can be found in the panpipes [resources](https://github.com/DendrouLab/panpipes/blob/mouse_gene_list_upload/panpipes/resources/qc_gene_list_mouse.csv)
+*Find the mouse gene list in our [resources](https://github.com/DendrouLab/panpipes/blob/mouse_gene_list_upload/panpipes/resources/qc_gene_list_mouse.csv)*
-Usually, it's convenient to rely on known gene lists, as this simplifies various downstream tasks, such as evaluating the percentage of mitochondrial genes in the data, identify ribosomal genes, or excluding IGG genes from HVG selection.
-For the ingestion workflow, we retrieved the cell cycle genes used in `scanpy.score_genes_cell_cycle` [Satija et al. (2015), Nature Biotechnology](https://www.nature.com/articles/nbt.3192) and stored them in a file: panpipes/resources/cell_cicle_genes.tsv.
-Additionally, we also provide an example for an entire gene list: panpipes/resources/qc_genelist_1.0.csv
+
+It's convenient to rely on known gene lists, as this simplifies various downstream tasks, such as evaluating the percentage of mitochondrial genes in the data, identify ribosomal genes, or excluding IGG genes from HVG selection.
+For the ingestion workflow, we retrieved the cell cycle genes used in `scanpy.score_genes_cell_cycle` [Satija et al. (2015), Nature Biotechnology](https://www.nature.com/articles/nbt.3192)
 | mod | feature | group |
 |-----|---------|--------|
@@ -228,7 +230,7 @@ Additionally, we also provide an example for an entire gene list: panpipes/resou
 Next, we define "actions" on the genes as follows:
 In the group column, specify what actions you want to apply to that specific gene.
-For instance: calc_proportion: mt will calculate proportion of reads mapping to the genes whose group is "mt".
+For instance: `calc_proportion: mt` will calculate proportion of reads mapping to the genes whose group is "mt" in the custom genes file.
(for pipeline_ingest.py) calc_proportions: calculate proportion of reads mapping to X genes over total number of reads, per cell