Commit

small fixes
bio-la committed Mar 13, 2024
1 parent 31eb8c6 commit 4261378
Showing 2 changed files with 13 additions and 10 deletions.
3 changes: 2 additions & 1 deletion docs/usage/gene_list_format.md
@@ -43,7 +43,7 @@ For a typical usecase, we provide example lists on our [github page](https://git
The human-only cellcycle genes used in [scanpy.score_genes_cell_cycle](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.score_genes_cell_cycle.html)
are stored in [resources/cell_cycle_genes.csv](https://github.com/DendrouLab/panpipes/blob/main/panpipes/resources/cell_cycle_genes.tsv)

However, if the data is mouse only then the cellcycle gene list can be found in [resources/mouse_cell_cycle_genes.tsv](https://github.com/DendrouLab/panpipes/blob/mouse_cell_cycle/panpipes/resources/mouse_cell_cycle_genes.tsv)
However, if you are working with mouse data, we supply an alternative cellcycle gene list with murine genes, which can be found in [resources/mouse_cell_cycle_genes.tsv](https://github.com/DendrouLab/panpipes/blob/mouse_cell_cycle/panpipes/resources/mouse_cell_cycle_genes.tsv)

Differently from the other custom gene file, the cell cycle file should be a **tab separated file with two columns**:

@@ -105,6 +105,7 @@ However, if the input is from mouse data then, the custom genelist file can be s
```yaml
exclude_file: resources/qc_gene_list_mouse.csv
```
### Explaining custom gene lists actions
1. **Ingest workflow** (pipeline_ingest.py)
20 changes: 11 additions & 9 deletions docs/yaml_docs/pipeline_ingestion_yml.md
@@ -201,21 +201,23 @@ In the ingestion workflow we compute cell and genes QC metrics (such as % of mit
Feel free to leave options blank to run with default parameters.

#### Providing a gene list
To calculate RNA QC metrics, we need to define a gene list providing additional information on the genes in the data.
To calculate RNA QC metrics based on custom genes annotations, we need to use a gene list providing additional information on the genes expressed in the data.
Additionally, we can specify what actions we want to apply to the genes, such as what metrics to calculate.

<span class="parameter">custom_genes_file</span>`String`, Default: resources/qc_genelist_1.0.csv<br>
Please visit our documentation section on [creating and using custom genes lists](../usage/gene_list_format.md) to perform quality control and visualization.
<span class="parameter">custom_genes_file</span>`String`, Mandatory parameter, Default: resources/qc_genelist_1.0.csv<br>
Path to the file containing the entire human gene list. Panpipes provides such a file with standard genes, and the path to this file is set as default.

However, if the input is from mouse data then the user must provide the mouse gene list as shown here:
##### Working with different species than human
*If working with a different species, the user must provide the appropriate gene list. For example, we offer a precompiled version of the qc gene list for mouse, the user can supply the list by specifying the path to the file as shown here:*

<span class="parameter">custom_genes_file</span>`String`, Default: qc_gene_list_mouse.csv<br>
`custom_genes_file: qc_gene_list_mouse.csv`

This mouse gene list can be found in the panpipes [resources](https://github.com/DendrouLab/panpipes/blob/mouse_gene_list_upload/panpipes/resources/qc_gene_list_mouse.csv)
*Find the mouse gene list in our [resources](https://github.com/DendrouLab/panpipes/blob/mouse_gene_list_upload/panpipes/resources/qc_gene_list_mouse.csv)*

Usually, it's convenient to rely on known gene lists, as this simplifies various downstream tasks, such as evaluating the percentage of mitochondrial genes in the data, identify ribosomal genes, or excluding IGG genes from HVG selection.
For the ingestion workflow, we retrieved the cell cycle genes used in `scanpy.score_genes_cell_cycle` [Satija et al. (2015), Nature Biotechnology](https://www.nature.com/articles/nbt.3192) and stored them in a file: panpipes/resources/cell_cicle_genes.tsv.
Additionally, we also provide an example for an entire gene list: panpipes/resources/qc_genelist_1.0.csv

It's convenient to rely on known gene lists, as this simplifies various downstream tasks, such as evaluating the percentage of mitochondrial genes in the data, identify ribosomal genes, or excluding IGG genes from HVG selection.
For the ingestion workflow, we retrieved the cell cycle genes used in `scanpy.score_genes_cell_cycle` [Satija et al. (2015), Nature Biotechnology](https://www.nature.com/articles/nbt.3192)

| mod | feature | group |
|-----|---------|--------|
@@ -228,7 +230,7 @@ Additionally, we also provide an example for an entire gene list: panpipes/resou
Next, we define "actions" on the genes as follows:

In the group column, specify what actions you want to apply to that specific gene.
For instance: calc_proportion: mt will calculate proportion of reads mapping to the genes whose group is "mt".
For instance: `calc_proportion: mt` will calculate proportion of reads mapping to the genes whose group is "mt" in the custom genes file.

(for pipeline_ingest.py)
calc_proportions: calculate proportion of reads mapping to X genes over total number of reads, per cell
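
To make the `calc_proportion: mt` action concrete, here is a minimal sketch of the computation it describes: the fraction of a cell's reads that map to genes whose group is "mt" in a gene list with `mod`, `feature`, and `group` columns. This is not the panpipes implementation. The helper names (`genes_in_group`, `calc_proportion`), the gene list rows, and the toy counts are all hypothetical and only illustrate the idea.

```python
import csv
import io

# Hypothetical gene list in the mod/feature/group layout shown in the docs.
# The rows are illustrative, not copied from the panpipes resources.
GENE_LIST_CSV = """mod,feature,group
RNA,MT-CO1,mt
RNA,MT-ND1,mt
RNA,RPL3,rp
"""


def genes_in_group(csv_text, group, mod="RNA"):
    """Return the set of features annotated with the given group and modality."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["feature"]
        for row in reader
        if row["group"] == group and row["mod"] == mod
    }


def calc_proportion(counts, gene_set):
    """Fraction of a cell's total reads that map to genes in gene_set."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # avoid division by zero for empty cells
    in_group = sum(n for gene, n in counts.items() if gene in gene_set)
    return in_group / total


# Toy per-cell counts (gene -> number of reads); the values are made up.
cells = {
    "cell_1": {"MT-CO1": 10, "RPL3": 30, "ACTB": 60},
    "cell_2": {"MT-ND1": 5, "MT-CO1": 5, "ACTB": 10},
}

mt_genes = genes_in_group(GENE_LIST_CSV, "mt")
pct_mt = {cell: calc_proportion(c, mt_genes) for cell, c in cells.items()}
```

In this toy data, `cell_1` has 10 of 100 reads in the "mt" group and `cell_2` has 10 of 20, so the per-cell proportions differ even though the raw mitochondrial counts match.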