Commit

Merge pull request #228 from DendrouLab/mouse_cell_cycle
Issue 217 tasks completed (mouse cell cycle gene list)
bio-la authored Mar 14, 2024
2 parents 0a23aba + 4261378 commit c61e5cc
Showing 3 changed files with 135 additions and 11 deletions.
23 changes: 21 additions & 2 deletions docs/usage/gene_list_format.md
@@ -40,9 +40,11 @@ For a typical usecase, we provide example lists on our [github page](https://git

### Cell cycle genes

The cellcycle genes used in [scanpy.score_genes_cell_cycle](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.score_genes_cell_cycle.html)
The human-only cellcycle genes used in [scanpy.score_genes_cell_cycle](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.score_genes_cell_cycle.html)
are stored in [resources/cell_cycle_genes.tsv](https://github.com/DendrouLab/panpipes/blob/main/panpipes/resources/cell_cycle_genes.tsv).

If you are working with mouse data, we supply an alternative cell cycle gene list with murine genes in [resources/mouse_cell_cycle_genes.tsv](https://github.com/DendrouLab/panpipes/blob/mouse_cell_cycle/panpipes/resources/mouse_cell_cycle_genes.tsv).

Unlike the other custom gene file, the cell cycle file should be a **tab-separated file with two columns**:

- **gene_name**: the name of the gene
@@ -67,7 +69,7 @@ If left blank, these actions will not be performed (i.e. no calculation of % of

### Supplying custom gene lists to calculate QC metrics

The custom genelist file can be supplied by the user in two workflows to perform the three main actions:
The human custom genelist file can be supplied by the user in two workflows to perform the three main actions:

1. **Ingest workflow**

@@ -87,6 +89,23 @@ The custom genelist file can be supplied by the user in two workflows to perform
*Note that we have formatted an example file containing all genes to use in both workflows, and therefore supply the same file to both workflows but users can have independent files for each of them.*
However, if the input is mouse data, the mouse custom genelist file can be supplied in the same two workflows to perform the three main actions:
1. **Ingest workflow**
pipeline_ingest config file: (pipeline.yml)
```yaml
custom_genes_file: resources/qc_gene_list_mouse.csv
```
2. **Preprocess workflow**
pipeline_preprocess config file: (pipeline.yml)
```yaml
exclude_file: resources/qc_gene_list_mouse.csv
```
### Explaining custom gene lists actions
1. **Ingest workflow** (pipeline_ingest.py)
25 changes: 16 additions & 9 deletions docs/yaml_docs/pipeline_ingestion_yml.md
@@ -201,16 +201,23 @@ In the ingestion workflow we compute cell and genes QC metrics (such as % of mit
Feel free to leave options blank to run with default parameters.

#### Providing a gene list
To calculate RNA QC metrics, we need to define a gene list providing additional information on the genes in the data.
To calculate RNA QC metrics based on custom gene annotations, we need to use a gene list providing additional information on the genes expressed in the data.
Additionally, we can specify what actions we want to apply to the genes, such as what metrics to calculate.

<span class="parameter">custom_genes_file</span>`String`, Default: resources/qc_genelist_1.0.csv<br>
Path to the file containing the entire gene list. Panpipes provides such a file with standard genes, and the path to this file is set as default.

Please visit our documentation section on [creating and using custom genes lists](../usage/gene_list_format.md) to perform quality control and visualization.
<span class="parameter">custom_genes_file</span>`String`, Mandatory parameter, Default: resources/qc_genelist_1.0.csv<br>
Path to the file containing the entire human gene list. Panpipes provides such a file with standard genes, and the path to this file is set as default.

Usually, it's convenient to rely on known gene lists, as this simplifies various downstream tasks, such as evaluating the percentage of mitochondrial genes in the data, identify ribosomal genes, or excluding IGG genes from HVG selection.
For the ingestion workflow, we retrieved the cell cycle genes used in `scanpy.score_genes_cell_cycle` [Satija et al. (2015), Nature Biotechnology](https://www.nature.com/articles/nbt.3192) and stored them in a file: panpipes/resources/cell_cicle_genes.tsv.
Additionally, we also provide an example for an entire gene list: panpipes/resources/qc_genelist_1.0.csv
##### Working with species other than human
*If working with a different species, the user must provide the appropriate gene list. For example, we offer a precompiled version of the QC gene list for mouse, which the user can supply by specifying the path to the file as shown here:*

`custom_genes_file: qc_gene_list_mouse.csv`

*Find the mouse gene list in our [resources](https://github.com/DendrouLab/panpipes/blob/mouse_gene_list_upload/panpipes/resources/qc_gene_list_mouse.csv)*


It's convenient to rely on known gene lists, as this simplifies various downstream tasks, such as evaluating the percentage of mitochondrial genes in the data, identifying ribosomal genes, or excluding IGG genes from HVG selection.
For the ingestion workflow, we retrieved the cell cycle genes used in `scanpy.score_genes_cell_cycle` [Satija et al. (2015), Nature Biotechnology](https://www.nature.com/articles/nbt.3192).

| mod | feature | group |
|-----|---------|--------|
@@ -223,7 +230,7 @@ Additionally, we also provide an example for an entire gene list: panpipes/resou
Next, we define "actions" on the genes as follows:

In the group column, specify what actions you want to apply to that specific gene.
For instance: calc_proportion: mt will calculate proportion of reads mapping to the genes whose group is "mt".
For instance: `calc_proportions: mt` will calculate the proportion of reads mapping to the genes whose group is "mt" in the custom genes file.

(for pipeline_ingest.py)
calc_proportions: calculate proportion of reads mapping to X genes over total number of reads, per cell
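
The `calc_proportions` action described above can be illustrated with a minimal Python sketch. It is not the panpipes implementation: the function name `calc_proportion`, the gene names, and the group labels are hypothetical, and a single cell is represented as a plain dictionary of read counts.

```python
def calc_proportion(cell_counts, gene_groups, group):
    """Fraction of one cell's reads that map to genes tagged with `group`."""
    total = sum(cell_counts.values())
    if total == 0:
        # An empty cell has no reads, so report 0.0 rather than dividing by zero.
        return 0.0
    in_group = sum(
        count for gene, count in cell_counts.items()
        if gene_groups.get(gene) == group
    )
    return in_group / total


# Hypothetical excerpt of a custom gene list: feature -> group.
gene_groups = {"mt-Nd1": "mt", "mt-Co1": "mt", "Rpl3": "rp"}

# Hypothetical read counts for a single cell.
cell = {"mt-Nd1": 5, "mt-Co1": 15, "Rpl3": 30, "Actb": 50}

print(calc_proportion(cell, gene_groups, "mt"))
```

Genes that are absent from the gene list (here `Actb`) still count towards the total number of reads, which matches the "over total number of reads, per cell" wording.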
@@ -432,4 +439,4 @@ This can help to determine any inconsistencies in staining per channel and other
The maximum value will be set at the value of the 99.5% quantile, applied per feature.
Note that this feature is in the default muon `mu.pp.dsb` code, but manually implemented here.



98 changes: 98 additions & 0 deletions panpipes/resources/mouse_cell_cycle_genes.tsv
@@ -0,0 +1,98 @@
gene_name cc_phase
0 Mcm5 s
1 Pcna s
2 Tyms s
3 Fen1 s
4 Mcm2 s
5 Mcm4 s
6 Rrm1 s
7 Ung s
8 Gins2 s
9 Mcm6 s
10 Cdca7 s
11 Dtl s
12 Prim1 s
13 Uhrf1 s
14 Mlf1ip s
15 Hells s
16 Rfc2 s
17 Rpa2 s
18 Nasp s
19 Rad51ap1 s
20 Gmnn s
21 Wdr76 s
22 Slbp s
23 Ccne2 s
24 Ubr7 s
25 Pold3 s
26 Msh2 s
27 Atad2 s
28 Rad51 s
29 Rrm2 s
30 Cdc45 s
31 Cdc6 s
32 Exo1 s
33 Tipin s
34 Dscc1 s
35 Blm s
36 Casp8ap2 s
37 Usp1 s
38 Clspn s
39 Pola1 s
40 Chaf1b s
41 Brip1 s
42 E2f8 s
43 Hmgb2 g2m
44 Cdk1 g2m
45 Nusap1 g2m
46 Ube2c g2m
47 Birc5 g2m
48 Tpx2 g2m
49 Top2a g2m
50 Ndc80 g2m
51 Cks2 g2m
52 Nuf2 g2m
53 Cks1b g2m
54 Mki67 g2m
55 Tmpo g2m
56 Cenpf g2m
57 Tacc3 g2m
58 Fam64a g2m
59 Smc4 g2m
60 Ccnb2 g2m
61 Ckap2l g2m
62 Ckap2 g2m
63 Aurkb g2m
64 Bub1 g2m
65 Kif11 g2m
66 Anp32e g2m
67 Tubb4b g2m
68 Gtse1 g2m
69 Kif20b g2m
70 Hjurp g2m
71 Cdca3 g2m
72 Hn1 g2m
73 Cdc20 g2m
74 Ttk g2m
75 Cdc25c g2m
76 Kif2c g2m
77 Rangap1 g2m
78 Ncapd2 g2m
79 Dlgap5 g2m
80 Cdca2 g2m
81 Cdca8 g2m
82 Ect2 g2m
83 Kif23 g2m
84 Hmmr g2m
85 Aurka g2m
86 Psrc1 g2m
87 Anln g2m
88 Lbr g2m
89 Ckap5 g2m
90 Cenpe g2m
91 Ctcf g2m
92 Nek2 g2m
93 G2e3 g2m
94 Gas2l3 g2m
95 Cbx5 g2m
96 Cenpa g2m
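
As a rough illustration of how a file in this layout could be consumed, the sketch below splits the genes by phase using only the Python standard library. It assumes the columns are tab-separated and that each row starts with an unnamed index column, as the listing above suggests; the helper name `read_cell_cycle_genes` and the four-row sample are hypothetical stand-ins for the real file.

```python
import csv
import io

# A few rows in the same layout as mouse_cell_cycle_genes.tsv (assumed to be
# tab-separated with an unnamed leading index column).
SAMPLE_TSV = (
    "\tgene_name\tcc_phase\n"
    "0\tMcm5\ts\n"
    "1\tPcna\ts\n"
    "43\tHmgb2\tg2m\n"
    "44\tCdk1\tg2m\n"
)


def read_cell_cycle_genes(handle):
    """Return a dict mapping each phase (e.g. "s", "g2m") to its gene names."""
    phases = {}
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        # The last two fields are gene_name and cc_phase, with or without an index.
        gene, phase = row[-2], row[-1]
        phases.setdefault(phase, []).append(gene)
    return phases


genes = read_cell_cycle_genes(io.StringIO(SAMPLE_TSV))
print(genes["s"], genes["g2m"])
```

The two resulting lists have the shape expected by the `s_genes` and `g2m_genes` arguments of `scanpy.tl.score_genes_cell_cycle`. To read the real file, pass an open file handle in place of the `io.StringIO` sample.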
