version 0.9.1 a
frankligy committed Oct 10, 2021
1 parent 55b8309 commit 53d5ced
Showing 37 changed files with 1,382 additions and 180 deletions.
Binary file modified docs/_build/doctrees/api.doctree
Binary file not shown.
Binary file added docs/_build/doctrees/change_log.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/index.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/introduction.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/principle.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/tutorial.doctree
Binary file not shown.
7 changes: 6 additions & 1 deletion docs/_build/html/_sources/api.rst.txt
@@ -4,6 +4,10 @@ API
ScTriangulate Class Methods
-----------------------------

.. _reference_to_instantiation:

__init__()
~~~~~~~~~~~~~~~~
.. autoclass:: sctriangulate.main_class.ScTriangulate
:members:
:exclude-members: confusion_to_df, plot_heterogeneity, gene_to_df, get_metrics_and_shapley,
@@ -14,7 +18,6 @@ ScTriangulate Class Methods
modality_contributions, plot_multi_modal_feature_rank, plot_long_heatmap, viewer_cluster_feature_figure,
viewer_cluster_feature_html, viewer_heterogeneity_figure, viewer_heterogeneity_html, plot_concordance


(static) salvage_run()
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: sctriangulate.main_class.ScTriangulate.salvage_run
@@ -185,6 +188,8 @@ add_azimuth()
~~~~~~~~~~~~~~~~
.. autofunction:: sctriangulate.preprocessing.add_azimuth

.. _reference_to_add_annotation:

add_annotations()
~~~~~~~~~~~~~~~~~~
.. autofunction:: sctriangulate.preprocessing.add_annotations
15 changes: 15 additions & 0 deletions docs/_build/html/_sources/change_log.rst.txt
@@ -0,0 +1,15 @@
Change Log
============

Version 0.9.1 2021/10/10
-------------------------

1. Add an option to control whether the raw clusters are assessed.
2. Polish the documentation and fix some typos.



Version 0.9.0 2021/10/05
--------------------------

1. First public version.
1 change: 1 addition & 0 deletions docs/_build/html/_sources/index.rst.txt
@@ -33,6 +33,7 @@ Contents
tutorial
principle
api
change_log
contact


13 changes: 9 additions & 4 deletions docs/_build/html/_sources/introduction.rst.txt
@@ -27,17 +27,22 @@ biologically meaningful metrics to assess cluster goodness, and `Shapley Value <
to attain a single stable solution.

.. note::
For larger datasets, it is advisable to run on a Linux system with enough RAM and space. The current release has been tested on both Mac and
Linux Cluster.
A typical scRNA-Seq dataset (10k cells) with four provided annotation-sets can run in ~10 minutes on a laptop. For larger datasets (100k cells) or multiome
(GEX + ATAC) with > 100k features (gene + peak), it is recommended to run the program in a high-performance computing environment.

Inputs and Outputs
---------------------
scTriangulate is designed for h5ad files and works seamlessly with the popular scanpy package, if you are familiar with it. In addition, we offer
a number of convenient preprocessing functions to ease the file conversion process. Currently we accept the following formats:

* **Anndata** (.h5 & .h5ad), the annotations are the columns in adata.obs
* **mtx**, annotation information should be supplied as an additional txt file (barcode -> label)
* **dense matrix**, txt expression matrix, annotations should be supplied as an additional txt file.
* **mtx**, annotation information should be supplied as an additional txt file (see the example below and :ref:`reference_to_add_annotation`)
* **dense matrix**, txt expression matrix, annotations should be supplied as an additional txt file (see the example below and :ref:`reference_to_add_annotation`).

.. csv-table:: annotation txt file
:file: ./_static/annotation_txt.csv
:widths: 10,10
:header-rows: 1

Optionally, users can supply their own umap embeddings. Please refer to the :ref:`reference_to_add_umap` function for details.
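
As a rough illustration of the two-column annotation file (barcode, label) described above, the sketch below parses such a file into a barcode-to-label mapping using only the Python standard library. The helper name ``read_annotation`` is hypothetical and not part of scTriangulate, and the tab delimiter is an assumption; the documented route for attaching annotations remains ``add_annotations`` in the preprocessing module.

```python
import csv
import io


def read_annotation(handle, delimiter="\t"):
    """Return {barcode: label} from a two-column file with a header row."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row["barcode"]: row["label"] for row in reader}


# Inline stand-in for an annotation txt file (tab-delimited is an assumption).
text = (
    "barcode\tlabel\n"
    "D150_GTGTTAGAGGTGCTAG\tMesothelial FB\n"
    "D062_TTCTTCCCACGACTAT\tMesothelial FB\n"
)

barcode_to_label = read_annotation(io.StringIO(text))
print(barcode_to_label["D062_TTCTTCCCACGACTAT"])
```

A mapping like this is handy for checking that every barcode in the expression matrix has a label before the annotations are loaded.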

72 changes: 66 additions & 6 deletions docs/_build/html/_sources/principle.rst.txt
@@ -101,10 +101,14 @@ no longer be considered in the marker genes and downstream assessment::
Visualization
----------------

scTriangulate offers a powerful toolkit allowing end users to visualize the hidden heterogeneity in many different ways. In addition, the ``color`` module
provides the functions needed to make publication-quality figures. Here we highlight some of the plotting functions and refer
the users to the ``API`` part for more details.

plot_heterogeneity
~~~~~~~~~~~~~~~~~~~~~

This is the main feature of scTriangulate visualization functionality, built on top of scanpy. Since scTriangulate mix-and-matches cluster boundaries from
This is the main feature of scTriangulate visualizations, built on top of scanpy. Since scTriangulate can mix-and-match cluster boundaries from
diverse annotations, it empowers the users to discover further and hidden heterogeneity. The question is then how the user can visualize this heterogeneity.

.. image:: ./_static/plot_heterogeneity_chop.png
@@ -113,11 +117,11 @@ diverse annotations, it empowers the users to discover further and hidden hetero
:align: center
:target: target

Now as you can see, **annotation@c1** has been suggested to be divided into two sub-populations. Now we want to know:
The philosophy behind this function is to first pick a viewpoint from which we want to look at the final result. For instance, here we choose "annotation1" as
the viewpoint. As you can see, **annotation@c1** has been suggested to be divided into two sub-populations. Now we want to know:

1. How are these two sub-populations laid out on the umap?
2. What are the differentially expressed features between these two sub-populations?
3. How many cells are in each sub-population?

Let's show some of the functionalities:

@@ -151,11 +155,67 @@ Let's show some of the functionalities:
:align: center
:target: target

plot_concordance
~~~~~~~~~~~~~~~~~~

When we have more than 2 annotation-sets, we want to know how they correspond to each other: what fraction of cells in one annotation flows into
another annotation, and vice versa::

sctri.plot_concordance(key1='azimuth',key2='pruned',style='3dbar')

.. image:: ./_static/3dbar.png
:height: 400px
:width: 500px
:align: center
:target: target
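
To make the idea of cells "flowing" from one annotation into another concrete, here is a minimal standard-library sketch that computes, for each cluster in one annotation, the fraction of its cells landing in each cluster of a second annotation. It does not reproduce how ``plot_concordance`` is implemented; the function name ``concordance`` and the labels are made up for illustration.

```python
from collections import Counter, defaultdict


def concordance(labels1, labels2):
    """For each cluster in labels1, return the fraction of its cells
    that fall into each cluster of labels2."""
    pair_counts = Counter(zip(labels1, labels2))
    totals = Counter(labels1)
    table = defaultdict(dict)
    for (c1, c2), n in pair_counts.items():
        table[c1][c2] = n / totals[c1]
    return dict(table)


# Made-up per-cell labels from two annotation-sets (same cell order).
azimuth = ["B", "B", "B", "T", "T", "T", "T", "NK"]
pruned = ["c1", "c1", "c2", "c3", "c3", "c3", "c4", "c4"]

fractions = concordance(azimuth, pruned)
for cluster, flow in fractions.items():
    print(cluster, flow)
```

Swapping the two arguments gives the flow in the opposite direction, which is generally not symmetric.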

plot_clusterability
~~~~~~~~~~~~~~~~~~~~~~

Do you want to know for a specific annotation-set, which cluster is most likely to be subdivided and which is the least? We refer to this as
clusterability::

sctri.plot_clusterability(key='sctri_rna_leiden_1',col='raw',fontsize=8)

.. image:: ./_static/plot_clusterability.png
:height: 400px
:width: 500px
:align: center
:target: target

plot_long_heatmap
~~~~~~~~~~~~~~~~~~~~~~

A heatmap that can be arbitrarily long and ALWAYS displays every gene::

sctri.plot_long_heatmap(n_features=20,figsize=(20,20))

.. image:: ./_static/long_heatmap.png
:height: 400px
:width: 500px
:align: center
:target: target

plot_multi_modal_feature_rank
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In a multi-modal setting, a cluster's identity is usually defined by all modalities. To see which modality mainly defines a cluster::

sctri.plot_multi_modal_feature_rank(cluster='sctri_rna_leiden_2@10')

.. image:: ./_static/plot_multi_modal_feature_rank.png
:height: 500px
:width: 500px
:align: center
:target: target





Other plotting functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**1.plot_confusion**
plot_confusion
~~~~~~~~~~~~~~~~

It allows you to visualize the stability of each cluster in one annotation::

58 changes: 29 additions & 29 deletions docs/_build/html/_sources/tutorial.rst.txt
@@ -13,7 +13,7 @@ cells before filtering.

Here we first conduct basic single cell analysis to obtain Leiden clustering results at various resolutions (r=1,2,3). Smaller resolutions lead to
broader clusters, and larger resolution values result in more granular clustering. We leverage scTriangulate to take the three resolutions as the query
annotations, and automatically mix-and-match cluster boundaries from different resolutions, which in the end yields scTriangulate reconciled cluster solutions.
annotation-sets, and automatically mix-and-match cluster boundaries from different resolutions, which in the end yields scTriangulate reconciled cluster solutions.

Download and preprocessing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -26,7 +26,7 @@ First load the packages::
from sctriangulate import *
from sctriangulate.preprocessing import *

The h5 file can be downloaded from `here <http://altanalyze.org/scTriangulate/scRNASeq/pbmc_10k_v3.h5>`_. We used scanpy and scTriangulate
The h5 file can be downloaded from `here <http://altanalyze.org/scTriangulate/scRNASeq/pbmc_10k_v3.h5>`_. First use scanpy and scTriangulate
preprocessing module to conduct basic QC filtering and single cell pipeline::

adata = sc.read_10x_h5('./pbmc_10k_v3_filtered_feature_bc_matrix.h5')
@@ -69,22 +69,22 @@ Visualize the important QC metrics and make the decision on the proper cutoffs::
:align: right
:target: target

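We filtered out the cells with fewer than 300 detected genes (min_genes = 300), fewer than 500 counts (min_counts = 500), or more than 20% mitochondrial counts (mt > 20%), leaving 11,022 cells::
Then filter out the cells with fewer than 300 detected genes (min_genes = 300), fewer than 500 counts (min_counts = 500), or more than 20% mitochondrial counts (mt > 20%), leaving 11,022 cells::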

sc.pp.filter_cells(adata, min_genes=300)
sc.pp.filter_cells(adata, min_counts=500)
adata = adata[adata.obs.pct_counts_mt < 20, :]
print(adata) # 11022 × 33538
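
The three thresholds above can be restated in plain Python to make the keep/drop logic explicit. This is only a sketch on made-up per-cell summaries; the field names and the helper ``passes_qc`` are illustrative and not part of scanpy or scTriangulate.

```python
# Made-up per-cell QC summaries; the field names are illustrative only.
cells = [
    {"barcode": "cell_a", "n_genes": 1200, "n_counts": 4000, "pct_mt": 5.0},
    {"barcode": "cell_b", "n_genes": 250, "n_counts": 900, "pct_mt": 3.0},    # too few genes
    {"barcode": "cell_c", "n_genes": 400, "n_counts": 450, "pct_mt": 8.0},    # too few counts
    {"barcode": "cell_d", "n_genes": 1500, "n_counts": 6000, "pct_mt": 35.0}, # high mitochondrial fraction
]


def passes_qc(cell, min_genes=300, min_counts=500, max_pct_mt=20.0):
    """Keep a cell only if it clears all three thresholds."""
    return (
        cell["n_genes"] >= min_genes
        and cell["n_counts"] >= min_counts
        and cell["pct_mt"] < max_pct_mt
    )


kept = [cell["barcode"] for cell in cells if passes_qc(cell)]
print(kept)
```

A cell is dropped as soon as it fails any one of the three criteria, which is why the order of the filtering calls does not change the final set of cells.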


Then we will use scTriangulate wrapper functions to obtain the Leiden cluster results at different resolutions (r=1,2,3), specifically,
Then use scTriangulate wrapper functions to obtain the Leiden cluster results at different resolutions (r=1,2,3). Specifically,
we chose 50 PCs and 3000 highly variable genes::

adata = scanpy_recipe(adata,is_log=False,resolutions=[1,2,3],pca_n_comps=50,n_top_genes=3000)

After running this command, we will have three columns in ``adata.obs``, namely, ``sctri_rna_leiden_1``, ``sctri_rna_leiden_2``, ``sctri_rna_leiden_3``.
After running this command, you will have three columns in ``adata.obs``, namely, ``sctri_rna_leiden_1``, ``sctri_rna_leiden_2``, ``sctri_rna_leiden_3``.
Also, an h5ad file named ``adata_after_scanpy_recipe_rna_1_2_3_umap_True.h5ad`` will be automatically saved to the current directory, so there's no need to re-run this
step again. Now let's visualize them::
pre-processing step again. Now let's visualize them::

umap_dual_view_save(adata,cols=['sctri_rna_leiden_1','sctri_rna_leiden_2','sctri_rna_leiden_3'])
# three umaps will be saved to your current directory.
@@ -95,9 +95,9 @@ step again, Now let's visualize them::
:align: center
:target: target

As we can see, different resolutions lead to different numbers of clusters, and it is clear that certain regions got sub-divided at higher resolutions. However,
we don't know whether these sub-populations are valid off the top of our heads. **Here comes scTriangulate, which will scan each cluster at each resolution,
and mix-and-match different solutions to achieve an optimal one.**
As you can see, different resolutions lead to different numbers of clusters, and it is clear that certain regions get sub-divided at higher resolutions. However,
we don't know whether these sub-populations are valid off the top of our heads. Here comes scTriangulate, which will scan each cluster at each resolution,
and mix-and-match different solutions to achieve a reconciled result.

Running scTriangulate
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -110,29 +110,27 @@ handle every thing for us::

adata = sc.read('adata_after_scanpy_recipe_rna_1_2_3_umap_True.h5ad')
sctri = ScTriangulate(dir='./output',adata=adata,query=['sctri_rna_leiden_1','sctri_rna_leiden_2','sctri_rna_leiden_3'])
sctri.lazy_run() # done!!!
sctri.lazy_run(assess_pruned=False,viewer_cluster=False,viewer_heterogeneity=False) # done!!!

We first instantiate a ``ScTriangulate`` object by specifying:

1. ``dir``, the folder where all the intermediate and final results/plots will be written.
2. ``adata``, the adata that we want to start with.
3. ``query``, a list containing all the annotations that we want to triangulate.

The ``dir`` doesn't need to be an existing folder; the program will automatically create one if not present.
The ``dir`` doesn't need to be an existing folder; the program will automatically create one if not present. More information about instantiation can be
found in the API :ref:`reference_to_instantiation`.

.. note::

To save time, please run lazy_run(scale_sccaf=False,viewer_cluster=False). The first argument instructs the program to compute the SCCAF score without
first scaling the data, which will save quite a lot of time. By default this option is set to True. The second argument instructs the program
not to build the cluster viewer, since it takes some time to generate all the images that the cluster viewer needs.

The three arguments passed to ``lazy_run()`` are only there to save time. You can leave them at their defaults by calling ``lazy_run()``, which will automatically
assess the stability of the final defined clusters and generate the cluster viewer and heterogeneity viewer. However, if you only want to obtain the scTriangulate
reconciled cluster information, you don't need these three steps, so we turn them off.

However for the purpose of instructing users how to understand this tool, we are going to run it step by step.

.. note::

Users can switch to manually running scTriangulate step by step, for more granular operations/modifications. The instructions are as below.
The above ``lazy_run()`` function basically takes care of steps 1-4 automatically with default parameter settings.
However, for the purpose of helping users understand this tool, we are going to run it step by step to give readers a sense
of how the program works. We refer to this as Manual Run.

Manual Run
<<<<<<<<<<<<<
@@ -161,7 +159,7 @@ Step2: compute_shapley
++++++++++++++++++++++++

The second step is to utilize the calculated metrics, and assess which annotation/cluster is the best for **each single cell**. So the program iterates over each row,
which is a single cell, retrieves all the metrics associated with each cluster, and calculates the shapley value of each cluster (in this case, each single cell has
representing a single cell, retrieves all the metrics associated with each cluster, and calculates the shapley value for each cluster (in this case, each single cell has
three conflicting clusters). Then the program will assign the cell to the "best" cluster amongst all solutions. We refer to the resultant cluster assignment as the
``raw`` cluster result::
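
For readers unfamiliar with Shapley values, the sketch below shows a generic textbook computation for one cell with three conflicting clusters. It is not scTriangulate's actual scoring: the metric scores are made up, the cluster names are placeholders, and the characteristic function ``coalition_value`` (sum over metrics of the best score within the coalition) is an assumption chosen only to keep the toy game small.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical metric scores for three conflicting clusters of one cell
# (one tuple per cluster, one entry per metric); the numbers are made up.
scores = {
    "leiden_1@3": (4, 1, 2),
    "leiden_2@7": (3, 5, 1),
    "leiden_3@12": (1, 2, 6),
}
N_METRICS = 3


def coalition_value(members):
    """Toy characteristic function (an assumption, not the library's):
    sum over metrics of the best score among the coalition members."""
    if not members:
        return 0
    return sum(max(scores[m][k] for m in members) for k in range(N_METRICS))


def shapley_values(players, value):
    """Exact Shapley values: the average marginal contribution of each
    player over all orderings in which the players can join."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        joined = []
        for p in order:
            before = value(joined)
            joined.append(p)
            totals[p] += value(joined) - before
    return {p: total / len(orderings) for p, total in totals.items()}


players = list(scores)
phi = shapley_values(players, coalition_value)
best = max(phi, key=phi.get)
print(phi, best)
```

The values always sum to the value of the full coalition, so they can be read as each cluster's share of the total; the cell would then be assigned to the cluster with the largest share.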

@@ -190,7 +188,7 @@ unstable invalid clusters will be reassigned to its nearest neightbor's cluster
sctri.prune_result()
sctri.serialize('break_point_after_prune.p')

A column named "pruned" will be added, and the "confidence" column stores the confidence with which the program calls it out.
A column named "pruned" will be added, and the "confidence" column stores the confidence with which the program calls this cluster out.

.. csv-table:: After prune result
:file: ./_static/tutorial/single_modality/head_check_after_prune.csv
@@ -201,10 +199,10 @@ A column named "pruned" will be added, also "confidence" column stores the confi
Step4: building the viewer
++++++++++++++++++++++++++++++

We provide an automatically generated webpage, called scTriangulate viewer, to allow users to dynamically navigate the robustness of each cluster from each
We provide an automatically generated html page, called scTriangulate viewer, to allow users to dynamically toggle between clusters and inspect the robustness of each cluster from each
annotation (cluster viewer). Also, it enables the inspection of further heterogeneity that might not have been captured by a
single annotation (heterogeneity viewer). The logic of the following code is simple: we first build the html, then we generate the figures that the html page would
need to render it::
need for proper rendering::

sctri = ScTriangulate.deserialize('output/break_point_after_prune.p')
sctri.viewer_cluster_feature_html()
@@ -272,7 +270,7 @@ Discover hidden heterogeneity
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

scTriangulate, by design, can greedily discover any hidden heterogeneity by leveraging the cluster boundaries from each annotation. Here scTriangulate
suggests sub-dividing the CD14 Mono population, which has been annotated in the Azimuth reference::
suggests sub-dividing the CD14 Mono population, which has not been annotated in the Azimuth reference::

# if we run lazy_run
sctri = ScTriangulate.deserialize('output/after_pruned_assess.p')
@@ -288,7 +286,8 @@ suggests sub-dividing of CD14 Mono population which has been annotated in Azimut
:align: center
:target: target

Then by pulling out the marker genes the program detected, we reason that it was caused by at least three distinctive sub-groups:
Then by pulling out the marker genes the program detected, we reason that the heterogeneity reflects at least three sub cell states, supported by the
`literature <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6077267/>`_:

1. **classical CD14+ Monocyte**: CLEC5A, CLEC4D, S100A9
2. **intermediate CD14+ Monocyte**: FCGR3A, CLEC10A, HLA-DRA
@@ -311,9 +310,9 @@ Multi-modal workflow
-----------------------------------

In this example run, we are going to use a CITE-Seq dataset from human total nucleated cells (TNCs). This dataset contains 31 ADTs and in total 8,491 cells.
It is normal practice to analyze and cluster each modality's data separately, and then try to merge them together. However, reconciling the clustering
It is a common practice to analyze and cluster based on each modality separately, and then try to merge the results together. However, reconciling the clustering
differences is not a trivial task and it requires the simultaneous consideration of both RNA gene expression and surface protein. Thankfully, scTriangulate
can help to make the decision.
can help us make the decision.

The dataset can be downloaded from the `website <http://altanalyze.org/scTriangulate/CITESeq/TNC_r1-RNA-ADT.h5>`_.

@@ -406,7 +405,7 @@ Running scTriangulate
Just use the ``lazy_run()`` function; its steps are broken down in the single-modality section::

sctri = ScTriangulate(dir='output',adata=adata_combine,add_metrics={},query=['sctri_adt_leiden_1','sctri_adt_leiden_2','sctri_adt_leiden_3','sctri_rna_leiden_1','sctri_rna_leiden_2','sctri_rna_leiden_3'])
sctri.lazy_run()
sctri.lazy_run(assess_pruned=False,viewer_cluster=False,viewer_heterogeneity=False)

All the intermediate results will be stored in the ./output folder.

@@ -445,7 +444,8 @@ scTriangulate allows the triangulation amongst diverse resolutions and modalitie
:align: center
:target: target

scTriangulate discovers a new cell state driven by ADT markers. The azimuth prediction can be downloaded `from here <http://altanalyze.org/scTriangulate/CITESeq/azimuth_pred.tsv>`_::
scTriangulate discovers a new cell state driven by ADT markers (CD56 high MAIT cell), supported by `previous literature <https://www.pnas.org/content/114/27/E5434>`_.
The azimuth prediction can be downloaded `from here <http://altanalyze.org/scTriangulate/CITESeq/azimuth_pred.tsv>`_::

sctri = ScTriangulate.deserialize('output/after_pruned_assess.p')
add_azimuth(sctri.adata,'azimuth_pred.tsv')
9 changes: 9 additions & 0 deletions docs/_build/html/_static/annotation_txt.csv
@@ -0,0 +1,9 @@
barcode,label
D150_GTGTTAGAGGTGCTAG,Mesothelial FB
D150_ACTACGATCTCAGGCG,Mesothelial FB
D150_GGAGATGTCACACCGG,Mesothelial FB
D150_CGGACACGTCGTGCCA,Mesothelial FB
D062_TTCTTCCCACGACTAT,Mesothelial FB
D150_TCGATTTAGGATATGT,Mesothelial FB
D150_GGTAATCGTAAGCTCT,Mesothelial FB
D150_GCCATGGAGGGTGAAA,Mesothelial FB