version 0.9.1 a

frankligy · Oct 10, 2021 · 53d5ced
1 parent 55b8309
commit 53d5ced
Showing 37 changed files with 1,382 additions and 180 deletions.
diff --git a/docs/_build/doctrees/api.doctree b/docs/_build/doctrees/api.doctree
diff --git a/docs/_build/doctrees/change_log.doctree b/docs/_build/doctrees/change_log.doctree
diff --git a/docs/_build/doctrees/environment.pickle b/docs/_build/doctrees/environment.pickle
diff --git a/docs/_build/doctrees/index.doctree b/docs/_build/doctrees/index.doctree
diff --git a/docs/_build/doctrees/introduction.doctree b/docs/_build/doctrees/introduction.doctree
diff --git a/docs/_build/doctrees/principle.doctree b/docs/_build/doctrees/principle.doctree
diff --git a/docs/_build/doctrees/tutorial.doctree b/docs/_build/doctrees/tutorial.doctree
diff --git a/docs/_build/html/_sources/api.rst.txt b/docs/_build/html/_sources/api.rst.txt
@@ -4,6 +4,10 @@ API
 ScTriangulate Class Methods
 -----------------------------
 
+.. _reference_to_instantiation:
+
+__init__()
+~~~~~~~~~~~~~~~~
 .. autoclass:: sctriangulate.main_class.ScTriangulate
     :members: 
     :exclude-members: confusion_to_df, plot_heterogeneity, gene_to_df, get_metrics_and_shapley, 
@@ -14,7 +18,6 @@ ScTriangulate Class Methods
                       modality_contributions, plot_multi_modal_feature_rank, plot_long_heatmap, viewer_cluster_feature_figure,
                       viewer_cluster_feature_html, viewer_heterogeneity_figure, viewer_heterogeneity_html, plot_concordance
 
-
 (static) salvage_run()
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autofunction:: sctriangulate.main_class.ScTriangulate.salvage_run
@@ -185,6 +188,8 @@ add_azimuth()
 ~~~~~~~~~~~~~~~~
 .. autofunction:: sctriangulate.preprocessing.add_azimuth
 
+.. _reference_to_add_annotation:
+
 add_annotations()
 ~~~~~~~~~~~~~~~~~~
 .. autofunction:: sctriangulate.preprocessing.add_annotations

diff --git a/docs/_build/html/_sources/change_log.rst.txt b/docs/_build/html/_sources/change_log.rst.txt
@@ -0,0 +1,15 @@
+Change Log
+============
+
+Version 0.9.1 2021/10/10
+-------------------------
+
+1. Add an option to choose whether to assess the raw clusters.
+2. Polish the documentation and fix some typos.
+
+
+
+Version 0.9.0 2021/10/05
+--------------------------
+
+1. First public version.
diff --git a/docs/_build/html/_sources/index.rst.txt b/docs/_build/html/_sources/index.rst.txt
@@ -33,6 +33,7 @@ Contents
    tutorial
    principle
    api
+   change_log
    contact
 
 

diff --git a/docs/_build/html/_sources/introduction.rst.txt b/docs/_build/html/_sources/introduction.rst.txt
@@ -27,17 +27,22 @@ biologically meaningful metrics to assess cluster goodness, and `Shapley Value <
 to attain a single stable solution.
 
 .. note::
-    For larger dataset, it is advisable to run on a Linux system with enough RAM and space. The current release has been tested on both Mac and 
-    Linux Cluster.
+    A typical scRNA-Seq dataset (10k cells) with four provided annotation-sets can run in ~10 minutes on a laptop. For larger datasets (100k cells) or multiome 
+    (GEX + ATAC) data with > 100k features (genes + peaks), it is recommended to run the program in a high-performance compute environment.
 
 Inputs and Outputs
 ---------------------
 scTriangulate is designed for h5ad files and works seamlessly with the popular scanpy package. In addition, we offer 
 a myriad of convenient preprocessing functions to ease file conversion. Currently we accept the following formats:
 
     * **Anndata** (.h5 & .h5ad), the annotations are the columns in adata.obs
-    * **mtx**, annotations information should be supplied as addtional txt file (barcode -> label)
-    * **dense matrix**, txt expression matrix, annotations should be aupplied as addtional txt file.
+    * **mtx**, annotation information should be supplied as an additional txt file (see the example below and :ref:`reference_to_add_annotation`)
+    * **dense matrix**, txt expression matrix, annotations should be supplied as an additional txt file (see the example below and :ref:`reference_to_add_annotation`).
+
+    .. csv-table:: annotation txt file
+        :file: ./_static/annotation_txt.csv
+        :widths: 10,10
+        :header-rows: 1
 
 Optionally, users can supply their own UMAP embeddings; please refer to the :ref:`reference_to_add_umap` function for details.
 

diff --git a/docs/_build/html/_sources/principle.rst.txt b/docs/_build/html/_sources/principle.rst.txt
@@ -101,10 +101,14 @@ no longer be considered in the marker genes and downstream assessment::
 Visualization
 ----------------
 
+scTriangulate offers a powerful toolkit that allows end users to visualize hidden heterogeneity in many different ways. In addition, the ``color`` module
+provides the functions needed to make publication-quality figures. Here we highlight some of the plotting functions and refer
+users to the ``API`` section for more details.
+
 plot_heterogeneity
 ~~~~~~~~~~~~~~~~~~~~~
 
-This is the main feature of scTriangulate visualization functionality, built on top of scanpy. since scTriangualte mix-and-match cluster boundaries from 
+This is the main feature of scTriangulate visualizations, built on top of scanpy. Since scTriangulate can mix-and-match cluster boundaries from 
 diverse annotations, it empowers users to discover further hidden heterogeneity. The question now is how the user can visualize that heterogeneity.
 
 .. image:: ./_static/plot_heterogeneity_chop.png
@@ -113,11 +117,11 @@ diverse annotations, it empowers the users to discover further and hidden hetero
     :align: center
     :target: target
 
-Now as you can see, **annoatation@c1** has been suggested to be divided by two sub populations, now we want to know:
+The philosophy behind this function is to first pick a viewpoint from which to look at the final result. For instance, here we choose "annotation1" as 
+the viewpoint. As you can see, **annotation@c1** is suggested to be divided into two sub populations. Now we want to know:
 
 1. How are these two sub populations laid out on the UMAP?
 2. What are the differentially expressed features between these two sub populations?
-3. How many cells are in each sub populations?
 
 Let's show some of the functionalities:
 
@@ -151,11 +155,67 @@ Let's show some of the functionalities:
     :align: center
     :target: target
 
+plot_concordance
+~~~~~~~~~~~~~~~~~~
+
+When we have more than 2 annotation-sets, we want to know how they correspond to each other: what fraction of cells in annotation1 flows into
+another annotation, and vice versa::
+
+    sctri.plot_concordance(key1='azimuth',key2='pruned',style='3dbar')
+
+.. image:: ./_static/3dbar.png
+    :height: 400px
+    :width: 500px
+    :align: center
+    :target: target
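
The idea of "what fraction of cells in one annotation flows into another" can be sketched in plain Python. This is only an illustration of the concept, not how ``plot_concordance`` is implemented; the helper name ``concordance_fractions`` and the toy labels are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def concordance_fractions(labels1, labels2):
    """For each cluster in labels1, return the fraction of its cells
    that fall into each cluster of labels2 (hypothetical helper)."""
    if len(labels1) != len(labels2):
        raise ValueError("both annotations must label the same cells")
    pair_counts = Counter(zip(labels1, labels2))  # cells per (cluster1, cluster2) pair
    totals = Counter(labels1)                     # cells per cluster in annotation 1
    fractions = defaultdict(dict)
    for (c1, c2), n in pair_counts.items():
        fractions[c1][c2] = n / totals[c1]
    return dict(fractions)


# Toy per-cell labels (invented for illustration, one entry per cell).
azimuth = ["CD14 Mono", "CD14 Mono", "CD14 Mono", "B", "B"]
pruned = ["c1", "c1", "c2", "c3", "c3"]

flow = concordance_fractions(azimuth, pruned)
for cluster, targets in flow.items():
    print(cluster, targets)
```

Swapping the two arguments gives the reverse direction, i.e. how the second annotation maps back onto the first.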
+
+plot_clusterability
+~~~~~~~~~~~~~~~~~~~~~~
+
+Do you want to know for a specific annotation-set, which cluster is most likely to be subdivided and which is the least? We refer to this as
+clusterability::
+
+    sctri.plot_clusterability(key='sctri_rna_leiden_1',col='raw',fontsize=8)
+
+.. image:: ./_static/plot_clusterability.png
+    :height: 400px
+    :width: 500px
+    :align: center
+    :target: target
+
+plot_long_heatmap
+~~~~~~~~~~~~~~~~~~~~~~
+
+A heatmap that can be arbitrarily long and ALWAYS displays every gene::
+
+    sctri.plot_long_heatmap(n_features=20,figsize=(20,20))
+
+.. image:: ./_static/long_heatmap.png
+    :height: 400px
+    :width: 500px
+    :align: center
+    :target: target
+
+plot_multi_modal_feature_rank
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In a multi-modal setting, a cluster's identity is usually defined by all modalities. Do you want to know which modality mainly defines a cluster?::
+
+    sctri.plot_multi_modal_feature_rank(cluster='sctri_rna_leiden_2@10')
+
+.. image:: ./_static/plot_multi_modal_feature_rank.png
+    :height: 500px
+    :width: 500px
+    :align: center
+    :target: target
+
+
+
+
 
-Other plotting funcctions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-**1.plot_confusion**
+plot_confusion
+~~~~~~~~~~~~~~~~
 
 It allows you to visualize the stability of each cluster in one annotation::
 

diff --git a/docs/_build/html/_sources/tutorial.rst.txt b/docs/_build/html/_sources/tutorial.rst.txt
@@ -13,7 +13,7 @@ cells before filtering.
 
 Here we first conduct basic single cell analysis to obtain Leiden clustering results, however, at various resolutions (r=1,2,3). Smaller resolutions lead to
 broader clusters, and larger resolution value will result in more granular clustering. We leverage scTriangulate to take the three resolutions as the query 
-annotations, and automatically mix-and-match cluster boundary from different resolutions, which at the end, yield scTriangulate reconciled cluster solutions.
+annotation-sets, and automatically mix-and-match cluster boundaries from different resolutions, which in the end yields the scTriangulate reconciled cluster solution.
 
 Download and preprocessing
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -26,7 +26,7 @@ First load the packages::
     from sctriangulate import *
     from sctriangulate.preprocessing import *
 
-The h5 file can be downloaded from `here <http://altanalyze.org/scTriangulate/scRNASeq/pbmc_10k_v3.h5>`_. We used scanpy and scTriangulate
+The h5 file can be downloaded from `here <http://altanalyze.org/scTriangulate/scRNASeq/pbmc_10k_v3.h5>`_. First use scanpy and the scTriangulate
 preprocessing module to conduct basic QC filtering and run the single cell pipeline::
 
     adata = sc.read_10x_h5('./pbmc_10k_v3_filtered_feature_bc_matrix.h5')
@@ -69,22 +69,22 @@ Visualize the important QC metrics and make the decision on the proper cutoffs::
    :align: right
    :target: target
 
-We filtered out the cells whose min_genes = 300, min_counts = 500, mt > 20%, 11,022 cells left::
+Then filter out cells with fewer than 300 genes (min_genes=300), fewer than 500 counts (min_counts=500), or a mitochondrial percentage of 20% or higher, leaving 11,022 cells::
 
     sc.pp.filter_cells(adata, min_genes=300)
     sc.pp.filter_cells(adata, min_counts=500)
     adata = adata[adata.obs.pct_counts_mt < 20, :]  
     print(adata)  # 11022 × 33538
 
 
-Then we will use scTriangulate wrapper functions to obtain the Leiden clutser results at different resolutions (r=1,2,3), specifically, 
+Then use the scTriangulate wrapper functions to obtain Leiden cluster results at different resolutions (r=1,2,3). Specifically, 
 we chose 50 PCs and 3000 highly variable genes::
 
     adata = scanpy_recipe(adata,is_log=False,resolutions=[1,2,3],pca_n_comps=50,n_top_genes=3000)
 
-After running this command, we will have three columns in ``adata.obs``, namely, ``sctri_rna_leiden_1``, ``sctri_rna_leiden_2``, ``sctri_rna_leiden_3``. 
+After running this command, you will have three columns in ``adata.obs``, namely, ``sctri_rna_leiden_1``, ``sctri_rna_leiden_2``, ``sctri_rna_leiden_3``. 
 Also a h5ad file named ``adata_after_scanpy_recipe_rna_1_2_3_umap_True.h5ad`` will be automatically saved to current directory so there's no need to re-run this
-step again, Now let's visualize them::
+pre-processing step again. Now let's visualize them::
 
     umap_dual_view_save(adata,cols=['sctri_rna_leiden_1','sctri_rna_leiden_2','sctri_rna_leiden_3'])
     # three umaps will be saved to your current directory.
@@ -95,9 +95,9 @@ step again, Now let's visualize them::
    :align: center
    :target: target
 
-As we can see, different resolutions lead to various number of clusters, and it is clear that certain regions got sub-divided in higher resolutions. However,
-we don't know whether this sub-populations are valid off the top of our heads. **Here comes scTriangulate, which will scan each clusters at each resolutions,
-and mix-and-match different solutions to achieve an optimal one.**
+As you can see, different resolutions lead to different numbers of clusters, and it is clear that certain regions get sub-divided at higher resolutions. However,
+we don't know off the top of our heads whether these sub-populations are valid. Here comes scTriangulate, which will scan each cluster at each resolution,
+and mix-and-match different solutions to achieve a reconciled result.
 
 Running scTriangulate
 ~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -110,29 +110,27 @@ handle every thing for us::
 
     adata = sc.read('adata_after_scanpy_recipe_rna_1_2_3_umap_True.h5ad')
     sctri = ScTriangulate(dir='./output',adata=adata,query=['sctri_rna_leiden_1','sctri_rna_leiden_2','sctri_rna_leiden_3'])
-    sctri.lazy_run()  # done!!!
+    sctri.lazy_run(assess_pruned=False,viewer_cluster=False,viewer_heterogeneity=False)  # done!!!
 
 We first instantiate the ``ScTriangulate`` object by specifying:
 
 1. ``dir``, where all the intermediate and final results/plots will be written.
 2. ``adata``, the adata that we want to start with.
 3. ``query``, a list containing all the annotations that we want to triangulate.
 
-The ``dir`` doesn't need to be an existing folder, the program will automatically create one if not present.
+The ``dir`` doesn't need to be an existing folder; the program will automatically create one if it is not present. More information about instantiation can be
+found in the API :ref:`reference_to_instantiation`.
 
-.. note::
-
-    To save time, please run lazy_run(scale_sccaf=False,viewer_cluster=False), the first argument instruct the program to compute SCCAF score without
-    firstly scaling the data, which will save quite a lot time. By default this option is set to True. The second argument is to instruct the program to
-    not build the cluster_viewer, it will take some time to generate all the images that the cluster viewer needs.
 
+The three arguments passed to ``lazy_run()`` are only there to save time. You can leave them at their defaults by calling ``lazy_run()``, which will automatically
+assess the stability of the final defined clusters and generate the cluster viewer and heterogeneity viewer. However, if you only want to obtain the scTriangulate
+reconciled cluster information, you don't need these three steps, so we turn them off.
 
-However for the purpose of instructing users how to understand this tool, we are going to run it step by step. 
 
 .. note::
 
-    Users can switch to manually run scTriangulat step by step, in order for granular operations/modifications. The instructions are as below.
-    The above ``lazy_run()`` function basically takes care step 1-4 automatically with default parameter settings.
+    However, to help users understand this tool, we are going to run it step by step so that readers get a sense
+    of how the program works. We refer to this as Manual Run.
 
 Manual Run
 <<<<<<<<<<<<<
@@ -161,7 +159,7 @@ Step2: compute_shapley
 ++++++++++++++++++++++++
 
 The second step is to utilize the calculated metrics and assess which annotation/cluster is the best for **each single cell**. So the program iterates over each row,
-which is a single cell, retrive all the metrics associated with each cluster, and calculate shapley value of each cluster (in this case, each single cell has 
+representing a single cell, retrieves all the metrics associated with each cluster, and calculates the Shapley value for each cluster (in this case, each single cell has 
 three conflicting clusters). Then the program will assign the cell to the "best" cluster amongst all solutions. We refer to the resultant cluster assignment as
 ``raw`` cluster result::
 
@@ -190,7 +188,7 @@ unstable invalid clusters will be reassigned to its nearest neightbor's cluster
     sctri.prune_result()
     sctri.serialize('break_point_after_prune.p')
 
-A column named "pruned" will be added, also "confidence" column stores the confidence the program hold to call it out.
+A column named "pruned" will be added. In addition, the "confidence" column stores the confidence with which the program calls this cluster.
 
 .. csv-table:: After prune result
     :file: ./_static/tutorial/single_modality/head_check_after_prune.csv
@@ -201,10 +199,10 @@ A column named "pruned" will be added, also "confidence" column stores the confi
 Step4: building the viewer
 ++++++++++++++++++++++++++++++
 
-We provide an automatically generated webpage, called scTriangulate viewer, to allow users to dynamically navigate the robustness of each cluster from each
+We provide an automatically generated html page, called scTriangulate viewer, to allow users to dynamically toggle between clusters and inspect the robustness of each cluster from each
 annotation (cluster viewer). Also, it enables the inspection of further heterogeneity that might not have been captured by a 
 single annotation (heterogeneity viewer). The logic of the following code is simple: we first build the html, then we generate the figures that the html page would 
-need to render it::
+need for proper rendering::
 
     sctri = ScTriangulate.deserialize('output/break_point_after_prune.p')
     sctri.viewer_cluster_feature_html()
@@ -272,7 +270,7 @@ Discover hidden heterogeneity
 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
 
 scTriangulate, by design, can greedily discover hidden heterogeneity by leveraging the cluster boundaries from each annotation. Here scTriangulate 
-suggests sub-dividing of CD14 Mono population which has been annotated in Azimuth reference::
+suggests sub-dividing the CD14 Mono population, a subdivision that has not been annotated in the Azimuth reference::
 
     # if we run lazy_run
     sctri = ScTriangulate.deserialize('output/after_pruned_assess.p')
@@ -288,7 +286,8 @@ suggests sub-dividing of CD14 Mono population which has been annotated in Azimut
    :align: center
    :target: target
 
-Then by pulling out the marker genes the program detected, we reason that it was caused by at least three distinctive sub-groups:
+Then, by pulling out the marker genes the program detected, we reason that the heterogeneity reflects at least three cell sub-states, supported by
+the `literature <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6077267/>`_:
 
 1. **classical CD14+ Monocyte**: CLEC5A, CLEC4D, S100A9
 2. **intermediate CD14+ Monocyte**: FCGR3A, CLEC10A, HLA-DRA
@@ -311,9 +310,9 @@ Multi-modal workflow
 -----------------------------------
 
 In this example run, we are going to use a CITE-Seq dataset from human total nucleated cells (TNCs). This dataset contains 31 ADTs and in total 8,491 cells.
-It is normal practice to analyze and cluster each modality's data seperately, and then try to merge them together. However, to reconcile the clustering
+It is common practice to analyze and cluster each modality separately, and then try to merge the results together. However, reconciling the clustering
 differences is not a trivial task; it requires the simultaneous consideration of both RNA gene expression and surface protein. Thankfully, scTriangulate
-can help to make the decision.
+can help us make the decision.
 
 The dataset can be downloaded from the `website <http://altanalyze.org/scTriangulate/CITESeq/TNC_r1-RNA-ADT.h5>`_.
 
@@ -406,7 +405,7 @@ Running scTriangulate
 Just use the ``lazy_run()`` function, which is broken down step by step in the single_modality section::
 
     sctri = ScTriangulate(dir='output',adata=adata_combine,add_metrics={},query=['sctri_adt_leiden_1','sctri_adt_leiden_2','sctri_adt_leiden_3','sctri_rna_leiden_1','sctri_rna_leiden_2','sctri_rna_leiden_3'])
-    sctri.lazy_run()
+    sctri.lazy_run(assess_pruned=False,viewer_cluster=False,viewer_heterogeneity=False)
 
 All the intermediate results will be stored in the ./output folder.
 
@@ -445,7 +444,8 @@ scTriangulate allows the triangulation amongst diverse resolutions and modalitie
    :align: center
    :target: target
 
-scTriangulate discovers new cell state due to ADT markers, azimuth prediction can be downloaded `from here <http://altanalyze.org/scTriangulate/CITESeq/azimuth_pred.tsv>`_::
+scTriangulate discovers a new cell state driven by ADT markers (CD56-high MAIT cells), supported by `previous literature <https://www.pnas.org/content/114/27/E5434>`_.
+The Azimuth prediction can be downloaded `from here <http://altanalyze.org/scTriangulate/CITESeq/azimuth_pred.tsv>`_::
 
     sctri = ScTriangulate.deserialize('output/after_pruned_assess.p')
     add_azimuth(sctri.adata,'azimuth_pred.tsv')

diff --git a/docs/_build/html/_static/annotation_txt.csv b/docs/_build/html/_static/annotation_txt.csv
@@ -0,0 +1,9 @@
+barcode,label
+D150_GTGTTAGAGGTGCTAG,Mesothelial FB
+D150_ACTACGATCTCAGGCG,Mesothelial FB
+D150_GGAGATGTCACACCGG,Mesothelial FB
+D150_CGGACACGTCGTGCCA,Mesothelial FB
+D062_TTCTTCCCACGACTAT,Mesothelial FB
+D150_TCGATTTAGGATATGT,Mesothelial FB
+D150_GGTAATCGTAAGCTCT,Mesothelial FB
+D150_GCCATGGAGGGTGAAA,Mesothelial FB
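
For readers who want to sanity-check an annotation file of this shape before handing it to scTriangulate, here is a minimal sketch using only the Python standard library. It assumes a two-column table with a ``barcode,label`` header, as in the file above; the helper ``read_annotation_table`` is hypothetical and not part of scTriangulate, and the delimiter and header that ``add_annotations`` actually expects should be checked in the API documentation.

```python
import csv
import io

# Two rows copied from annotation_txt.csv in this commit.
SAMPLE = """barcode,label
D150_GTGTTAGAGGTGCTAG,Mesothelial FB
D062_TTCTTCCCACGACTAT,Mesothelial FB
"""


def read_annotation_table(handle, delimiter=","):
    """Return a {barcode: label} dict from a two-column annotation table.

    Hypothetical helper for illustration; raises on duplicate barcodes
    so that each cell ends up with exactly one label.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    mapping = {}
    for row in reader:
        barcode = row["barcode"]
        if barcode in mapping:
            raise ValueError(f"duplicate barcode: {barcode}")
        mapping[barcode] = row["label"]
    return mapping


barcode_to_label = read_annotation_table(io.StringIO(SAMPLE))
print(len(barcode_to_label), "cells annotated")
```

To read a real file instead of the inline sample, pass an open file handle, for example ``open(path, newline="")``.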