Use confidence for identifying correlated genes.

tanaylab · Jun 17, 2024 · 888fa4e
1 parent 7a7e82f
Showing 7 changed files with 93 additions and 54 deletions.
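Note: the documentation changes in this commit replace the fixed `min_gene_correlation` threshold with a threshold derived from shuffled data. The following is a minimal sketch of that idea as described in the updated `identify_genes.html` text, not the package's actual implementation. The function names `correlated_genes_mask` and `max_abs_correlations`, the `rng` parameter, and the matrix layout (metacells in rows, genes in columns) are assumptions made for illustration. The input is assumed to be already transformed as the earlier documented steps require, and to contain no constant genes (which would yield `NaN` correlations).

```julia
using Random
using Statistics

# Hypothetical helper: the maximal absolute correlation of each gene (column)
# with any *other* gene. Strong anti-correlation also counts.
function max_abs_correlations(values::AbstractMatrix{<:Real})::Vector{Float64}
    correlations = abs.(cor(values))  # gene-by-gene correlation matrix
    for gene in axes(correlations, 1)
        correlations[gene, gene] = 0.0  # ignore each gene's correlation with itself
    end
    return vec(maximum(correlations; dims = 1))
end

# Hypothetical sketch of the shuffle-based test. Returns a mask of the genes
# whose maximal absolute correlation in the real data is at least the
# `correlation_confidence` quantile of the same statistic in shuffled data.
function correlated_genes_mask(
    values::AbstractMatrix{<:Real};
    correlation_confidence::AbstractFloat = 0.9,
    rng::AbstractRNG = Random.default_rng(),
)::BitVector
    real_max = max_abs_correlations(values)

    # Shuffle each gene's values along all metacells, independently per gene,
    # which destroys any real gene-gene correlation.
    shuffled = copy(values)
    for gene in axes(shuffled, 2)
        shuffle!(rng, view(shuffled, :, gene))
    end
    shuffled_max = max_abs_correlations(shuffled)

    threshold = quantile(shuffled_max, correlation_confidence)
    return real_max .>= threshold
end
```

The appeal of this design is that the threshold adapts to the data: with few metacells, spurious correlations are larger, so the shuffled quantile (and hence the bar a gene must clear) rises accordingly. Passing an explicit `rng` makes the result reproducible, which may be why `Random` was added to `Project.toml`, though the diff shown here does not confirm how it is used.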
diff --git a/Project.toml b/Project.toml
@@ -10,6 +10,7 @@ Daf = "1375bf9c-a47d-45a1-aad5-626dd8629d98"
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
 SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

diff --git a/docs/v0.1.0/.documenter-siteinfo.json b/docs/v0.1.0/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-16T14:04:57","documenter_version":"1.4.1"}}
+{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-17T13:39:41","documenter_version":"1.4.1"}}
diff --git a/docs/v0.1.0/boxes.html b/docs/v0.1.0/boxes.html
@@ -162,10 +162,10 @@ <h1 id="Boxes">
     daf::DafWriter;
     min_significant_gene_UMIs::Integer = 40,
     gene_fraction_regularization::AbstractFloat = 1e-5,
-    confidence::AbstractFloat = 0.9,
+    fold_confidence::AbstractFloat = 0.9,
     max_box_span::AbstractFloat = 2.0,
     max_neighborhood_span::AbstractFloat = 2.0,
-    min_gene_correlation::AbstractFloat = 0.5,
+    correlation_confidence::AbstractFloat = 0.99,
     max_deviant_genes_fraction::AbstractFloat = 0.01,
     overwrite::Bool = false,
 )::Nothing
@@ -181,7 +181,7 @@ <h1 id="Boxes">
 </code> (by default, 
 <code>1e-5
 </code>). Since the fraction of the gene is a random variable, we decrease the high fraction and increase the low fraction by a factor based on the 
-<code>confidence
+<code>fold_confidence
 </code> of the test (by default, 0.9), assuming a multinomial distribution. In addition, if the sum of the total UMIs of the gene in both metacells is less than 
 <code>min_significant_gene_UMIs
 </code> (by default, 
@@ -206,9 +206,9 @@ <h1 id="Boxes">
 <code>max_neighborhood_span
 </code>. These neighborhoods may overlap. The main neighborhoods of different boxes may even be identical.
 </li>
-<li>For each box, we compute the set of genes which have at least the 
-<code>min_gene_correlation
-</code> with some other gene(s) in its main neighborhood. We restrict the correlated set of genes of each metacell to be the intersection of this set with the set from its box in the previous round.
+<li>For each box, we compute the set of genes which are correlated (using the 
+<code>correlation_confidence
+</code>) with some other gene(s) in its main neighborhood.
 </li>
 <li>If the new sets of correlated genes only differ up to 
 <code>max_convergence_fraction

diff --git a/docs/v0.1.0/identify_genes.html b/docs/v0.1.0/identify_genes.html
@@ -266,7 +266,7 @@ <h1 id="Identify-Genes">
 <code class="language-julia hljs">function identify_correlated_genes!(
     daf::DafWriter;
     gene_fraction_regularization::AbstractFloat = 1e-5,
-    min_gene_correlation::AbstractFloat = 0.5,
+    correlation_confidence::AbstractFloat = 0.9,
     overwrite::Bool = false,
 )::Nothing
 </code>
@@ -284,11 +284,15 @@ <h1 id="Identify-Genes">
 </li>
 <li>Correlate this between all the pairs of genes.
 </li>
-<li>Find the maximal absolute correlation for each gene (that is, strong anti-correlation also counts).
+<li>For each gene, shuffle its values along all metacells, and again correlate this between all the pairs of genes.
 </li>
-<li>Identify the genes which have at least one gene with a correlation of at least 
-<code>min_gene_correlation
-</code>.
+<li>Find the maximal absolute correlation for each gene in both cases (that is, strong anti-correlation also counts).
+</li>
+<li>Find the 
+<code>correlation_confidence
+</code> quantile correlation of the shuffled data.
+</li>
+<li>Identify the genes that have at least that level of correlations in the unshuffled data.
 </li>
 </ol>
 <p>CONTRACT