Commit

Fix Theta
leerho committed Sep 20, 2024
1 parent 54bfe39 commit 276fe92
Showing 3 changed files with 6 additions and 153 deletions.
80 changes: 0 additions & 80 deletions _includes/toc.html
@@ -94,88 +94,8 @@
<li><a href="{{site.docs_dir}}/HLL/HllSketchVsDruidHyperLogLogCollector.html">•HLL Sketch vs Druid HyperLogLogCollector</a></li>
</div>
</div>

<li><a href="{{site.docs_dir}}/Theta/ThetaSketches.html">•Theta Sketches</a></li>

<p id="theta-sketches">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_theta_sketches">Theta Sketches</a>
</p>
<div class="collapse" id="collapse_theta_sketches">
<li><a href="{{site.docs_dir}}/Theta/ThetaSketchFramework.html">•Theta Sketch Framework</a></li>

<p id="theta-examples">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_theta_examples">Theta Examples</a>
</p>
<div class="collapse" id="collapse_theta_examples">
<li><a href="{{site.docs_dir}}/Theta/ConcurrentThetaSketch.html">•Concurrent Theta Sketch</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaJavaExample.html">•Theta Sketch Java Example</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaSparkExample.html">•Theta Sketch Spark Example</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaPigUDFs.html">•Theta Sketch Pig UDFs</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaHiveUDFs.html">•Theta Sketch Hive UDFs</a></li>
</div>

<p id="kmv-tutorial">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_kmv_tutorial">KMV Tutorial</a>
</p>
<div class="collapse" id="collapse_kmv_tutorial">
<li><a href="{{site.docs_dir}}/Theta/InverseEstimate.html">•The Inverse Estimate</a></li>
<li><a href="{{site.docs_dir}}/Theta/KMVempty.html">•Empty Sketch</a></li>
<li><a href="{{site.docs_dir}}/Theta/KMVfirstEst.html">•First Estimator</a></li>
<li><a href="{{site.docs_dir}}/Theta/KMVbetterEst.html">•Better Estimator</a></li>
<li><a href="{{site.docs_dir}}/Theta/KMVrejection.html">•Rejection Rules</a></li>
<li><a href="{{site.docs_dir}}/Theta/KMVupdateVkth.html">•Update V(kth) Rule</a></li>
</div>

<p id="set-operations-and-p-sampling">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_set_operations_and_p-sampling">Set Operations and P-sampling</a>
</p>
<div class="collapse" id="collapse_set_operations_and_p-sampling">
<li><a href="{{site.docs_dir}}/Theta/ThetaSketchSetOps.html">•Set Operations</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaSetOpsCornerCases.html">•Model & Test Set Operations</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaPSampling.html"><i>p</i>-Sampling</a></li>
</div>

<p id="accuracy">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_accuracy">Accuracy</a>
</p>
<div class="collapse" id="collapse_accuracy">
<li><a href="{{site.docs_dir}}/Theta/ThetaAccuracy.html">•Basic Accuracy</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaAccuracyPlots.html">•Accuracy Plots</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaErrorTable.html">•Relative Error Table</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaSketchSetOpsAccuracy.html">•SetOp Accuracy</a></li>
<li><a href="{{site.docs_dir}}/Theta/AccuracyOfDifferentKUnions.html">•Unions With Different k</a></li>
</div>

<p id="size">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_size">Size</a>
</p>
<div class="collapse" id="collapse_size">
<li><a href="{{site.docs_dir}}/Theta/ThetaSize.html">•Theta Sketch Size</a></li>
</div>

<p id="speed">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_speed">Speed</a>
</p>
<div class="collapse" id="collapse_speed">
<li><a href="{{site.docs_dir}}/Theta/ThetaUpdateSpeed.html">•Update Speed</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaMergeSpeed.html">•Merge Speed</a></li>
</div>

<p id="theta-sketch-theory">
<a data-toggle="collapse" class="menu collapsed" href="#collapse_theta_sketch_theory">Theta Sketch Theory</a>
</p>
<div class="collapse" id="collapse_theta_sketch_theory">
<li><a href="{{site.docs_pdf_dir}}/ThetaSketchFramework.pdf">•Theta Sketch Framework (PDF)</a></li>
<li><a href="{{site.docs_pdf_dir}}/ThetaSketchEquations.pdf">•Theta Sketch Equations (PDF)</a></li>
<li><a href="{{site.docs_pdf_dir}}/DataSketches.pdf">•DataSketches (PDF)</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaConfidenceIntervals.html">•Confidence Intervals Notes</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaMergingAlgorithm.html">•Merging Algorithm Notes</a></li>
<li><a href="{{site.docs_dir}}/Theta/ThetaReferences.html">•Theta References</a></li>
</div>
</div>

<li><a href="{{site.docs_dir}}/Tuple/TupleSketches.html">•Tuple Sketches</a></li>

</div>

<p id="most-frequent">
10 changes: 5 additions & 5 deletions docs/Theta/ThetaSketches.md
@@ -61,7 +61,7 @@ layout: doc_page

<a id="theta-sketch-framework"></a>
## Theta Sketch Framework
Theta Sketches are a generalization of the well known <i>K<sup>th</sup> Minimum Value</i> (KMV)<sup>1,2</sup>
Theta Sketches are a generalization of the well known <i>K<sup>th</sup> Minimum Value</i> (KMV) [^1],[^2]
sketches in that KMV sketches are a form of Theta Sketch, but not all Theta Sketches are KMV.

The <a href="{{site.docs_pdf_dir}}/ThetaSketchFramework.pdf">Theta Sketch Framework</a> (TSF)
@@ -99,7 +99,7 @@ we are going to create a separate threshold variable and call it <i>theta (&thet
This effectively decouples #3 and #4 above from <i>k</i>. When the sketch is empty <i>&theta;</i> = 1.0.
After the sketch has filled with <i>k</i> minimum values <i>&theta;</i> is still 1.0.
When the next incoming unique value must be inserted into the sketch the <i>(k+1)<sup>th</sup></i>
minimum value, is assigned to <i>&theta;</i> and removed from the cache.<sup>3</sup>
minimum value, is assigned to <i>&theta;</i> and removed from the cache.[^3]

Ultimately, it will be the size of <i>S</i>, <i>|S|</i>, that will determine the stored size of a
sketch, which decouples #2 above from the value <i>k</i>.
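
The θ assignment described in this hunk can be illustrated with a minimal Python sketch. It is only a toy model of the limited "KMV perspective" the text mentions, not the DataSketches implementation: the class name `KmvThetaSketch`, the helper `hash_to_unit`, and the SHA-256 based hash are assumptions made for illustration. The `estimate` method assumes the usual theta sketch estimator, the number of retained entries <i>|S|</i> divided by <i>&theta;</i>.

```python
import hashlib
import heapq


def hash_to_unit(item: str) -> float:
    """Map an item to a pseudo-uniform value in [0, 1) (hypothetical hash choice)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    # Keep the top 53 bits so the result is an exact float strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2.0**53


class KmvThetaSketch:
    """Toy sketch that retains the k smallest hash values below theta."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.theta = 1.0        # an empty sketch has theta = 1.0
        self._heap = []         # max-heap of retained values (stored negated)
        self._members = set()   # the retained hash values, i.e. the set S

    def update(self, item: str) -> None:
        h = hash_to_unit(item)
        if h >= self.theta or h in self._members:
            return              # rejected: not below theta, or a duplicate
        heapq.heappush(self._heap, -h)
        self._members.add(h)
        if len(self._members) > self.k:
            # k+1 values are now held: the largest of them, the (k+1)th
            # minimum value, is assigned to theta and removed from the cache.
            largest = -heapq.heappop(self._heap)
            self._members.remove(largest)
            self.theta = largest

    def retained(self) -> int:
        return len(self._members)

    def estimate(self) -> float:
        # Assumed estimator: |S| / theta.
        return len(self._members) / self.theta
```

With <i>k</i> = 16, feeding exactly 16 distinct items leaves <i>&theta;</i> at 1.0; only the 17th distinct item lowers <i>&theta;</i>, and the retained count stays at 16 from then on.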
@@ -111,11 +111,11 @@ We will discuss the RSE in a later section.

<img class="doc-img-full" src="https://datasketches.apache.org/docs/img/theta/ThetaSketch1.png" alt="ThetaSketch1" />

[1] Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In <i>Randomization and Approximation Techniques in Computer Science</i>, pages 1–10. Springer, 2002.
[^1]: Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In <i>Randomization and Approximation Techniques in Computer Science</i>, pages 1–10. Springer, 2002.

[2] See <a href="{{site.docs_dir}}/Theta/KMVempty.html">KMV Tutorial</a> for a brief tutorial on KMV Sketches.
[^2]: See <a href="{{site.docs_dir}}/Theta/InverseEstimate.html">KMV Tutorial</a> for a brief tutorial on KMV Sketches.

[3] This is a limited "KMV perspective" on how <i>&theta;</i> gets assigned. The attached paper
[^3]: This is a limited "KMV perspective" on how <i>&theta;</i> gets assigned. The attached paper
<a href="{{site.docs_pdf_dir}}/ThetaSketchFramework.pdf">Theta Sketch Framework</a>
presents multiple ways that <i>&theta;</i> can be assigned using the <i>Theta Choosing Function (TCF)</i>.
Different sketch algorithms have different TCFs.
69 changes: 1 addition & 68 deletions src/main/resources/docgen/toc.json
@@ -75,74 +75,7 @@
]
},

{ "class":"Dropdown", "desc" : "Theta Sketches", "array":
[
{ "class":"Doc", "desc" : "Theta Sketch Framework", "dir" : "Theta", "file": "ThetaSketchFramework" },

{ "class":"Dropdown", "desc" : "Theta Examples", "array":
[
{"class":"Doc", "desc" : "Concurrent Theta Sketch", "dir" : "Theta", "file": "ConcurrentThetaSketch" },
{"class":"Doc", "desc" : "Theta Sketch Java Example", "dir" : "Theta", "file": "ThetaJavaExample" },
{"class":"Doc", "desc" : "Theta Sketch Spark Example", "dir" : "Theta", "file": "ThetaSparkExample" },
{"class":"Doc", "desc" : "Theta Sketch Pig UDFs", "dir" : "Theta", "file": "ThetaPigUDFs" },
{"class":"Doc", "desc" : "Theta Sketch Hive UDFs", "dir" : "Theta", "file": "ThetaHiveUDFs" },
]
},

{ "class":"Dropdown", "desc" : "KMV Tutorial", "array":
[
{"class":"Doc", "desc" : "The Inverse Estimate", "dir" : "Theta", "file": "InverseEstimate" },
{"class":"Doc", "desc" : "Empty Sketch", "dir" : "Theta", "file": "KMVempty" },
{"class":"Doc", "desc" : "First Estimator", "dir" : "Theta", "file": "KMVfirstEst" },
{"class":"Doc", "desc" : "Better Estimator", "dir" : "Theta", "file": "KMVbetterEst" },
{"class":"Doc", "desc" : "Rejection Rules", "dir" : "Theta", "file": "KMVrejection" },
{"class":"Doc", "desc" : "Update V(kth) Rule", "dir" : "Theta", "file": "KMVupdateVkth" },
]
},

{ "class":"Dropdown", "desc" : "Set Operations and P-sampling", "array":
[
{"class":"Doc", "desc" : "Set Operations", "dir" : "Theta", "file": "ThetaSketchSetOps" },
{"class":"Doc", "desc" : "Model & Test Set Operations", "dir" : "Theta", "file": "ThetaSetOpsCornerCases" },
{"class":"Doc", "desc" : "<i>p</i>-Sampling", "dir" : "Theta", "file": "ThetaPSampling" },
]
},

{ "class":"Dropdown", "desc" : "Accuracy", "array":
[
{"class":"Doc", "desc" : "Basic Accuracy", "dir" : "Theta", "file": "ThetaAccuracy" },
{"class":"Doc", "desc" : "Accuracy Plots", "dir" : "Theta", "file": "ThetaAccuracyPlots" },
{"class":"Doc", "desc" : "Relative Error Table", "dir" : "Theta", "file": "ThetaErrorTable" },
{"class":"Doc", "desc" : "SetOp Accuracy", "dir" : "Theta", "file": "ThetaSketchSetOpsAccuracy" },
{"class":"Doc", "desc" : "Unions With Different k", "dir" : "Theta", "file": "AccuracyOfDifferentKUnions" },
]
},

{ "class":"Dropdown", "desc" : "Size", "array":
[
{"class":"Doc", "desc" : "Theta Sketch Size", "dir" : "Theta", "file": "ThetaSize" },
]
},

{ "class":"Dropdown", "desc" : "Speed", "array":
[
{"class":"Doc", "desc" : "Update Speed", "dir" : "Theta", "file": "ThetaUpdateSpeed" },
{"class":"Doc", "desc" : "Merge Speed", "dir" : "Theta", "file": "ThetaMergeSpeed" },
]
},

{ "class":"Dropdown", "desc" : "Theta Sketch Theory", "array":
[
{"class":"Doc", "desc" : "Theta Sketch Framework (PDF)", "dir" : "", "file": "ThetaSketchFramework", "pdf":"true" },
{"class":"Doc", "desc" : "Theta Sketch Equations (PDF)", "dir" : "", "file": "ThetaSketchEquations", "pdf":"true" },
{"class":"Doc", "desc" : "DataSketches (PDF)", "dir" : "", "file": "DataSketches", "pdf":"true" },
{"class":"Doc", "desc" : "Confidence Intervals Notes", "dir" : "Theta", "file": "ThetaConfidenceIntervals" },
{"class":"Doc", "desc" : "Merging Algorithm Notes", "dir" : "Theta", "file": "ThetaMergingAlgorithm" },
{"class":"Doc", "desc" : "Theta References", "dir" : "Theta", "file": "ThetaReferences" },
]
},
]
},
{ "class":"Doc", "desc" : "Theta Sketches", "dir" : "Theta", "file": "ThetaSketches" },
{ "class":"Doc", "desc" : "Tuple Sketches", "dir" : "Tuple", "file": "TupleSketches" },
]
},