Update UnderstandingKLLBounds.md #184

Merged: 3 commits, Aug 3, 2024
42 changes: 16 additions & 26 deletions docs/KLL/UnderstandingKLLBounds.md
@@ -63,30 +63,20 @@ public class QuantileBoundsTest {
println("r2: " + r2);
println("");

-double r1LB = sk.getRankLowerBound(r1);
-println("r1LB: " + r1LB);
-double r1UB = sk.getRankUpperBound(r1);
-println("r1UB: " + r1UB);
-
-double r2LB = sk.getRankLowerBound(r2);
-println("r2LB: " + r2LB);
-double r2UB = sk.getRankUpperBound(r2);
-println("r2UB: " + r2UB);
+println("r1LB(r1): " + sk.getRankLowerBound(r1));
+println("r1UB(r1): " + sk.getRankUpperBound(r1));
+println("r2LB(r2): " + sk.getRankLowerBound(r2));
+println("r2UB(r2): " + sk.getRankUpperBound(r2));
println("");

-double q1LB = sk.getQuantileLowerBound(r1);
-println("q1LB(r1): " + q1LB);
-double q1UB = sk.getQuantileUpperBound(r1);
-println("q1UB(r1): " + q1UB);
-
-double q2LB = sk.getQuantileLowerBound(r2);
-println("q2LB(r2): " + q2LB);
-double q2UB = sk.getQuantileUpperBound(r2);
-println("q2UB(r2): " + q2UB);
+println("q1LB(r1): " + sk.getQuantileLowerBound(r1));
+println("q1UB(r1): " + sk.getQuantileUpperBound(r1));
+println("q2LB(r2): " + sk.getQuantileLowerBound(r2));
+println("q2UB(r2): " + sk.getQuantileUpperBound(r2));
println("");
}

-static void println(Object o) { System.out.println(o.toString()); }
+private static void println(Object o) { System.out.println(o.toString()); }
Review comment (Contributor): Avoiding things like this, which detract from the main focus of the example, is why I increasingly prefer Python for these sorts of things.

}
```

@@ -101,10 +91,10 @@ q2: 620.0
r1: 0.5
r2: 0.52

-r1LB: 0.4932237572729862
-r1UB: 0.5067762427270138
-r2LB: 0.5132237572729862
-r2UB: 0.5267762427270138
+r1LB(r1): 0.4932237572729862
+r1UB(r1): 0.5067762427270138
+r2LB(r2): 0.5132237572729862
+r2UB(r2): 0.5267762427270138

q1LB(r1): 494.0
q1UB(r1): 608.0
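
As a quick sanity check on the output above, the following sketch recomputes the interval widths from the printed numbers alone. It does not call the sketch library; the class and method names (`BoundsArithmetic`, `width`, `fractionOfRange`) are hypothetical, and the 1100 denominator is the input range that the document's discussion divides by.

```java
// Arithmetic check on the printed bounds; no sketch library is involved.
// All names here are hypothetical and exist only for illustration.
final class BoundsArithmetic {

  // Width of a [lower, upper] interval.
  static double width(double lower, double upper) {
    return upper - lower;
  }

  // Interval width expressed as a fraction of a total range.
  static double fractionOfRange(double lower, double upper, double range) {
    return (upper - lower) / range;
  }

  public static void main(String[] args) {
    // Rank bounds for r1 = 0.5, copied from the printed output.
    double r1LB = 0.4932237572729862;
    double r1UB = 0.5067762427270138;

    // Quantile bounds for r1, copied from the printed output.
    double q1LB = 494.0;
    double q1UB = 608.0;

    // Denominator used by the document when relating widths to the input range.
    double range = 1100.0;

    System.out.println("rank UB - LB:            " + width(r1LB, r1UB));
    System.out.println("rank half-width:         " + width(r1LB, r1UB) / 2.0);
    System.out.println("quantile UB - LB:        " + width(q1LB, q1UB));
    System.out.println("quantile width / range:  " + fractionOfRange(q1LB, q1UB, range));
  }
}
```

The rank width is small while the quantile width is a sizeable fraction of the range, which is the contrast the rest of the document discusses.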
@@ -123,15 +113,15 @@ The input stream of 1000 values has a big discontinuity starting at *i* = 501. S

We choose two quantiles on either side of the discontinuity, 500 and 620, and get their respective ranks of 0.5 and 0.52. Note that because of the discontinuity the difference in the input quantiles is 120/1100 or ~10.9%, while the difference in their respective ranks is only 2%.

-Next we compute the upper and lower rank bounds of the two resulting ranks of 0.5 and 0.52, which are given above. Note that the UB - LB of each rank is about .013 which is 2 X .0067. This means that the true rank of each quantile is within the UB - LB range of ranks with a confidence of 99%, which is about +/- 2.6 standard deviations from the mean.
+Next we compute the rank upper bound (UB) and rank lower bound (LB) of the two resulting ranks of 0.5 and 0.52, which are given above. Note that the UB - LB of each rank is about .013 which is 2 X .0067. This means that the true rank of each quantile is within the UB - LB range of ranks with a confidence of 99%, which is about +/- 2.6 standard deviations from the estimate.

-Then we compute the upper and lower quantile bounds of the same two resulting ranks of 0.5 and 0.52. Note that the UB - LB quantile range of *r1* is 114/1100 or 10.4%, because in between the rank UB and LB is the discontinuity. These points are shown in the next image.
+Then we compute the quantile UB and LB of the same two resulting ranks of 0.5 and 0.52. Note that the UB - LB quantile range of *r1* is 114/1100 or 10.4%, because in between the rank UB and LB is the discontinuity. These points are shown in the next plot.

<img class="doc-img-half" src="{{site.docs_img_dir}}/kll/QuantileBounds2.png" alt="QuantileBounds2.png" />

This graphically illustrates why the mathematical guarantee of error applies only to the rank domain, because the input quantile domain could have huge discontinuities. Nonetheless, we **can** say that the true quantile does lie within that UB - LB quantile range with a confidence of 99%. But we cannot guarantee anything about the UB - LB quantile difference and relate that to a quantile accuracy compared to the total range of the input values.

-Our Classic, KLL, and REQ quantiles sketches are input insensitive and do not know or care what the input distribution looks like. It does not have to be a smooth and well behaved function! This is not the case with other heuristic quantile algorithms,
+Our Classic, KLL, and REQ quantiles sketches are input insensitive and do not know or care what the input distribution looks like. It does not have to be a smooth and well behaved function. This is not the case with other heuristic quantile algorithms,
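
To see why a small rank difference can correspond to a large quantile difference, here is a minimal sketch that computes exact inclusive ranks on a sorted array. It assumes the input stream is 1..500 followed by 601..1100, which is consistent with the ranks and the 1100 range quoted in this hunk but is not confirmed by the diff itself. `ExactRankDemo`, `buildStream`, and `exactRank` are hypothetical names, and no sketch library is involved.

```java
// Exact (non-sketch) rank computation on an ASSUMED input stream.
// All names are hypothetical and exist only for illustration.
final class ExactRankDemo {

  // ASSUMPTION: 1000 values, 1..500 followed by 601..1100,
  // i.e. a jump of 100 starting at i = 501. Already sorted ascending.
  static double[] buildStream() {
    double[] values = new double[1000];
    for (int i = 1; i <= 1000; i++) {
      values[i - 1] = (i <= 500) ? i : i + 100;
    }
    return values;
  }

  // Exact inclusive normalized rank: the fraction of values <= q.
  // Uses a binary search for the first index whose value is > q.
  static double exactRank(double[] sortedAscending, double q) {
    int lo = 0;
    int hi = sortedAscending.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sortedAscending[mid] <= q) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return (double) lo / sortedAscending.length;
  }

  public static void main(String[] args) {
    double[] stream = buildStream();
    double q1 = 500.0;
    double q2 = 620.0;
    double r1 = exactRank(stream, q1);
    double r2 = exactRank(stream, q2);

    // The document divides quantile differences by 1100.
    double range = 1100.0;

    System.out.println("r1: " + r1);
    System.out.println("r2: " + r2);
    System.out.println("rank difference:             " + (r2 - r1));
    System.out.println("quantile difference / range: " + (q2 - q1) / range);
  }
}
```

Because the error guarantee is stated in ranks, the two quantiles are close in the domain the guarantee covers even though they are far apart in value.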


