You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Say you have a range of negative numbers, -10 -> -1. You add the lower half of the range to one sketch, and the upper half of the range to another sketch. If you add these sketches together ("Sketch AB"), it will produce different getValueAtQuantile() results than the same code in Python or the same data from both halves added to a single sketch ("Sketch C") in TypeScript. This only happens with negative numbers: for positive ranges the values match. Full snippets for the ts-node and python interpreters is below, but here are how the results differ for a single sketch reading all the data versus a merge of two sketches with half the data:
import { DDSketch } from '@datadog/sketches-js'
// test data
const posA = [1, 2, 3, 4, 5]
const posB = [6, 7, 8, 9, 10]
// these two will each get half the positive data
const ddsPosA = new DDSketch()
const ddsPosB = new DDSketch()
// and then merge into this one
const ddsPosAB = new DDSketch()
// whereas this one reads all the positive data sequentially
const ddsPosC = new DDSketch()
posA.forEach(s => {
ddsPosA.accept(s)
ddsPosC.accept(s)
})
posB.forEach(s => {
ddsPosB.accept(s)
ddsPosC.accept(s)
})
ddsPosAB._copy(ddsPosA)
ddsPosAB.merge(ddsPosB)
// we'll repeat this for negative numbers
const negA = [-10, -9, -8, -7, -6]
const negB = [-5, -4, -3, -2, -1]
const ddsNegA = new DDSketch()
const ddsNegB = new DDSketch()
const ddsNegAB = new DDSketch()
const ddsNegC = new DDSketch()
negA.forEach(s => {
ddsNegA.accept(s)
ddsNegC.accept(s)
})
negB.forEach(s => {
ddsNegB.accept(s)
ddsNegC.accept(s)
})
ddsNegAB._copy(ddsNegA)
ddsNegAB.merge(ddsNegB)
// the percentiles should be the same
// for the sequential c sketches
// and the merged ab sketches
const qs = [0.01, 0.25, 0.50, 0.75, 0.99]
qs.forEach(q => {
console.log(`pq: ${q.toFixed(2)}\tc: ${ddsPosC.getValueAtQuantile(q).toFixed(5)}\tab: ${ddsPosAB.getValueAtQuantile(q).toFixed(5)}`)
})
qs.forEach(q => {
console.log(`nq: ${q.toFixed(2)}\tc: ${ddsNegC.getValueAtQuantile(q).toFixed(5)}\tab: ${ddsNegAB.getValueAtQuantile(q).toFixed(5)}`)
})
To see what I believe are "correct" results in the python implementation:
from ddsketch import DDSketch
# test data
pos_a = [1, 2, 3, 4, 5]
pos_b = [6, 7, 8, 9, 10]
# these two will each get half the positive data
dds_pos_a = DDSketch()
dds_pos_b = DDSketch()
# and then merge into this one
dds_pos_ab = DDSketch()
# whereas this one reads all the positive data seqquentially
dds_pos_c = DDSketch()
for s in pos_a:
dds_pos_a.add(s)
dds_pos_c.add(s)
for s in pos_b:
dds_pos_b.add(s)
dds_pos_c.add(s)
dds_pos_ab._copy(dds_pos_a)
dds_pos_ab.merge(dds_pos_b)
# we'll repeat this for negative numbers
neg_a = [-10, -9, -8, -7, -6]
neg_b = [-5, -4, -3, -2, -1]
dds_neg_a = DDSketch()
dds_neg_b = DDSketch()
dds_neg_c = DDSketch()
dds_neg_ab = DDSketch()
for s in neg_a:
dds_neg_a.add(s)
dds_neg_c.add(s)
for s in neg_b:
dds_neg_b.add(s)
dds_neg_c.add(s)
dds_neg_ab._copy(dds_neg_a)
dds_neg_ab.merge(dds_neg_b)
# the percentiles should be the same
# for the sequential c sketches
# and the merged ab sketches
qs = [0.01, 0.25, 0.50, 0.75, 0.99]
for q in qs:
print("pq: {:.2f}\tc: {:.5f}\tab: {:.5f}".format(q, dds_pos_c.get_quantile_value(q), dds_pos_ab.get_quantile_value(q)))
for q in qs:
print("nq: {:.2f}\tc: {:.5f}\tab: {:.5f}".format(q, dds_neg_c.get_quantile_value(q), dds_neg_ab.get_quantile_value(q)))
I've tried comparing the TypeScript vs Python implementations looking for differences that could explain this behavior, but nothing jumped out at me. It does look like the sumOfRange() utility function in the store is used in _adjust() and merge() in situations where the equivalent Python code might be an exclusive range while sumOfRange() is inclusive, but when I tried overriding to add a -1 to the end index for those calls, it didn't seem to change anything.
If anyone has any ideas what the problem might be, I'd be happy to look further and keep testing.
The text was updated successfully, but these errors were encountered:
Say you have a range of negative numbers, -10 -> -1. You add the lower half of the range to one sketch, and the upper half of the range to another sketch. If you add these sketches together ("Sketch AB"), it will produce different
getValueAtQuantile()
results than the same code in Python or the same data from both halves added to a single sketch ("Sketch C") in TypeScript. This only happens with negative numbers: for positive ranges the values match. Full snippets for the ts-node and python interpreters is below, but here are how the results differ for a single sketch reading all the data versus a merge of two sketches with half the data:Whereas in Python the same test returns:
To reproduce in ts-node:
To see what I believe are "correct" results in the python implementation:
I've tried comparing the TypeScript vs Python implementations looking for differences that could explain this behavior, but nothing jumped out at me. It does look like the
sumOfRange()
utility function in the store is used in_adjust()
andmerge()
in situations where the equivalent Python code might be an exclusive range whilesumOfRange()
is inclusive, but when I tried overriding to add a-1
to the end index for those calls, it didn't seem to change anything.If anyone has any ideas what the problem might be, I'd be happy to look further and keep testing.
The text was updated successfully, but these errors were encountered: