-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add T-Digest data structure #11665
Conversation
This pull request was exported from Phabricator. Differential Revision: D66435741 |
✅ Deploy Preview for meta-velox canceled.
|
446757a
to
086b852
Compare
Summary: Add the T-Digest data structure implementation to be used in T-Digest related functions. Also extract the `getRandomSeed` test utility that is used in multiple unit tests. Differential Revision: D66435741
This pull request was exported from Phabricator. Differential Revision: D66435741 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 CMake
Summary: Add the T-Digest data structure implementation to be used in T-Digest related functions. Also extract the `getRandomSeed` test utility that is used in multiple unit tests. Differential Revision: D66435741
086b852
to
870ef16
Compare
This pull request was exported from Phabricator. Differential Revision: D66435741 |
Summary: Add the T-Digest data structure implementation to be used in T-Digest related functions. Also extract the `getRandomSeed` test utility that is used in multiple unit tests. Differential Revision: D66435741
870ef16
to
e8ddf2d
Compare
This pull request was exported from Phabricator. Differential Revision: D66435741 |
} | ||
{ | ||
SCOPED_TRACE( | ||
"select to_base64(cast(tdigest_agg(x) as varbinary)) from unnest(sequence(0, 1000)) as t(x)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe also test ... from unnest(shuffle(sequence(0, 1000)))...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We got the exactly same binary from this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yuhta It might be because the default compression is too large? I saw this query returns different binary with and without shuffle().
select to_base64(cast(tdigest_agg(cast(x as double), 1, 0.01) as varbinary)) from unnest(shuffle(sequence(0, 1000))) as t(x)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah compression 0.01 (actually 10 in the implementation) would not satisfy the test error bound though. Also it's preferred to have something deterministic in this case. Let me add one still for sequence in order, but with a smaller compression that still satisfy the error bound.
Summary: Add the T-Digest data structure implementation to be used in T-Digest related functions. Also extract the `getRandomSeed` test utility that is used in multiple unit tests. Differential Revision: D66435741
e8ddf2d
to
342c45b
Compare
This pull request was exported from Phabricator. Differential Revision: D66435741 |
Summary: Add the T-Digest data structure implementation to be used in T-Digest related functions. Also extract the `getRandomSeed` test utility that is used in multiple unit tests. Differential Revision: D66435741
342c45b
to
bffc3e1
Compare
This pull request was exported from Phabricator. Differential Revision: D66435741 |
Summary: Add the T-Digest data structure implementation to be used in T-Digest related functions. Also extract the `getRandomSeed` test utility that is used in multiple unit tests. Differential Revision: D66435741
bffc3e1
to
816d095
Compare
This pull request was exported from Phabricator. Differential Revision: D66435741 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
This pull request has been merged in efbf68e. |
Summary:
Add the T-Digest data structure implementation to be used in T-Digest
related functions. Also extract the
getRandomSeed
test utility that is usedin multiple unit tests.
Differential Revision: D66435741