Save and load TDigest? #13

Open
richardxy opened this issue Dec 5, 2018 · 6 comments

@richardxy

Hi Erik,

I'm using the isarn-sketches-spark package to compute t-digests on Spark and would like to persist TDigestSQL or TDigest data to a PostgreSQL database. I would also like to be able to load the TDigest data back from the database.

What would be the best way of doing that, if it is possible at all? Thanks very much!

Richard

@richardxy
Author

Is there any way to output a TDigest as JSON, and to load JSON to instantiate a TDigest? An example would be awesome.

@erikerlandson
Member

Hi @richardxy,

There are currently two examples of serialization for TDigests. One uses Spark's saveAsObjectFile method:
isarn/isarn-sketches-spark#3 (comment)

Another is using streams and class loaders:
https://github.com/isarn/isarn-sketches/blob/develop/src/test/scala/org/isarnproject/sketches/TDigestTest.scala#L189
https://github.com/isarn/isarn-scalatest/blob/develop/src/main/scala/org/isarnproject/scalatest/serde.scala#L20
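For readers who want the general shape of the stream approach without opening the links, here is a minimal sketch using only the Java standard library. It assumes the object being saved is java.io.Serializable. `Sketch` and `StreamSerde` are hypothetical stand-ins made up for this illustration, not classes from isarn-sketches, and the class-loader handling that the linked helper deals with is left out.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a serializable sketch; NOT the isarn TDigest class.
final case class Sketch(delta: Double, centers: Vector[Double], masses: Vector[Double])

object StreamSerde {
  // Serialize any java.io.Serializable object to a byte array.
  def toBytes(obj: java.io.Serializable): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    try out.writeObject(obj) finally out.close()
    buf.toByteArray
  }

  // Read an object back; the caller states the type it expects.
  def fromBytes[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

The resulting byte array could then be written to a binary column in a database. Note that bytes produced this way are only readable by JVM code with compatible class definitions, so this is not a portable format.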

I also have an internal serialization that shows how to "flatten" and "unflatten" the tree structures:
https://github.com/isarn/isarn-sketches-spark/blob/develop/src/main/scala/org/apache/spark/isarnproject/sketches/udt/TDigestUDT.scala#L67
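The "flatten" and "unflatten" idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code at the link: a standard-library TreeMap stands in for the real tree structure, and `FlatDigest`, `Flatten`, and the field names are hypothetical. Flattening copies the ordered (center, mass) pairs into parallel sequences of plain values; unflattening re-inserts them into an empty map.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical flat form: plain fields that are easy to store or encode.
final case class FlatDigest(delta: Double, maxDiscrete: Int,
                            centers: Vector[Double], masses: Vector[Double])

object Flatten {
  // Explicit total ordering on Double, passed by hand to avoid implicit lookup.
  private val ord: Ordering[Double] = new Ordering[Double] {
    def compare(a: Double, b: Double): Int = java.lang.Double.compare(a, b)
  }

  // Flatten: walk the ordered map and split it into parallel sequences.
  def flatten(delta: Double, maxDiscrete: Int, clusters: TreeMap[Double, Double]): FlatDigest =
    FlatDigest(delta, maxDiscrete, clusters.keys.toVector, clusters.values.toVector)

  // Unflatten: re-insert each (center, mass) pair into an empty ordered map.
  def unflatten(flat: FlatDigest): TreeMap[Double, Double] =
    flat.centers.zip(flat.masses).foldLeft(TreeMap.empty[Double, Double](ord)) {
      case (acc, (x, m)) => acc + (x -> m)
    }
}
```

Once the data is in a flat form like this, writing it to database columns or to JSON is mostly a matter of encoding two numeric sequences and a few scalars.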

I don't have a ready-made example going to/from JSON, but this tutorial might help you. I will also look into it further when I have time.
https://www.oreilly.com/library/view/scala-cookbook/9781449340292/ch15s02.html

Serializing these objects to/from external formats is a somewhat open topic - if you happen to get something working please feel free to post here, or via a pull request, etc! My upcoming implementation of TDigest will have a structure that is a bit more amenable to serialization.

@richardxy
Author

Great! Thank you very much! I look forward to your upcoming implementation.

@richardxy
Author

richardxy commented Dec 6, 2018

I saw that toString can be called on a TDigest object. For example:
scala> val tdstr = td1.toString
tdstr: String = TDigestSQL(TDigest(0.5,0,3,TDigestMap(1.0 -> (1.0, 1.0), 3.0 -> (1.0, 2.0), 5.0 -> (1.0, 3.0))))

Any way to reconstruct a TDigest or TDigestSQL from that string? If yes, that'd be awesome.

Thanks!

@erikerlandson
Member

It is probably possible in theory to parse that string back into an object, but you would have to write custom parsing code. If you go down that road, it would be more effective to use spray-json or something similar, so that the serialized format is portable.
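To give a sense of what such custom parsing would involve, here is a rough sketch that pulls the `key -> (a, b)` entries out of the printed string shown earlier in the thread. `ToStringParse` is a hypothetical helper written for this illustration. It assumes the printed layout stays exactly as shown, which is not a documented format, and it does not handle values such as NaN or Infinity.

```scala
// Hypothetical parser for the printed form; the toString layout is an
// assumption taken from the example output, not a stable format.
object ToStringParse {
  // A decimal number, optionally with a sign and an exponent (e.g. 1.0E10).
  private val num = """[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"""

  // Matches entries such as "3.0 -> (1.0, 2.0)".
  private val entry = ("(" + num + ") -> \\((" + num + "), (" + num + ")\\)").r

  // Returns (key, first, second) triples in the order they appear.
  def entries(s: String): Vector[(Double, Double, Double)] =
    entry.findAllMatchIn(s).map { m =>
      (m.group(1).toDouble, m.group(2).toDouble, m.group(3).toDouble)
    }.toVector
}
```

This only recovers the numbers. Turning them back into a TDigest would still require a public way to build the cluster map, which is why a proper serialized format is the better route.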

I need better solutions for serialization for users of this package. I will try to carve out some time for it soon, but I can't promise when.

@richardxy
Author

That'll be great. Thanks.

I looked into the implementation of TDigestMap for a way to construct a TDigestMap object from x0, x and m (like the values 1.0 -> (1.0, 1.0) above). The "update" function looks promising, but it is private and cannot be called from outside.
