Using of jsoniter-scala for safer and faster parsing/serialization of GeoJSON #3

Open
plokhotnyuk opened this issue Apr 27, 2020 · 2 comments


If you are interested in a more efficient and safer solution, please consider using jsoniter-scala. It parses from byte arrays, java.nio.ByteBuffer, or java.io.InputStream directly into your data structures, and serializes back, with minimal allocations and CPU usage. It can also read whitespace-separated JSON values from a java.io.InputStream without holding the whole input in RAM.
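As a rough illustration of the calls described above, here is a minimal sketch. It assumes the jsoniter-scala-core and jsoniter-scala-macros modules are on the classpath; `Point2D`, `JsoniterSketch`, and `sumX` are hypothetical names made up for this example and are not part of any GeoJSON library.

```scala
import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import java.io.InputStream
import java.nio.charset.StandardCharsets.UTF_8

object JsoniterSketch {
  // Hypothetical type used only for illustration.
  final case class Point2D(x: Double, y: Double)

  // Codec derived at compile time by the jsoniter-scala macro.
  implicit val pointCodec: JsonValueCodec[Point2D] = JsonCodecMaker.make

  // Parse directly from bytes into the case class, then serialize back.
  val parsed: Point2D = readFromArray[Point2D]("""{"x":1.5,"y":2.5}""".getBytes(UTF_8))
  val json: String = new String(writeToArray(parsed), UTF_8)

  // Consume whitespace-separated JSON values one at a time,
  // without buffering the whole stream in memory.
  def sumX(in: InputStream): Double = {
    var sum = 0.0
    scanJsonValuesFromStream[Point2D](in) { p =>
      sum += p.x
      true // returning true keeps scanning
    }
    sum
  }
}
```

The callback passed to `scanJsonValuesFromStream` returns a Boolean, so a consumer can stop early by returning `false`.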

Here are results of benchmarks that compare GeoJSON parsing and serialization using different Scala libraries on different JVMs (throughput in ops/sec, so greater is better):

[Two benchmark charts were attached here as images.]

worace (Owner) commented Apr 27, 2020

Thanks for sharing @plokhotnyuk. I had not seen this library before but will check it out. For the benchmarks you are showing in those graphs, are those based on the GeoJson ADT in this library? Or something else? What is the actual script being run there?

Also, does that library expose a Json or JsonObject type that can be used to store data outside of just serializing and deserializing?

plokhotnyuk (Author) commented Apr 28, 2020

The benchmarks use their own simplified version of GeoJSON, limited to 2D and with a Map[String, String] representation for all feature properties. That is just enough to (de)serialize a sample without being overwhelmed by the peculiarities of different parsers.

They can be cloned and run with the following commands:

git clone git@github.com:plokhotnyuk/jsoniter-scala.git --branch master --single-branch
cd jsoniter-scala
sbt 'jsoniter-scala-benchmark/jmh:run GeoJSON'

The proposed library doesn't provide ADT types but a custom codec can be created. Here are a couple of examples:

Scala's maps and BigDecimal are hard to use safely, so a better option would be a GeoJSON ADT parameterized by user-defined types for the feature property bags, if they are known up front.

Moreover, maps are much less efficient than case classes for parsing, serialization, and access to values.

Also, most built-in codecs for recursive ADTs are vulnerable to malicious input such as long sequences of empty or nested brackets: [{},{},{},{},{},...], [[[[[[[[...]]]]]]], etc.
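To illustrate the nested-bracket case: a decoder that recurses once per nesting level can exhaust the stack on input like `[[[[...]]]]`. One possible mitigation (an assumption for illustration, not something any particular library is claimed to do) is to cap the nesting depth before or during parsing. The sketch below uses only the Scala standard library; `DepthGuardSketch` and `withinDepth` are hypothetical names.

```scala
object DepthGuardSketch {
  /** Returns true if bracket nesting in `input` never exceeds `maxDepth`.
    * Simplification: brackets inside JSON string literals are not skipped,
    * and the input is not otherwise validated.
    */
  def withinDepth(input: String, maxDepth: Int): Boolean = {
    var depth = 0
    var i = 0
    while (i < input.length) {
      input.charAt(i) match {
        case '[' | '{' =>
          depth += 1
          if (depth > maxDepth) return false // reject before any recursion happens
        case ']' | '}' =>
          depth -= 1
        case _ => ()
      }
      i += 1
    }
    true
  }
}
```

The scan is iterative, so the check itself cannot overflow the stack. It covers only the nesting case; long flat sequences such as `[{},{},{},...]` would need a separate limit on element count or input size.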

The currently used Option type is not specialized for primitives, and it allows mixing 2D and 3D coordinates in lines and polygons. Type parameterization, however, allows the GeoJSON ADT to be specialized for other parts as well, such as coordinates (2D, 3D, etc.) and bounding boxes, which provides better type safety and performance.
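A minimal sketch of what such a parameterized ADT could look like, using only the Scala standard library. All names here (`GeoJsonAdtSketch`, `Coord2D`, `Coord3D`, `Geometry`, `Feature`, `PlaceProps`) are hypothetical, bounding boxes are omitted for brevity, and this is not the ADT of this library.

```scala
object GeoJsonAdtSketch {
  // Coordinates as case classes with primitive fields: no Option for the third axis.
  final case class Coord2D(lon: Double, lat: Double)
  final case class Coord3D(lon: Double, lat: Double, alt: Double)

  // Geometry is parameterized by the coordinate type C,
  // so a single line cannot mix 2D and 3D positions.
  sealed trait Geometry[C]
  final case class Point[C](coordinates: C) extends Geometry[C]
  final case class LineString[C](coordinates: Vector[C]) extends Geometry[C]

  // P is a user-defined properties type, used instead of a Map.
  final case class Feature[C, P](geometry: Geometry[C], properties: P)

  // Example of a user-defined property bag known up front.
  final case class PlaceProps(name: String, population: Int)

  val line: Geometry[Coord3D] =
    LineString(Vector(Coord3D(0.0, 0.0, 10.0), Coord3D(1.0, 1.0, 12.0)))

  val feature: Feature[Coord3D, PlaceProps] =
    Feature(line, PlaceProps("Example", 1000))

  // LineString[Coord3D](Vector(Coord2D(0.0, 0.0))) is rejected by the compiler.
}
```

With this shape, property access is a plain field read (`feature.properties.name`) rather than a map lookup, and the dimensionality of every coordinate is fixed by the type.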

Let's see how easily users can specify a custom type and derive a safe and efficient codec for it:

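import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._
type MyGeoJSON = GeoJSON[Coord3D, BBox3D, MyFeatureProperties]
val myCodec: JsonValueCodec[MyGeoJSON] = JsonCodecMaker.make[MyGeoJSON](CodecMakerConfig)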
