Benchmark comparing DOM load and path extractor #8

raganhan · 2018-10-31T20:00:33Z

See README.md changes for description of benchmarks

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

tgregg · 2018-10-31T20:11:44Z

README.md

+more details.
+
+To execute the benchmarks run: `gradle --no-daemon jmh`, requires an internet connection as it downloads the data set. 
+Results bellow, higher is better. 


bellow -> below

tgregg · 2018-10-31T20:16:29Z

src/jmh/java/software/amazon/ionpathextraction/benchmarks/PathExtractorBenchmark.java

+            final IonReader reader = newReader(inputStream);
+            final IonWriter writer = newBinaryWriter(binaryOut)
+        ) {
+            // all data is in the `dataset` key as a list, only write out that field to keep the extractor smaller


Can you elaborate on this? It looks like the binary and text data have different structures?

The text data is created by writing the binary data as Ion text so they are the same, see line 89.

Both differ from the original dataset, which looks something like:

{ <some meta field>: <value>, <other meta field>: <value>, "dataset": [ <item1>, <item2> ] }

The bulk of the data is in the dataset struct field; that code only picks the contents of the dataset and writes them out, e.g. <item1><item2>.

I'm doing this to make the search paths less verbose, but it's not really worth it if it causes confusion. I'll change the benchmark to work with the data as is.

The binary and text are the same, as bytesText is created by writing out bytesBinary as Ion text.

I'm extracting the dataset field from the original dataset for the benchmark test data because the bulk of the data is there; the rest is just some metadata. But thinking more on it, it's probably not worth it as it can cause confusion. It's better to have the benchmark work on the original dataset, so I'll change that.

Ah, I see that now, thanks. I'm fine with it as-is.
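
For readers following the thread, the flattening discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: plain Java collections stand in for the Ion struct and list, and the class name, method name, and metadata field names are hypothetical. The actual benchmark streams the data with IonReader and IonWriter rather than building maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: plain Java collections model the Ion struct and list.
// The actual benchmark streams values with IonReader and IonWriter instead.
public class DatasetFlattenSketch {

    // Returns only the items under the "dataset" key, dropping the metadata fields.
    public static List<Object> datasetItems(Map<String, Object> original) {
        List<Object> items = new ArrayList<>();
        Object dataset = original.get("dataset");
        if (dataset instanceof List<?>) {
            items.addAll((List<?>) dataset);
        }
        return items;
    }

    public static void main(String[] args) {
        Map<String, Object> original = new LinkedHashMap<>();
        original.put("someMetaField", "value");   // placeholder metadata field
        original.put("otherMetaField", "value");  // placeholder metadata field
        original.put("dataset", List.of("item1", "item2"));

        // The benchmark input then holds the items as top-level values.
        System.out.println(datasetItems(original)); // prints [item1, item2]
    }
}
```

With the items at the top level, a search path does not need a leading dataset component, which is the "less verbose" trade-off mentioned in the reply.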

tgregg · 2018-10-31T20:46:13Z

src/jmh/java/software/amazon/ionpathextraction/benchmarks/PathExtractorBenchmark.java

+/**
+ * Benchmarks comparing the PathExtractor with fully materializing the DOM.
+ */
+public class PathExtractorBenchmark {


It would be nice to make this pluggable for different data sets as well, since the performance of the path extractor is highly dependent on the characteristics of the data. It would be good to eventually provide results for several data sets along with a description of the data and what was skipped. This doesn't block the initial release.

Agree, opened #9 to track this

tgregg · 2018-10-31T20:46:48Z

README.md

+```
+
+Using the path extractor has equivalent performance for both text and binary when fully materializing the document and 
+can give significant performance improvements when partially materializing binary documents. This happens due to Ion's 


Benchmark comparing DOM load and path extractor

4fe97aa

raganhan requested review from zslayton and tgregg October 31, 2018 20:00

tgregg reviewed Oct 31, 2018


raganhan mentioned this pull request Oct 31, 2018

Make included benchmark pluggable for different dataset #9

Open

tgregg approved these changes Oct 31, 2018


fixing typo in Readme.md

c2cf541

raganhan merged commit 36adff2 into master Oct 31, 2018

raganhan deleted the benchmarks-6 branch October 31, 2018 22:42
