Consider how the standard implementation of SortingCollection works:
We present a multithreaded implementation of SortingCollection.
The main idea of this improvement is to sort and spill the data stored in the sorting collection asynchronously, by submitting spill tasks to an executor service.
Compared with the standard version, it looks like this:
As shown above, our version does not have to stop accepting records while data is being written to a temporary file.
The doneAdding() method is blocking: it waits until all spill tasks have completed.
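The idea can be illustrated with a minimal, self-contained sketch. This is not the code from this pull request: the class name `AsyncSpillSketch`, its fields, and the in-memory "spill" (a sorted list instead of a temporary file) are assumptions made only to show the control flow of handing a full buffer to an executor service and blocking in `doneAdding()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of asynchronous spilling; not the htsjdk implementation. */
public class AsyncSpillSketch {
    private final int maxRecordsInRam;
    private final ExecutorService executor;
    private final List<Future<List<String>>> spills = new ArrayList<>();
    private List<String> buffer = new ArrayList<>();

    public AsyncSpillSketch(int maxRecordsInRam, int threads) {
        this.maxRecordsInRam = maxRecordsInRam;
        this.executor = Executors.newFixedThreadPool(threads);
    }

    /** Adds a record; when the buffer is full it is handed off and adding continues. */
    public void add(String record) {
        buffer.add(record);
        if (buffer.size() >= maxRecordsInRam) {
            submitSpill();
        }
    }

    private void submitSpill() {
        final List<String> full = buffer;
        buffer = new ArrayList<>(); // the caller keeps working with a fresh buffer
        spills.add(executor.submit(() -> {
            Collections.sort(full);
            // A real implementation would write the sorted run to a temporary file here.
            return full;
        }));
    }

    /** Blocks until every submitted spill task has finished, then returns the sorted runs. */
    public List<List<String>> doneAdding() {
        if (!buffer.isEmpty()) {
            submitSpill();
        }
        List<List<String>> sortedRuns = new ArrayList<>();
        try {
            for (Future<List<String>> spill : spills) {
                sortedRuns.add(spill.get()); // waits for this task to complete
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("Spill task failed", e);
        } finally {
            executor.shutdown();
        }
        return sortedRuns;
    }
}
```

In this sketch nothing limits how many full buffers can be waiting for a free thread; a real implementation would probably need to bound that number so that memory use stays predictable.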
Please take a look at our benchmarks:
Here is the benchmark for SortingCollection. It is based on randomly generated data: we create a random int, convert it to a string, and add it to the sorting collection.
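The data generation step might look roughly like the sketch below. The class name `RandomRecordSource`, the `generate` method, and the fixed seed are assumptions for illustration; the actual benchmark code may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical generator of benchmark input: random ints rendered as strings. */
public class RandomRecordSource {
    public static List<String> generate(int count, long seed) {
        Random random = new Random(seed); // a fixed seed makes runs repeatable
        List<String> records = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            records.add(Integer.toString(random.nextInt()));
        }
        return records;
    }
}
```

Each generated string would then be added to the sorting collection under test, so that both implementations sort the same input.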
Results are provided below.
Here is the log output of the SortSam metric, run with different types of SortingCollection.
AsyncSpillSortingCollection with 2 spilling threads
Standard SortingCollection
This improvement is enabled by setting the JVM option -Dsamjdk.sort_col_threads=number_of_threads. If number_of_threads > 0, all utilities that use SortingCollection will use its multithreaded version.
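As a rough sketch of how such a switch can be read on the Java side (the class `SortThreadsOption` and its method names are hypothetical; only the property name comes from this description):

```java
/** Hypothetical helper that reads the thread-count option from a JVM system property. */
public class SortThreadsOption {
    static final String PROPERTY = "samjdk.sort_col_threads";

    /** Returns the configured number of spill threads, or 0 when the option is not set. */
    public static int configuredThreads() {
        return Integer.getInteger(PROPERTY, 0);
    }

    /** The multithreaded version is selected only for a positive thread count. */
    public static boolean useAsyncSpill() {
        return configuredThreads() > 0;
    }
}
```

For example, starting the JVM with `-Dsamjdk.sort_col_threads=2` would make `useAsyncSpill()` return `true` in this sketch, while omitting the option keeps the standard single-threaded behaviour.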