ParallelBlockCompressedOutputStream #3

Open
wants to merge 5 commits into
base: master
from

Conversation

SilinPavel
Member

@SilinPavel SilinPavel commented Jun 15, 2017

We would like to present a new multithreaded implementation of BlockCompressedOutputStream.
ParallelBlockCompressedOutputStream compresses GZ-blocks in parallel, which improves performance by using multiple CPU cores.
A base class, AbstractBlockCompressedOutputStream, has been extracted. It is extended by the single-threaded BlockCompressedOutputStream and by the new ParallelBlockCompressedOutputStream.

[Image: UML class diagram]

ParallelBlockCompressedOutputStream implements the deflateBlock method, which is called when the buffer is full and a GZ-block needs to be compressed and written. The ParallelBCOS implementation of deflateBlock submits the task of compressing the GZ-block to a ThreadPoolExecutor, so the block is processed in parallel on another thread. The number of threads in the ThreadPoolExecutor (that is, the number of blocks processed in parallel) can be controlled with the -Dsamjdk.zip_threads property. If the property is 0 (the default), the single-threaded implementation is used.
Once enough deflating tasks (64 * ZIP_THREADS) have been submitted, a writing task is submitted. The writing task joins all preceding deflating tasks and writes their results in the original order.
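
The scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the class name OrderedParallelDeflateSketch and its methods are hypothetical, it uses plain java.util.zip.Deflater instead of the htsjdk deflater factory, it omits the BGZF header and footer, and it joins all tasks once at the end rather than submitting a writing task after every 64 * ZIP_THREADS blocks.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration; none of these names are taken from the PR.
class OrderedParallelDeflateSketch {

    // Compresses one block as raw deflate data (no gzip/BGZF framing).
    static byte[] deflateBlock(final byte[] data, final int offset, final int length) {
        final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(data, offset, length);
            deflater.finish();
            final ByteArrayOutputStream out = new ByteArrayOutputStream();
            final byte[] buffer = new byte[8192];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Submits one deflate task per block, then joins the futures in
    // submission order so the results come back in the original order.
    static List<byte[]> compressInOrder(final byte[] data, final int blockSize, final int threads) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            final List<Future<byte[]>> pending = new ArrayList<>();
            for (int offset = 0; offset < data.length; offset += blockSize) {
                final int start = offset;
                final int length = Math.min(blockSize, data.length - offset);
                final Callable<byte[]> task = () -> deflateBlock(data, start, length);
                pending.add(pool.submit(task));
            }
            final List<byte[]> ordered = new ArrayList<>();
            for (final Future<byte[]> future : pending) {
                ordered.add(future.get()); // blocks until that block is compressed
            }
            return ordered;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Parallel compression failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Inflates one raw deflate block; only used to check the round trip.
    static byte[] inflateBlock(final byte[] compressed) {
        final Inflater inflater = new Inflater(true);
        try {
            inflater.setInput(compressed);
            final ByteArrayOutputStream out = new ByteArrayOutputStream();
            final byte[] buffer = new byte[8192];
            while (!inflater.finished()) {
                final int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // no more input to consume
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt block", e);
        } finally {
            inflater.end();
        }
    }
}
```

For instance, `compressInOrder(data, 65536, 4)` would split `data` into blocks of at most 65536 bytes and compress them on four threads; both values are illustrative. Joining the futures in submission order is what keeps the output order independent of which thread finishes first.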

[Image: parallel]

Here are benchmarks comparing the performance of BlockCompressedOutputStream and ParallelBlockCompressedOutputStream:

Benchmark                               (blocks)  Mode  Cnt      Score      Error  Units
BCOSBenchmark.BCOSBenchmark                    2  avgt    5      4.206 ±    0.803  ms/op
BCOSBenchmark.BCOSBenchmark                 1024  avgt    5   1838.131 ±   58.076  ms/op
BCOSBenchmark.BCOSBenchmark                 4096  avgt    5   7791.901 ±  282.288  ms/op
BCOSBenchmark.BCOSBenchmark                16384  avgt    5  29659.399 ±  643.186  ms/op
BCOSBenchmark.parallelBCOSBenchmark4th         2  avgt    5      5.259 ±    6.846  ms/op
BCOSBenchmark.parallelBCOSBenchmark4th      1024  avgt    5    616.716 ±   46.377  ms/op
BCOSBenchmark.parallelBCOSBenchmark4th      4096  avgt    5   3037.837 ±  932.181  ms/op
BCOSBenchmark.parallelBCOSBenchmark4th     16384  avgt    5  10798.186 ± 1611.957  ms/op

The benchmark simply generates random blocks of data and writes them to the output stream.

Here are also the results for the second part of SortSam (writing records from the sorting collection), where BlockCompressedOutputStream is used:

master branch implementation:
INFO	2017-06-28 13:23:55	SortSam	Wrote    10,000,000 records from a sorting collection.  Elapsed time: 00:20:45s.  Time for last 10,000,000:  118s.  Last read position: 21:24,587,673
INFO	2017-06-28 13:25:49	SortSam	Wrote    20,000,000 records from a sorting collection.  Elapsed time: 00:22:38s.  Time for last 10,000,000:  113s.  Last read position: 7:23,286,339
INFO	2017-06-28 13:27:40	SortSam	Wrote    30,000,000 records from a sorting collection.  Elapsed time: 00:24:30s.  Time for last 10,000,000:  111s.  Last read position: 16:11,025,537
INFO	2017-06-28 13:29:33	SortSam	Wrote    40,000,000 records from a sorting collection.  Elapsed time: 00:26:22s.  Time for last 10,000,000:  112s.  Last read position: 6:143,105,022
INFO	2017-06-28 13:31:27	SortSam	Wrote    50,000,000 records from a sorting collection.  Elapsed time: 00:28:16s.  Time for last 10,000,000:  114s.  Last read position: 1:186,643,572
INFO	2017-06-28 13:33:21	SortSam	Wrote    60,000,000 records from a sorting collection.  Elapsed time: 00:30:10s.  Time for last 10,000,000:  114s.  Last read position: 1:196,714,975
INFO	2017-06-28 13:35:13	SortSam	Wrote    70,000,000 records from a sorting collection.  Elapsed time: 00:32:03s.  Time for last 10,000,000:  112s.  Last read position: 16:49,983,724
INFO	2017-06-28 13:37:06	SortSam	Wrote    80,000,000 records from a sorting collection.  Elapsed time: 00:33:55s.  Time for last 10,000,000:  112s.  Last read position: 1:197,704,968
INFO	2017-06-28 13:38:59	SortSam	Wrote    90,000,000 records from a sorting collection.  Elapsed time: 00:35:48s.  Time for last 10,000,000:  113s.  Last read position: 9:125,905,256
INFO	2017-06-28 13:40:54	SortSam	Wrote   100,000,000 records from a sorting collection.  Elapsed time: 00:37:43s.  Time for last 10,000,000:  114s.  Last read position: 6:140,620,132
INFO	2017-06-28 13:42:46	SortSam	Wrote   110,000,000 records from a sorting collection.  Elapsed time: 00:39:35s.  Time for last 10,000,000:  111s.  Last read position: 9:104,433,180
INFO	2017-06-28 13:44:36	SortSam	Wrote   120,000,000 records from a sorting collection.  Elapsed time: 00:41:25s.  Time for last 10,000,000:  110s.  Last read position: 1:143,767,447
INFO	2017-06-28 13:46:26	SortSam	Wrote   130,000,000 records from a sorting collection.  Elapsed time: 00:43:16s.  Time for last 10,000,000:  110s.  Last read position: 5:168,250,174
INFO	2017-06-28 13:48:20	SortSam	Wrote   140,000,000 records from a sorting collection.  Elapsed time: 00:45:09s.  Time for last 10,000,000:  113s.  Last read position: 17:8,159,993
INFO	2017-06-28 13:50:12	SortSam	Wrote   150,000,000 records from a sorting collection.  Elapsed time: 00:47:01s.  Time for last 10,000,000:  112s.  Last read position: 6:70,942,372
INFO	2017-06-28 13:52:04	SortSam	Wrote   160,000,000 records from a sorting collection.  Elapsed time: 00:48:53s.  Time for last 10,000,000:  112s.  Last read position: 22:50,433,075
INFO	2017-06-28 13:53:56	SortSam	Wrote   170,000,000 records from a sorting collection.  Elapsed time: 00:50:45s.  Time for last 10,000,000:  111s.  Last read position: 6:16,823,069
INFO	2017-06-28 13:55:48	SortSam	Wrote   180,000,000 records from a sorting collection.  Elapsed time: 00:52:38s.  Time for last 10,000,000:  112s.  Last read position: 16:20,636,901
INFO	2017-06-28 13:57:40	SortSam	Wrote   190,000,000 records from a sorting collection.  Elapsed time: 00:54:29s.  Time for last 10,000,000:  111s.  Last read position: 4:180,667,149
INFO	2017-06-28 13:59:33	SortSam	Wrote   200,000,000 records from a sorting collection.  Elapsed time: 00:56:22s.  Time for last 10,000,000:  112s.  Last read position: X:7,434,779
INFO	2017-06-28 14:01:25	SortSam	Wrote   210,000,000 records from a sorting collection.  Elapsed time: 00:58:14s.  Time for last 10,000,000:  112s.  Last read position: 2:60,740,053

With 2 threads for zipping:
INFO	2017-06-28 14:21:05	SortSam	Wrote    10,000,000 records from a sorting collection.  Elapsed time: 00:19:10s.  Time for last 10,000,000:   76s.  Last read position: 21:24,587,673
INFO	2017-06-28 14:22:18	SortSam	Wrote    20,000,000 records from a sorting collection.  Elapsed time: 00:20:23s.  Time for last 10,000,000:   73s.  Last read position: 7:23,286,339
INFO	2017-06-28 14:23:32	SortSam	Wrote    30,000,000 records from a sorting collection.  Elapsed time: 00:21:36s.  Time for last 10,000,000:   73s.  Last read position: 16:11,025,537
INFO	2017-06-28 14:24:45	SortSam	Wrote    40,000,000 records from a sorting collection.  Elapsed time: 00:22:50s.  Time for last 10,000,000:   73s.  Last read position: 6:143,105,022
INFO	2017-06-28 14:25:59	SortSam	Wrote    50,000,000 records from a sorting collection.  Elapsed time: 00:24:03s.  Time for last 10,000,000:   73s.  Last read position: 1:186,643,572
INFO	2017-06-28 14:27:08	SortSam	Wrote    60,000,000 records from a sorting collection.  Elapsed time: 00:25:12s.  Time for last 10,000,000:   69s.  Last read position: 1:196,714,975
INFO	2017-06-28 14:28:21	SortSam	Wrote    70,000,000 records from a sorting collection.  Elapsed time: 00:26:25s.  Time for last 10,000,000:   73s.  Last read position: 16:49,983,724
INFO	2017-06-28 14:29:35	SortSam	Wrote    80,000,000 records from a sorting collection.  Elapsed time: 00:27:40s.  Time for last 10,000,000:   74s.  Last read position: 1:197,704,968
INFO	2017-06-28 14:30:47	SortSam	Wrote    90,000,000 records from a sorting collection.  Elapsed time: 00:28:52s.  Time for last 10,000,000:   71s.  Last read position: 9:125,905,256
INFO	2017-06-28 14:32:01	SortSam	Wrote   100,000,000 records from a sorting collection.  Elapsed time: 00:30:06s.  Time for last 10,000,000:   74s.  Last read position: 6:140,620,132
INFO	2017-06-28 14:33:15	SortSam	Wrote   110,000,000 records from a sorting collection.  Elapsed time: 00:31:20s.  Time for last 10,000,000:   74s.  Last read position: 9:104,433,180
INFO	2017-06-28 14:34:29	SortSam	Wrote   120,000,000 records from a sorting collection.  Elapsed time: 00:32:34s.  Time for last 10,000,000:   73s.  Last read position: 1:143,767,447
INFO	2017-06-28 14:35:42	SortSam	Wrote   130,000,000 records from a sorting collection.  Elapsed time: 00:33:47s.  Time for last 10,000,000:   72s.  Last read position: 5:168,250,174
INFO	2017-06-28 14:36:56	SortSam	Wrote   140,000,000 records from a sorting collection.  Elapsed time: 00:35:01s.  Time for last 10,000,000:   74s.  Last read position: 17:8,159,993
INFO	2017-06-28 14:38:16	SortSam	Wrote   150,000,000 records from a sorting collection.  Elapsed time: 00:36:21s.  Time for last 10,000,000:   79s.  Last read position: 6:70,942,372
INFO	2017-06-28 14:39:35	SortSam	Wrote   160,000,000 records from a sorting collection.  Elapsed time: 00:37:40s.  Time for last 10,000,000:   78s.  Last read position: 22:50,433,075
INFO	2017-06-28 14:40:48	SortSam	Wrote   170,000,000 records from a sorting collection.  Elapsed time: 00:38:53s.  Time for last 10,000,000:   73s.  Last read position: 6:16,823,069
INFO	2017-06-28 14:42:02	SortSam	Wrote   180,000,000 records from a sorting collection.  Elapsed time: 00:40:06s.  Time for last 10,000,000:   73s.  Last read position: 16:20,636,901
INFO	2017-06-28 14:43:18	SortSam	Wrote   190,000,000 records from a sorting collection.  Elapsed time: 00:41:22s.  Time for last 10,000,000:   76s.  Last read position: 4:180,667,149
INFO	2017-06-28 14:44:34	SortSam	Wrote   200,000,000 records from a sorting collection.  Elapsed time: 00:42:39s.  Time for last 10,000,000:   76s.  Last read position: X:7,434,779
INFO	2017-06-28 14:45:56	SortSam	Wrote   210,000,000 records from a sorting collection.  Elapsed time: 00:44:01s.  Time for last 10,000,000:   81s.  Last read position: 2:60,740,053

@SilinPavel SilinPavel force-pushed the epam-ls_ParallelBlockCompressedOutputStream branch from a40a738 to c4c6ec7 Compare June 22, 2017 13:37
protected File file = null;

// Really a local variable, but allocate once to reduce GC burden.
protected final byte[] singleByteArray = new byte[1];

Please avoid micro-optimisations like this. The Java HotSpot compiler has many facilities for optimising code during compilation, and it will very likely remove everything related to the local array.

* Prepare to compress at the given compression level
* @param file file to output
*/
public AbstractBlockCompressedOutputStream(final File file) {

A small code style point: if you rename the "file" parameter to "f" or something similar, you can avoid using "this" in the constructor.

codec = new BinaryCodec(file, true);
}


The Javadoc is missing.

}

/**
*

The Javadoc has no method description.

* @return size of gzip block that was written.
*/
protected int writeGzipBlock(final byte[] compressedBuffer, final int compressedSize, final int uncompressedSize, final long crc) {
// Init gzip header

Can we make a static import of all the members of BlockCompressedStreamConstants?

}

private CompressedBlock compressBlock(UncompressedBlock uncompressedBlock) {
Deflater noCompressionDeflater = deflaterFactory.makeDeflater(Deflater.NO_COMPRESSION, true);

All local variables in this method could be final. Also, there is no reason to initialise noCompressionDeflater here; we can initialise it only if we need to fall back to it.
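
One way to read this suggestion is sketched below. It is only an illustration under assumptions: the class LazyFallbackDeflateSketch and its methods are hypothetical, it uses java.util.zip.Deflater directly rather than deflaterFactory, and it assumes the fallback is needed when the compressed data does not fit into a fixed-size output buffer. The no-compression deflater is created only on that path, and every local is final.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration; none of these names are taken from the PR.
class LazyFallbackDeflateSketch {

    // Returns raw deflate data for the block. The NO_COMPRESSION deflater
    // is only created when the default deflater's output does not fit
    // into maxCompressedSize bytes (an assumed fallback condition).
    static byte[] compressBlock(final byte[] uncompressed, final int maxCompressedSize) {
        final byte[] out = new byte[maxCompressedSize];

        final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(uncompressed);
            deflater.finish();
            final int size = deflater.deflate(out);
            if (deflater.finished()) {
                return Arrays.copyOf(out, size); // common case: no fallback needed
            }
        } finally {
            deflater.end();
        }

        // Fallback path: create the second deflater only now.
        final Deflater noCompressionDeflater = new Deflater(Deflater.NO_COMPRESSION, true);
        try {
            noCompressionDeflater.setInput(uncompressed);
            noCompressionDeflater.finish();
            final int size = noCompressionDeflater.deflate(out);
            if (!noCompressionDeflater.finished()) {
                throw new IllegalStateException("Block does not fit even without compression");
            }
            return Arrays.copyOf(out, size);
        } finally {
            noCompressionDeflater.end();
        }
    }

    // Inflates raw deflate data of known uncompressed size; used to check the round trip.
    static byte[] inflate(final byte[] compressed, final int uncompressedSize) {
        final Inflater inflater = new Inflater(true);
        try {
            inflater.setInput(compressed);
            final byte[] out = new byte[uncompressedSize];
            int filled = 0;
            while (filled < out.length) {
                final int n = inflater.inflate(out, filled, out.length - filled);
                if (n == 0) {
                    break; // nothing more can be produced
                }
                filled += n;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt block", e);
        } finally {
            inflater.end();
        }
    }
}
```

The trade-off is that the fallback path pays for creating a deflater when it is taken, in exchange for not creating one for every block that compresses normally.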

return t;
}
);

@yury-melnikov-epam yury-melnikov-epam Jun 27, 2017

I'm not sure I understand the intention behind introducing the following constant.

@SilinPavel SilinPavel force-pushed the epam-ls_ParallelBlockCompressedOutputStream branch 2 times, most recently from 2f9fff3 to d3101b4 Compare June 29, 2017 10:08
@SilinPavel SilinPavel force-pushed the epam-ls_ParallelBlockCompressedOutputStream branch from d3101b4 to b9ec2b4 Compare June 29, 2017 10:42
5 participants