ParallelBlockCompressedOutputStream #3
base: master
a40a738 to c4c6ec7
protected File file = null;

// Really a local variable, but allocate once to reduce GC burden.
protected final byte[] singleByteArray = new byte[1];
Please avoid micro-optimisations like this. The Java HotSpot compiler has many facilities for optimising code during compilation, and it will very probably eliminate the overhead associated with a local array.
* Prepare to compress at the given compression level
* @param file file to output
*/
public AbstractBlockCompressedOutputStream(final File file) {
A small code-style point: if you rename the "file" parameter to "f" or something similar, you can avoid using "this" in the constructor.
codec = new BinaryCodec(file, true);
}
The Javadoc is missing.
}

/**
*
The Javadoc has no method description.
* @return size of gzip block that was written.
*/
protected int writeGzipBlock(final byte[] compressedBuffer, final int compressedSize, final int uncompressedSize, final long crc) {
// Init gzip header
Can we make a static import of all the members of BlockCompressedStreamConstants?
}

private CompressedBlock compressBlock(UncompressedBlock uncompressedBlock) {
Deflater noCompressionDeflater = deflaterFactory.makeDeflater(Deflater.NO_COMPRESSION, true);
All local variables in this method could be final. Also, there is no reason to initialise the noCompressionDeflater here. We can initialise it only if we need to fall back to it.
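The suggestion could look roughly like the sketch below. It is only an illustration: it uses java.util.zip.Deflater directly instead of the deflaterFactory from the diff, and the class name LazyFallbackSketch and the helper tryDeflate are hypothetical names that do not appear in the pull request.

```java
import java.util.zip.Deflater;

// Hypothetical sketch, not the code under review.
final class LazyFallbackSketch {

    /** Returns the compressed size, or -1 if the output buffer was too small. */
    static int tryDeflate(final Deflater deflater, final byte[] input, final byte[] output) {
        deflater.reset();
        deflater.setInput(input, 0, input.length);
        deflater.finish();
        final int size = deflater.deflate(output, 0, output.length);
        // finished() is false when the compressed data did not fit into output.
        return deflater.finished() ? size : -1;
    }

    static int compressBlock(final byte[] input, final byte[] output) {
        final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            final int size = tryDeflate(deflater, input, output);
            if (size >= 0) {
                return size;
            }
        } finally {
            deflater.end();
        }
        // The fallback deflater is created only on the path that needs it.
        final Deflater noCompressionDeflater = new Deflater(Deflater.NO_COMPRESSION, true);
        try {
            final int size = tryDeflate(noCompressionDeflater, input, output);
            if (size < 0) {
                throw new IllegalStateException("Block does not fit even without compression");
            }
            return size;
        } finally {
            noCompressionDeflater.end();
        }
    }
}
```

In the common case the first deflater succeeds and the NO_COMPRESSION instance is never allocated.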
return t;
}
);
I'm not sure I understand the purpose of introducing the following constant.
2f9fff3 to d3101b4
d3101b4 to b9ec2b4
We would like to present a new multithreaded implementation of BlockCompressedOutputStream. ParallelBlockCompressedOutputStream compresses GZ blocks in parallel, which improves performance by using multiple CPU cores.

A base class, AbstractBlockCompressedOutputStream, has been extracted. It is extended by the single-threaded BlockCompressedOutputStream and by ParallelBlockCompressedOutputStream.

ParallelBlockCompressedOutputStream implements the deflateBlock method, which is called when the buffer is full and a GZ block should be compressed and written. The parallel deflateBlock implementation submits the task of compressing the GZ block to a ThreadPoolExecutor, so the block is processed in another thread in parallel. The number of threads in the ThreadPoolExecutor (that is, the number of blocks processed in parallel) can be controlled by setting the -Dsamjdk.zip_threads property. If the property is 0 (the default), the single-threaded implementation is used.

After enough (64 * ZIP_THREADS) deflating tasks have been submitted, a writing task is submitted. The writing task joins all previous deflating tasks and writes their blocks in the original order.
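The submit-then-join-in-order scheme described above can be sketched as follows. This is a simplified illustration and not the code in this pull request: the class OrderedParallelDeflateSketch, its methods, and the maxPending parameter are hypothetical, raw java.util.zip.Deflater output stands in for real GZ blocks, and the joining and writing happen on the calling thread rather than in a separate writing task.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.Deflater;

// Hypothetical sketch of ordered parallel compression.
final class OrderedParallelDeflateSketch {

    /** Compresses one block; each task uses its own Deflater. */
    static byte[] deflate(final byte[] block) {
        final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(block);
            deflater.finish();
            final ByteArrayOutputStream out = new ByteArrayOutputStream();
            final byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                out.write(chunk, 0, deflater.deflate(chunk));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Submits one task per block and writes the results in submission order. */
    static void writeAll(final List<byte[]> blocks, final OutputStream sink,
                         final int threads, final int maxPending)
            throws IOException, InterruptedException, ExecutionException {
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            final List<Future<byte[]>> pending = new ArrayList<>();
            for (final byte[] block : blocks) {
                final Callable<byte[]> task = () -> deflate(block);
                pending.add(pool.submit(task));
                if (pending.size() >= maxPending) {
                    flush(pending, sink);
                }
            }
            flush(pending, sink);
        } finally {
            pool.shutdown();
        }
    }

    /** Joins the pending tasks in order, so output order matches input order. */
    private static void flush(final List<Future<byte[]>> pending, final OutputStream sink)
            throws IOException, InterruptedException, ExecutionException {
        for (final Future<byte[]> future : pending) {
            sink.write(future.get());
        }
        pending.clear();
    }
}
```

Because the futures are joined in the order they were submitted, blocks may finish compressing in any order but are always written in the original one.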
Here are benchmarks comparing the performance of BlockCompressedOutputStream and ParallelBlockCompressedOutputStream:
We simply generate a random block of data and write it to the output stream.
Here are also the results for the second part of SortSam, where BlockCompressedOutputStream is used:
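A benchmark of this kind might be structured like the sketch below. It is an assumption about the general shape only, not the benchmark actually used: the class WriteBenchmarkSketch, the method timeWrites, and its parameters are hypothetical, and any OutputStream can be passed in as the stream under test.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;

// Hypothetical sketch of a write benchmark.
final class WriteBenchmarkSketch {

    /** Writes the same random block repeatedly and returns the elapsed nanoseconds. */
    static long timeWrites(final OutputStream out, final int blockSize, final int repetitions)
            throws IOException {
        final byte[] block = new byte[blockSize];
        new Random(42).nextBytes(block); // fixed seed so runs are comparable
        final long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            out.write(block);
        }
        out.close(); // include flushing of buffered data in the measurement
        return System.nanoTime() - start;
    }
}
```

Running the same call once with each stream implementation gives directly comparable timings, although a serious measurement would also need warm-up iterations.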