This repository has been archived by the owner on Sep 23, 2023. It is now read-only.
Both compressors can be found in the `compress` package. `CompressorSequential` has been written for optimal performance in a single thread. `Compressor` (formerly known as `ParallelCompressor`) is used for prototypes and experiments, so it aims to utilise maximum resources to run prototypes faster.
But maintaining two variants of the same code is error prone. Aggregator (part of the Erigon2 prototype) has been switched to `Compressor` (the parallel compressor) and now runs slower. My suspicion is that the parallel compressor wastes a lot of time on dispatching work, on scheduling, and on the extra memory allocations needed to ensure thread safety. We would like to profile those areas and optimise them.
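As a starting point for the profiling mentioned above, here is a minimal sketch that captures a CPU profile around an arbitrary piece of work using the standard `runtime/pprof` package. The helper name `profileCPU` and the idea of passing the compression run as a closure are assumptions for illustration; nothing here is taken from the `compress` package itself.

```go
package main

import (
	"os"
	"runtime/pprof"
)

// profileCPU (hypothetical helper) writes a CPU profile covering the
// execution of work to the file at path.
func profileCPU(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		f.Close()
		return err
	}
	work() // e.g. one full compression run with the parallel compressor
	pprof.StopCPUProfile()
	return f.Close()
}
```

The resulting file can be inspected with `go tool pprof`. If the suspicion is right, time spent in channel operations, the scheduler and the allocator should stand out against the time spent in the compression code itself.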
For more context, in production, it is likely we will run compressor in a SINGLE background thread. So it may not even need to spawn goroutines in that mode. Parallel mode would only be used for experiments and prototypes.
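One way to keep a single code path while avoiding goroutines in the single-threaded mode could look like the sketch below. The function `processAll` and its signature are hypothetical and only illustrate the idea; they are not the actual `Compressor` API. With `workers <= 1` everything runs inline on the calling goroutine, otherwise indices are handed to a small worker pool and each result is stored at its input position.

```go
package main

import "sync"

// processAll (hypothetical) applies fn to every input and returns the
// outputs in input order, independent of the number of workers.
func processAll(inputs [][]byte, workers int, fn func([]byte) []byte) [][]byte {
	out := make([][]byte, len(inputs))
	if workers <= 1 {
		// Single-threaded mode: no goroutines, channels or locks.
		for i, in := range inputs {
			out[i] = fn(in)
		}
		return out
	}
	var wg sync.WaitGroup
	idx := make(chan int)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range idx {
				// Each index is received by exactly one worker,
				// so the slots of out are written without locking.
				out[i] = fn(inputs[i])
			}
		}()
	}
	for i := range inputs {
		idx <- i
	}
	close(idx)
	wg.Wait()
	return out
}
```

Dispatching one index per channel send is deliberately naive here. Sending batches of indices instead would reduce the dispatch overhead that this issue suspects.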
Beyond the Erigon2 prototype, the compressor is currently used to package block header and block body snapshots. The requirement there (as well as in the Erigon2 prototype) is that optimisations do not change the resulting compressed file. Also, the resulting compressed file should be the same regardless of the number of workers.
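The "same file regardless of the number of workers" requirement can be guarded by a small check like the one sketched below. It assumes a hypothetical `build` callback that runs the compressor with a given worker count and returns the bytes of the resulting file. How that callback is wired to the real compressor is left open.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// checkDeterministic (hypothetical) runs build once per worker count and
// returns an error if any output hash differs from the first one.
func checkDeterministic(build func(workers int) ([]byte, error), workerCounts []int) error {
	var want [sha256.Size]byte
	for i, w := range workerCounts {
		data, err := build(w)
		if err != nil {
			return fmt.Errorf("workers=%d: %w", w, err)
		}
		got := sha256.Sum256(data)
		if i == 0 {
			want = got
			continue
		}
		if got != want {
			return fmt.Errorf("workers=%d: hash %x differs from workers=%d hash %x",
				w, got, workerCounts[0], want)
		}
	}
	return nil
}
```

Running this before and after an optimisation, and comparing the hashes across the two runs as well, would also cover the first requirement that optimisations leave the compressed file unchanged.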
However, if we find an optimisation that requires change of the file format, we will definitely consider it!
Superstrings are now created immediately instead of being written to a file first (done in #284). We still need to create the uncompressedFile, because the data has to be read twice (for reducedict). The sequential compressor does the same. We should apply the same trick here as in ETL: create uncompressedFile only when the data is larger than etl.BufferOptimalSize.
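The "create the file only when the data outgrows a buffer" idea could be sketched as follows. The type `spillBuffer`, its `limit` field and the temp-file naming are assumptions made for this example. This is not the ETL implementation, and `limit` merely plays the role that etl.BufferOptimalSize would play.

```go
package main

import (
	"bytes"
	"io"
	"os"
)

// spillBuffer (hypothetical) keeps written data in memory and moves it to a
// temporary file only once the total size would exceed limit.
type spillBuffer struct {
	limit int
	mem   bytes.Buffer
	file  *os.File // nil while the data still fits in memory
	size  int64    // bytes written to file
}

func (s *spillBuffer) Write(p []byte) (int, error) {
	if s.file == nil && s.mem.Len()+len(p) > s.limit {
		f, err := os.CreateTemp("", "uncompressed-*")
		if err != nil {
			return 0, err
		}
		n, err := f.Write(s.mem.Bytes()) // move buffered data to disk
		if err != nil {
			f.Close()
			os.Remove(f.Name())
			return 0, err
		}
		s.size = int64(n)
		s.mem.Reset()
		s.file = f
	}
	if s.file == nil {
		return s.mem.Write(p)
	}
	n, err := s.file.Write(p)
	s.size += int64(n)
	return n, err
}

// Reader returns a fresh reader over everything written so far. It can be
// called several times, which allows reading the data twice.
func (s *spillBuffer) Reader() io.Reader {
	if s.file == nil {
		return bytes.NewReader(s.mem.Bytes())
	}
	// SectionReader uses ReadAt, so it does not move the write offset.
	return io.NewSectionReader(s.file, 0, s.size)
}

// Close removes the temporary file, if one was created.
func (s *spillBuffer) Close() error {
	if s.file == nil {
		return nil
	}
	name := s.file.Name()
	err := s.file.Close()
	if rmErr := os.Remove(name); err == nil {
		err = rmErr
	}
	return err
}
```

For small inputs this avoids touching the file system at all, while large inputs behave as before, with both passes reading from the temporary file.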
Performance issue still exists (I mean this issue is valid).