How do I debug performance of deflate in libewf (ewfacquirestream)? #178
Comments
I'm not able to tell; there is too little information about your configuration, build, and test parameters. The code is considered experimental; you could use normal code profiling tools to see what is going on.
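A minimal profiling sketch, assuming a Linux build with `perf` available and debug symbols compiled in (the paths below are placeholders; under Cygwin a different profiler would be needed):

```
# Sample where ewfacquirestream spends its CPU time while acquiring from a pipe.
# Assumes the binary was built with symbols (e.g. CFLAGS="-g -O2") and that
# perf is installed; all paths are placeholders.
pbzip2 -d --stdout testimage.bz2 | \
  perf record -g -o ewf.perf.data -- \
  ./ewftools/ewfacquirestream -c deflate:best -t /tmp/test-image

# Summarize the hottest functions (deflate, hashing, or I/O wait).
perf report -i ewf.perf.data
```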
I understand that the non-legacy version is experimental. I tried this version because, based on my understanding of the documentation, only the non-legacy version may be parallelized, but as outlined above the performance is about the same. I do not have experience with profiling code for performance, but I am happy to provide the information you need. I assume the most helpful information is the ./configure summary. Might it be useful to use libcthreads instead of pthreads? `Building: Features:`
This should not be the case, but you might be hitting an as-yet unimplemented use case. Can you describe step by step:
Libcthreads is a cross-platform wrapper around different thread implementations; it uses the pthread (without s) implementation under the hood on some configurations.
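One rough way to confirm which thread backend a given build actually uses is to look at the libraries the resulting binary links against; a sketch (the libtool `.libs/` path is an assumption, and on Cygwin `cygcheck` takes the place of `ldd`):

```
# On Linux: check whether the ewfacquirestream binary pulls in libpthread
# (the .libs/ path is where libtool usually places the real executable).
ldd ewftools/.libs/ewfacquirestream | grep -i pthread

# On Cygwin, list DLL dependencies of the built executable instead.
cygcheck ./ewftools/ewfacquirestream.exe
```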
legacy_config.log
`pbzip2.exe -d --stdout /cygdrive/d/testimage.bz2 | ~/libewf-legacy-main/ewftools/ewfacquirestream -c deflate:best -S 25G -l /cygdrive/k/libewf-legacy.log -d sha256 -t /cygdrive/k/image-legacy` (pbzip2 is not the bottleneck: if I use fast compression, for example, the throughput is much higher. For non-legacy I used the same parameters but added -j for testing, as written in the previous post. D: and K: are different drives and hardly utilized, so they are not the bottleneck either.)
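One way to double-check the decompression side on its own, with the EWF writer taken out of the pipeline, is to measure its raw throughput; a sketch using the same source path as above (assumes GNU `dd`, which prints a transfer rate on completion):

```
# Raw pbzip2 decompression throughput, writing to /dev/null instead of
# feeding ewfacquirestream; compare the reported rate with the 10-40 MB/s
# seen during acquisition.
time pbzip2.exe -d --stdout /cygdrive/d/testimage.bz2 | dd of=/dev/null bs=1M
```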
Thanks
20140815 is the legacy version; I was more interested in the output of the experimental version. I'll see if I can reproduce this scenario when time permits.
That is correct. As described above, it does not really make a difference in the output or speed (besides the version number), and I did not want to spam. While reproducing it, however, I did find out that -j 50 falls back to -j 4 -- I had not noticed that before. More than 32 jobs fall back to 4, so the effective maximum I actually tested was -j 12. (Wouldn't it be more useful to clamp to the maximum instead of the default when the given parameter is > max?) In the current test I observed that legacy has 8-10% CPU utilization and non-legacy seems to have slightly more (11-13%). Here is the start of the output for -j 32 with non-legacy:
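To see whether the worker threads are actually busy during a -j run, per-thread CPU usage can be sampled while the acquisition is running; a sketch assuming Linux-style procps tools (thread listings look different under Cygwin):

```
# Live per-thread CPU usage of the running ewfacquirestream process.
top -H -p "$(pgrep -f ewfacquirestream | head -n 1)"

# One-shot listing of all threads and their CPU share.
ps -L -o pid,tid,pcpu,comm -p "$(pgrep -f ewfacquirestream | head -n 1)"
```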
Hey there,
I just built the current version and I notice that compression speed is pretty slow with best (deflate) compression.
It starts at about 10 MB/s and after 10 GB it is at about 35-40 MB/s. It does not seem to matter whether I use -j 1, -j 12 or -j 50 -- ewfacquire uses only about 8-10% of 6x2 cores. The legacy build achieves the same performance.
I do expect low performance for best compression, of course. However, the CPU is hardly utilized, and so are the source and destination drives. If the compression is parallelized (at least for the non-legacy version -- why would there be a -j parameter otherwise?), what is the bottleneck, and can I do something about it using different configuration parameters? (A sketch for getting a single-threaded deflate baseline follows after this post.)
Thx
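A rough way to establish a single-threaded deflate baseline for this data, outside of libewf entirely, is to push the same stream through plain gzip at its highest level; a sketch (gzip -9 only approximates deflate:best, and the paths are taken from the commands above):

```
# Re-compress the decompressed stream with single-threaded deflate at the
# highest level and note the wall-clock rate.
time pbzip2.exe -d --stdout /cygdrive/d/testimage.bz2 | gzip -9 > /dev/null
```

If that rate sits in the same 10-40 MB/s range observed above, single-threaded deflate itself is the likely limit, and only genuinely parallel compression of the chunk data would raise it.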