-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
License, Windows and Compatibility Upgrade #18
Conversation
crc32.c
Outdated
#ifdef HAS_PCLMUL | ||
|
||
//https://github.com/webosose/chromium68/blob/master/src/third_party/zlib/crc32_simd.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of pointing to an outdated copy of Chromium (M68), you could reference the ToT (Top of Tree) original repository Chromium repo*?
Since M68, I've added some new optimizations, refactorings and bug fixes on Chromium's zlib.
:-)
@@ -21,6 +21,13 @@ | |||
DYNAMIC_CRC_TABLE and MAKECRCH can be #defined to write out crc32.h. | |||
*/ | |||
|
|||
#ifdef HAS_PCLMUL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest you try to follow the integration method used in Chromium's zlib i.e. keeping SIMD/optimizations in separated files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Adenilson I have done so in my fork. Can I suggest I withdraw this pull request, and you either comment on my fork or make your own fork. Once all your concerns are addressed, we can generate a new pull request to the CloudFlare repo. I have tested the separate files on both MacOS and Linux, so I think this addresses your concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Adenilson I should also point out that my repo copies the Chromium zlib code verbatim, and still includes the ARM code, even though I only tested and included the x86 code. I think the whole community would benefit if the ARM code was enhanced, but that is outside my wheelhouse.
crc32.c
Outdated
#ifdef HAS_PCLMUL | ||
#include <emmintrin.h> | ||
#include <smmintrin.h> | ||
#include <wmmintrin.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all of those can be replaced by a single #include <immintrin.h>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defines no longer in crc32, @Adenilson can decide if these can be replaced in arc_simd.c
crc32.c
Outdated
#ifdef _MSC_VER | ||
#define zalign(x) __declspec(align(x)) | ||
#else | ||
#define zalign(x) __attribute__((aligned((x)))) | ||
#endif | ||
|
||
uLong crc32_simd(unsigned char const *buf, size_t len, uInt crc) { | ||
uint crc32_simd(unsigned char const *buf, size_t len, uInt crc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather use uint32_t
everywhere the size is known to be 32 bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed (rordenlab@59c0c27)
#define PCLMUL_MIN_LEN 64 | ||
#define PCLMUL_ALIGN 16 | ||
#define PCLMUL_ALIGN_MASK 15 | ||
|
||
int cpu_has_pclmul = -1; //global: will be 0 or 1 after first test | ||
|
||
int has_pclmul(void) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this perhaps be called at init time with __attribute__((constructor))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkrasnov I think my solution is robust and efficient and works on all the compilers I tested it on (gcc 4.8..9.2; Clang 10, Microsoft C compiler). No other location in this library uses the __attribute__((constructor))
style, and it is unclear if it is universally accepted. This is your code, which you maintain, so ultimately should be familiar to you. Since that style is unfamiliar to me, I would be grateful if you or @Adenilson could devise the implementation you are happy with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about __attribute__((constructor))
, e.g. Rust intentionally avoids life before main
, so I'm not sure how that would work if this library was wrapped as a Rust crate.
However, unsychronized access to a global variable is Undefined Behavior if code runs on multiple threads, and we definitely will use it in multi-threaded programs. So at very least it should be marked as atomic. I know x86 memory model is forgiving about such things, but things that are UB in the compiler, but happen to work in practice are a thin ice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kornelski -I have tried to address this in my fork. The value is atomic. There is also no a compiler flag SKIP_CPUID_CHECK
that allows one to assume that the CPU supports PCLMUL. One option would be to make the default behavior be to assume PCLMUL support. This would my commit would have the same compatibility as the current code, simply using the Chromium BSD-licensed SIMD CRC in place of the GPL code. This would eliminate @Adenilson's performance concerns, albeit just like the current code it will fail if the code is compiled on a system that supports PCLMUL but is run on a computer without support.
@neurolabusc A question: how are you benchmarking? And with which data corpus? |
@Adenilson here is my benchmark for zlib and pigz. I explicitly test NIfTI-format images that are popular in my field. Tools like AFNI and FSL will save gzip-compressed images at each stage of processing, and individual files are often ~2Gb each (e.g. multi-band fMRI session). I understand that @vkrasnov's original use case was a huge number of very small files. In my case, the penalty for a single cpuid call is negligible, while it may be more of an issue for other cases. One option is to simply have a compile flag to determine if the cpuid should be checked. People like @vkrasnov who build for modern facilities can compile to without supporting older processors (just like previous versions of this repository), while scientists may want to compile a version that does not crash if run on old hardware. I am open for any suggestions. |
@Adenilson are you interested in implementing your changes to the CloudFlare zlib? Your changes cover the same ground as mine (remove GPL, aid MSVC updates), but is superior to mine with ARM support. Also, you provide a CPU detection that you suggest might be superior to mine. I am reluctant to put your changes into my fork, as I am not familiar with ARM. If you will devote time to this, I will withdraw my PR. Once your PR is included, I can work to ensure the cmake supports MSVC. On the other hand, if you are not interested in this, I would like to go ahead and submit a revised version of my own CloudFlare pull request. While it does not include all the features of yours, it is still superior in every respect to the current CloudFlare implementation. Just tell me how to proceed. I can completely understand if you do not want to devote time to this. From my perspective, CloudFlare zlib includes benefits beyond the enhanced CRC discussed here, both for serial as well as parallel compression. |
@neurolabusc I'm afraid I lack the bandwidth to dedicate to upstreaming changes to Cloudflare fork. Maintaining the zlib used by Chrome and Android and trying to upstream changes to Canonical zlib is already a bit taxing. |
* Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc * Westmere detection * Update configure for Westmere (InsightSoftwareConsortium/ITK#416) * use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416) * remove gpl code * Improve support for compiling using Windows (https://github.com/ningfei/zlib) * Import ucm.cmake from https://github.com/ningfei/zlib * crc32_simd as separate file (#18) * atomic and SKIP_CPUID_CHECK (#18) * Remove unused code * Only allow compiler to use clmul instructions to crc_simd unit (intel/zlib#25) * Atomic does not compile on Ubuntu 14.04 * Allow "cmake -DSKIP_CPUID_CHECK=ON .." to not check SIMD CRC support on execution. * Restore Windows compilation using "nmake -f win32\Makefile.msc" * zconf.h required for Windows nmake * support crc intrinsics for ARM CPUs * Requested changes (#19)
* Fix architecture macro on MSVC. Signed-off-by: Ningfei Li <[email protected]> * Update '.gitignore'. Also remove 'zconf.h', it will be generated during building. * Add 'build' subdir to .gitignore * Fix architecture macro on MSVC. * Update CMakeLists.txt Clean up. Add option to set runtime (useful for MSVC). Add 'SSE4.2' and 'AVX' option. Signed-off-by: Ningfei Li <[email protected]> * Define '_LARGEFILE64_SOURCE' by default in CMakeLists.txt. Also remove checking fseeko. Signed-off-by: Ningfei Li <[email protected]> * Set 'CMAKE_MACOSX_RPATH' to TRUE. Signed-off-by: Ningfei Li <[email protected]> * Update SSE4.2 and PCLMUL setting in CMakeLists.txt Signed-off-by: Ningfei Li <[email protected]> * Add 'HAVE_HIDDEN' to CMakeLists.txt Signed-off-by: Ningfei Li <[email protected]> * Fix building dll on MSVC. Signed-off-by: Ningfei Li <[email protected]> * Fix zlib.pc file generation. Signed-off-by: Ningfei Li <[email protected]> * Refine cmake settings. * Change cmake function to lowercase. * Fix zlib.pc * Fix crc32_pclmul_le_16 type. * Set POSITION_INDEPENDENT_CODE flag to ON by default. * Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc * Westmere detection * Update configure for Westmere (InsightSoftwareConsortium/ITK#416) * use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416) * remove gpl code * Improve support for compiling using Windows (https://github.com/ningfei/zlib) * Import ucm.cmake from https://github.com/ningfei/zlib * crc32_simd as separate file (#18) * atomic and SKIP_CPUID_CHECK (#18) * Suppress some MSVC warnings. * Remove unused code * Fix ucm_set_runtime when only C is enabled for the project. * Removed configured header file. * Unify zconf.h template. * Only allow compiler to use clmul instructions to crc_simd unit (intel/zlib#25) * Atomic does not compile on Ubuntu 14.04 * Allow "cmake -DSKIP_CPUID_CHECK=ON .." to not check SIMD CRC support on execution. * Restore Windows compilation using "nmake -f win32\Makefile.msc" * Do not set SOVERSION on Cygwin. * Fix file permission. * Refine PCLMUL CMake option. * zconf.h required for Windows nmake * Do not set visibility flag on Cygwin. * Fix compiling dll resource. * Fix compiling using MinGW. SSE4.2 and PCLMUL are also supported. * Fix zconf.h for Windows. * support crc intrinsics for ARM CPUs * Add gzip -k option to minigzip (to aid benchmarks) * Support Intel and ARMv8 optimization in CMake. * Clean up. * Fix compiling on Apple M1. check_c_compiler_flag detects SSE2, SSE3, SSE42 and PCLMUL but compiling fails. Workaround fix, need further investigation. * Restore MSVC warnings. Co-authored-by: neurolabusc <[email protected]> Co-authored-by: Chris Rorden <[email protected]>
This pull request adds 3 benefits:
A benchmark demonstrates that this code is robust and outperforms the baseline and zlib-ng on a wide range of hardware and operating systems.
I would be excited to see this pull request included, as it can be called from the pigz make. This dramatically accelerates pigz, with the SIMD CRC directly accelerating the serial stage that is most impacted by Amdahl's law.