blosc/CMakeLists.txt: Update Lz4 handling #386

Open: johnwparent wants to merge 2 commits into base branch main

@johnwparent johnwparent commented Oct 10, 2024

Lz4 is capable of vendoring its own CMake config module, which is considered preferable over vendored find modules according to CMake best practices.

This patch adds support for importing an LZ4 install detected via its CMake Config module by searching for the targets imported via that module. It then updates the link interface for cblosc static and shared variants to link to their respective LZ4 libraries (or whichever is available).

set(LIBS ${LIBS} ${STATIC_LIBS})
endif()
elseif(NOT SHARED_LIBS AND NOT STATIC_LIBS)
# Fallback to cblosc vendored find module
Contributor

Isn't the vendored case handled below, after else(LZ4_FOUND)?

Author

That's the case where you vendor LZ4 in its entirety. This branch handles the case where the c-blosc vendored find module for LZ4 is used instead of LZ4's config module.

Author

Different components are being vendored in each case. I had hoped "vendored find module" would be clear enough, but evidently it was not; my apologies.

Contributor

@kalvdans Oct 10, 2024

I'm not familiar with libraries providing CMake snippets; I have only heard of pkg-config. I'll let someone more familiar with CMake review. To me, the "genex" code is unreadable and gives vibes of the xz backdoor...

Author

CMake's config modules are an integral part of a healthy CMake ecosystem: they let you export your project so that consumers can use it exactly as you intend.

The doc

Author

I did just notice a mistake in the generator expressions, but that doesn't mean they're a vulnerability...

Author

Which is now fixed, forgot to refactor the code from my test project.

@FrancescAlted
Member

Thanks @johnwparent . This contribution seems legitimate to me, but it is going well beyond my knowledge (CMake requires a lot of attention to understand its full capabilities). Besides, I don't fully understand the purpose of this one. But if @kalvdans is fine with it, I'm +1 on merging this.

@johnwparent
Author

> This contribution seems legitimate to me

Thanks!

> Besides, I don't fully understand the purpose of this one.

c-blosc has its own find module for detecting lz4 on the filesystem and then incorporating the lz4 artifacts into its own CMake. This works (clearly); however, CMake best practices suggest that a config module should be preferred whenever possible. In general, it is better to let a project define its own export interface and subsequent usage requirements.

Lz4 supports producing this config module, so it would be ideal to allow c-blosc to use it. However, c-blosc's CMake is set up to expect that the results of find_package(lz4) came from its vendored "find module", so it cannot use the results of lz4's config module correctly. The call to find_package succeeds and c-blosc attempts to build, but the build then fails because it cannot find lz4's headers or libraries.

In short, the purpose is to enable compatibility with lz4's config module, while still maintaining the ability to use the vendored find module.

Basically, the CMake here checks for the target that would be imported by Lz4's config module and, if it's available (i.e. find_package found the config module and used that to provide lz4 instead of c-blosc's find module), integrates those targets into the existing build logic.
Lz4 can build both shared and static at once (much like c-blosc), so much of the logic here is simply detecting which type of Lz4's binaries is available (if not both) and incorporating them correctly into the c-blosc compile and link interface.

The generator expressions look complex, but perform a very simple task. They check for the type of the library the expression is being evaluated in the context of (in this case, either cblosc_shared or cblosc_static), and yield the correct type of lz4 binary depending on the type of the c-blosc library.
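
To make the description above concrete, here is a minimal sketch of that kind of generator expression. It is not the code from this PR. The imported target names LZ4::lz4_shared and LZ4::lz4_static are assumed to be what lz4's config module exports, and cblosc_shared / cblosc_static are placeholder names for the c-blosc library targets, which are assumed to have been created earlier in the build.

```cmake
# Sketch only, requires CMake >= 3.8 for $<IF:...>.
# Assumed names: LZ4::lz4_shared and LZ4::lz4_static (imported by lz4's config
# module), cblosc_shared and cblosc_static (placeholder c-blosc targets).
find_package(lz4 CONFIG QUIET)

if(TARGET LZ4::lz4_shared AND TARGET LZ4::lz4_static)
  # Both flavours exist: pick the one matching the type of the target that
  # links to it. $<TARGET_PROPERTY:TYPE> is evaluated per consuming target.
  set(LZ4_LINK_TARGET
      "$<IF:$<STREQUAL:$<TARGET_PROPERTY:TYPE>,SHARED_LIBRARY>,LZ4::lz4_shared,LZ4::lz4_static>")
elseif(TARGET LZ4::lz4_shared)
  # Only the shared lz4 is installed: use it for every c-blosc variant.
  set(LZ4_LINK_TARGET LZ4::lz4_shared)
elseif(TARGET LZ4::lz4_static)
  # Only the static lz4 is installed.
  set(LZ4_LINK_TARGET LZ4::lz4_static)
endif()

if(DEFINED LZ4_LINK_TARGET)
  foreach(blosc_target cblosc_shared cblosc_static)
    if(TARGET ${blosc_target})
      target_link_libraries(${blosc_target} PRIVATE ${LZ4_LINK_TARGET})
    endif()
  endforeach()
endif()
```

Under these assumptions, a shared c-blosc library should end up linking the shared lz4 and a static c-blosc library the static one, with a fallback to whichever single flavour is installed.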

@kalvdans
Contributor

> while still maintaining the ability to use the vendored find module.

Will the code get smaller if we drop support for the old way of finding the module?

@johnwparent
Author

> Will the code get smaller if we drop support for the old way of finding the module?

Technically yes, but you'll lose that functionality, which is still fairly relevant. LZ4 can be built with makefiles or by CMake, so while the CMake-derived builds of lz4 can work with this config approach, the makefile-based builds cannot. I'd recommend keeping the old find module until Lz4 drops its makefile support.

@kalvdans
Contributor

How do I test this? I installed liblz4-dev (which on Debian doesn't seem to ship with any cmake config module) and it was detected when I passed cmake -D PREFER_EXTERNAL_LZ4=ON.

Could you @johnwparent please give me some instructions for 1) a dev package that contains the cmake config module, 2) cmake options that will build a static blosc, and 3) a dynamically linked blosc.

Sorry to ask so much, given your very long description above, but since we don't have any other cmake expert around you'll have to spoonfeed me...

@johnwparent
Author

> How do I test this? I installed liblz4-dev (which on Debian doesn't seem to ship with any cmake config module) and it was detected when I passed cmake -D PREFER_EXTERNAL_LZ4=ON.

That's the case that necessitates preserving the find module mechanism in the codebase, as you asked about here.

> Could you @johnwparent please give me some instructions for 1) a dev package that contains the cmake config module, 2) cmake options that will build a static blosc, and 3) a dynamically linked blosc.

Happy to. I'm actually making this contribution on behalf of the Spack package manager, so Spack installing lz4 is one option. You can also build lz4 from source via its CMake build system, following the instructions in that repo (the only thing you really need to specify on the command line is the preference for shared, static, or both types of artifacts), and install it. Both options will provide an lz4 with a CMake config to test with.

So for an lz4 from source build your command line would look something like:
cmake .. -DBUILD_SHARED_LIBS=ON -DBUILD_STATIC_LIBS=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_POLICY_DEFAULT_CMP0042=NEW

This is the line I use to build Lz4.

For c-blosc:

-DBUILD_STATIC=ON will build a static blosc and -DBUILD_SHARED=ON will build a shared blosc.
The full command line I use to build blosc is:
cmake .. -DDEACTIVATE_LZ4=FALSE -DPREFER_EXTERNAL_LZ4=ON

Specifying no preference for static vs shared when giving CMake arguments to blosc will build both at once.

> Sorry to ask so much, given your very long description above, but since we don't have any other cmake expert around you'll have to spoonfeed me...

No worries, happy to do what needs to be done to help this land!

@kalvdans
Contributor

> so Spack installing lz4 is one option.

I gave it a shot but can't see any CMake module among the installed files. Maybe I did something wrong:

$ git clone -c feature.manyFiles=true --depth=2 https://github.com/spack/spack.git
$ cd spack
$ bin/spack install lz4
[...]
==> lz4: Successfully installed lz4-1.10.0-d6nsrrrstkhcgr4s7zmtw5m7q6zaqlvs
  Stage: 1.47s.  Edit: 0.00s.  Build: 15.62s.  Install: 0.30s.  Post-install: 0.03s.  Total: 17.61s
[+] /home/chn/repo/spack/opt/spack/linux-ubuntu24.04-skylake/gcc-13.2.0/lz4-1.10.0-d6nsrrrstkhcgr4s7zmtw5m7q6zaqlvs
$ (cd /home/chn/repo/spack/opt/spack/linux-ubuntu24.04-skylake/gcc-13.2.0/lz4-1.10.0-d6nsrrrstkhcgr4s7zmtw5m7q6zaqlvs && ls -R)
.:
bin  include  lib  share

./bin:
lz4  lz4c  lz4cat  unlz4

./include:
lz4file.h  lz4frame.h  lz4frame_static.h  lz4.h  lz4hc.h

./lib:
liblz4.a  liblz4.so  liblz4.so.1  liblz4.so.1.10.0  pkgconfig

./lib/pkgconfig:
liblz4.pc

./share:
man

./share/man:
man1

./share/man/man1:
lz4.1  lz4c.1  lz4cat.1  unlz4.1

@kalvdans
Contributor

A little more progress:

  • I've managed to build lz4 from git sources and get the CMake stubs installed under $PREFIX/lib/cmake/lz4/lz4*.cmake
  • The c-blosc vendored find-package is the file cmake/FindLZ4.cmake in this git repo.
  • Removing the vendored find-package didn't immediately work, I had to change to lowercase find_package(lz4) in CMakeLists.txt in this git repo root. Then the lz4 cmake stubs are used.

Still haven't reproduced any build errors caused by header files not found. Also a bit confused why we need to differentiate lz4_static and lz4_shared, as the -llz4 argument to the linker will do the right thing in both cases.

@johnwparent
Author

johnwparent commented Oct 16, 2024

> I gave it a shot but can't see any CMake module among the installed files. Maybe I did something wrong:

No, you did everything correctly (and thanks for trying Spack!). I do Spack's Windows support, so this is my fault for forgetting that other platforms build LZ4 differently by default. Spack will prefer building LZ4 with autotools everywhere but Windows, but you can make it use CMake by specifying spack install lz4 build_system=cmake.

That said, installing from source worked, which is great.

> • I've managed to build lz4 from git sources and get the CMake stubs installed under $PREFIX/lib/cmake/lz4/lz4*.cmake

Awesome! Thanks for being so diligent in trying to test my changes!

> • Removing the vendored find-package didn't immediately work, I had to change to lowercase find_package(lz4) in CMakeLists.txt in this git repo root. Then the lz4 cmake stubs are used.

Hmm, the uppercase iteration worked for me with a patched lz4, but that wasn't from main, so perhaps there's a naming change. I'll look into a resolution for that as that will need to be addressed for the changes here to be useful.

> Still haven't reproduced any build errors caused by header files not found.

I dug into this a bit more, and the issue I reported initially has two origins, both of which contribute. The first is that LZ4's CMake config did not actually export its include interface (they have since patched it, but the fix only exists on their main branch; no release incorporates it, and I was working off their most recent release). Obviously this is not your problem (nor is it lz4's any longer, since they've fixed it LOL), but that is likely why you did not observe any failures. The second is that without my changes, even if lz4's CMake config were perfect, c-blosc could not use it: the variables c-blosc's CMake expects to provide the include directories, normally set by find_package using the vendored FindLZ4.cmake, are not set by Lz4's config module, so compilation would fail because no lz4 includes would be passed to the compiler. LZ4's config uses the modern (and currently recommended) approach of composing all of this information into targets for consumption, rather than variables for includes and libraries.
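
The mismatch between variables and targets can be illustrated with a small sketch. It is only an illustration under assumptions: blosc_lz4_dep is a hypothetical helper target, LZ4_INCLUDE_DIR and LZ4_LIBRARY are assumed to be the variables set by the vendored FindLZ4.cmake, and LZ4::lz4_shared / LZ4::lz4_static are assumed to be the targets imported by lz4's config module. It is meant to run after the find_package call for lz4.

```cmake
# Hypothetical helper target that hides which discovery path provided lz4.
add_library(blosc_lz4_dep INTERFACE)

if(TARGET LZ4::lz4_shared OR TARGET LZ4::lz4_static)
  # Config-module path: the imported target carries its own include
  # directories and library location, so no variables are needed.
  if(TARGET LZ4::lz4_shared)
    target_link_libraries(blosc_lz4_dep INTERFACE LZ4::lz4_shared)
  else()
    target_link_libraries(blosc_lz4_dep INTERFACE LZ4::lz4_static)
  endif()
elseif(LZ4_FOUND)
  # Find-module path: only plain variables are available (names assumed),
  # so the usage requirements are attached by hand.
  target_include_directories(blosc_lz4_dep INTERFACE "${LZ4_INCLUDE_DIR}")
  target_link_libraries(blosc_lz4_dep INTERFACE "${LZ4_LIBRARY}")
endif()
```

Code that only reads the variables sees them empty on the config-module path, which matches the missing-header failure described above.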

> Also a bit confused why we need to differentiate lz4_static and lz4_shared, as the -llz4 argument to the linker will do the right thing in both cases.

Another two-part answer. First, this approach better reflects the state and availability of the lz4 binaries we're trying to bring in here, and allows CMake to resolve the correct library as needed. Adding any library to the target_link_libraries call instructs CMake to add that library to the link line, so any library provided there needs to be correct for our linkage type, hence the need to differentiate. Further, a generic -llz4 lets the linker resolve that library's location, which is not always desirable, particularly if there are multiple lz4 installs on a system or any sort of sandboxing is desired.
The second reason is Windows. The linker there doesn't take a generic library name and resolve the correct type based on the current linkage; it takes whatever is thrown at it and attempts to resolve symbols from that. To that end, we need to be clear about what we're working with and what's available. Further, on Windows, static and shared libraries both produce .lib files, so they must have different names, meaning we cannot refer to a generic library name, as that may not actually exist (lz4's libraries are named lz4_static and lz4 on Windows).

@johnwparent
Author

johnwparent commented Oct 16, 2024

In case you're curious, this pipeline is the failure that inspired the creation of this PR.
As you can see, c-blosc detects Lz4's CMake config, but is unable to find the lz4 headers.

@kalvdans
Contributor

Thanks a lot for explaining things in such detail! I totally forgot Windows, which is usually the main reason projects switch to CMake 🤦

I think it is prettier to fix the vendored Findlz4.cmake (needs to be lowercase to work on Linux) to do the same thing as Lz4's CMake config files. It seems you can make find_package() try config mode first, and then module mode, with set(CMAKE_FIND_PACKAGE_PREFER_CONFIG TRUE).

Would it be possible? Then we move all complexity to the vendored Findlz4.cmake (that we could drop as more and more distros bundle cmake files with lz4).

I have access to a windows machine at work so I can try out the change there.
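
The lookup order mentioned in this comment could be sketched as follows. This is an assumption-laden illustration, not project code: it assumes CMake 3.15 or newer (where CMAKE_FIND_PACKAGE_PREFER_CONFIG exists), a vendored find module named Findlz4.cmake in a cmake/ directory next to the top-level CMakeLists.txt, and the lowercase package name lz4.

```cmake
cmake_minimum_required(VERSION 3.15)

# Only provide a default; a value passed with -D on the command line wins.
if(NOT DEFINED CMAKE_FIND_PACKAGE_PREFER_CONFIG)
  set(CMAKE_FIND_PACKAGE_PREFER_CONFIG TRUE)
endif()

# Make the vendored find module (assumed location) visible to module mode.
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")

# With the variable set, config mode is tried first and the Find module is
# used only if no config package is found.
find_package(lz4)

if(TARGET LZ4::lz4_shared OR TARGET LZ4::lz4_static)
  message(STATUS "lz4 provided by its CMake config module")
else()
  message(STATUS "lz4 config targets not present; find-module or bundled path applies")
endif()
```

Guarding the set() with if(NOT DEFINED ...) keeps the choice overridable at configure time.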

@kalvdans
Contributor

Perhaps we could drop the use case of an external lz4 package that lacks a CMake config, and only support 1) bundled lz4 sources or 2) an external lz4 package with a CMake config.

@johnwparent
Author

Apologies for my very delayed response, hectic week.

> Thanks a lot for explaining things in such detail! I totally forgot Windows, which is usually the main reason projects switch to CMake 🤦

Happens all the time. c-blosc does a great job of being cross-platform; I do a lot of porting of Linux projects to other platforms, so I'm just grateful c-blosc works out of the box.

> I think it is prettier to fix the vendored Findlz4.cmake (needs to be lowercase to work on Linux) to do the same thing as Lz4's CMake config files.

Agreed; I wasn't sure if you folks were meeting a particular use case with your vendored copy. I'll redo the PR to synchronize your vendored find module with Lz4's config.

> It seems you can make find_package() try config mode first, and then module mode, with set(CMAKE_FIND_PACKAGE_PREFER_CONFIG TRUE).

Yup! That's actually how I discovered this issue. I'm trying to improve Spack's sandboxing of its builds against system pollution and in doing so, mandated that variable for all our CMake builds and encountered this failure (among numerous others 😓 )

> Would it be possible? Then we move all complexity to the vendored Findlz4.cmake (that we could drop as more and more distros bundle cmake files with lz4).

Some of the complexity (static vs shared) will have to remain outside of the find module (it's not the responsibility of the find module to determine usage). I'd still leave the choice of CMAKE_FIND_PACKAGE_PREFER_CONFIG up to users (in general, unless the behavior is imperative to function, let the user choose), but we can refactor the solution so both behave similarly. I'll take a pass at this shortly.

> I have access to a windows machine at work so I can try out the change there.

Great!

> Perhaps we could drop the use case of an external lz4 package that lacks a CMake config, and only support 1) bundled lz4 sources or 2) an external lz4 package with a CMake config.

I think this would be a good step once the majority of package managers are vendoring an lz4 with the CMake config. For now, it seems like apt, yum, and zypper all vendor the non-CMake version, so I don't think making those versions undetectable is a good idea.
My recommendation would be to create an issue on this repo to this effect, and just check in on it once in a while. FWIW, feel free to assign/tag me.
