Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-44950: [C++] Bump minimum CMake version to 3.25 #44989

Draft
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

raulcd
Copy link
Member

@raulcd raulcd commented Dec 10, 2024

This is currently under development. I am only exercising CI to find out what things would require to be updated
This is still being decided and a decision hasn't been finalized yet.

Rationale for this change

We want to upgrade our CMake version to 3.25 as discussed on the ML:
https://lists.apache.org/thread/h8jp16ktrj11fmjmjhlg6xvkvv9wzvjk

What changes are included in this PR?

TBD

Are these changes tested?

Yes, via CI.

Are there any user-facing changes?

Yes, the minimum CMake version to be used to build Arrow is bumped to 3.25.
This PR includes breaking changes to build systems.

Copy link

⚠️ GitHub issue #44950 has been automatically assigned in GitHub to PR creator.

@raulcd
Copy link
Member Author

raulcd commented Dec 10, 2024

I am unsure on how to fix the remaining failures for R.
For the gcc 12 job I am unsure why it is failing and for the Windows C++ RTools 40 ucrt64 it seems we install CMake from MINGW here but I am not sure if this is necessary or can be updated.
https://github.com/apache/arrow/blob/main/ci/scripts/PKGBUILD#L39

Of course this is the initial CI (we also have to update all the extended CI jobs for crossbow).

@h-vetinari
Copy link
Contributor

For the gcc 12 job I am unsure why it is failing

Updating the CMake lower bound will flip the default of several policies from legacy to new; sounds like you might be relying on legacy behaviour there in some way (once you figure out which policy is at fault, there's usually a migration path to keep the old behaviour)

@raulcd
Copy link
Member Author

raulcd commented Dec 11, 2024

ok, it seems R forces the builds to use the CMake provided on the images:

**** Not using cmake found at /bin/cmake
Error in .make_numeric_version(x, strict, .standard_regexps()$valid_numeric_version) : 
  invalid non-character version specification 'x' (type: double)
Calls: build_libarrow ... as.numeric_version -> numeric_version -> .make_numeric_version
Execution halted

@jonkeane @assignUser @amoeba will this be an issue for CRAN? Are we somehow forced to the CMake version on those images?

root@03abb5b759ba:/# /bin/cmake --version
cmake version 3.22.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).

@assignUser
Copy link
Member

assignUser commented Dec 11, 2024

I went through the logs of our recent checks on cran and only one is using a version < 3.25 and that seems more incidental then purposely as it's the r-odrel arm64 but the intel version has 3.26

So I don't think we should be forced to use that cmake version, additionally we have a function that fetches current cmake if an unsuited version is found but apparently there is an issue with it as seen above. IIRC there was a change to numeric version in one of the las R Versions that is causing this? I'll have a look.

@nealrichardson
Copy link
Member

It's possible the version comparison error is from this: https://github.com/apache/arrow/pull/44989/files#diff-935746c34b16289a07b0d9bf7642dbd268b18059b6187f7cdec7c464be47a3deL731-L743

cmake_version <- function(cmd = "cmake") {
  tryCatch(
    {
      raw_version <- system(paste(cmd, "--version"), intern = TRUE, ignore.stderr = TRUE)
      pat <- ".* ([0-9\\.]+).*?"
      which_line <- grep(pat, raw_version)
      package_version(sub(pat, "\\1", raw_version[which_line]))
    },
    error = function(e) {
      return(0)
    }
  )
}

The error case should probably return("0")

@nealrichardson
Copy link
Member

Two other places in the R nixlibs.R script worth updating:

@pitrou
Copy link
Member

pitrou commented Dec 11, 2024

The error case should probably return("0")

It would probably be more forward-looking to avoid the error entirely. Why does the function fail parsing the CMake version? Can we add cmake --version somewhere in the GH workflow?

@nealrichardson
Copy link
Member

nealrichardson commented Dec 11, 2024

The error case should probably return("0")

It would probably be more forward-looking to avoid the error entirely. Why does the function fail parsing the CMake version? Can we add cmake --version somewhere in the GH workflow?

I could be remembering wrong, but I believe the function is used to check for cmake of a certain version, and this is to be robust to where cmake may not be installed or not found at the path provided. It does not emit an error, it traps it.

This might not be where the error is coming from that was observed in CI, I was just browsing the source to see where you might get a numeric version error. Looking again, and reading the output it produced, I think we're hitting this: https://github.com/apache/arrow/pull/44989/files#diff-935746c34b16289a07b0d9bf7642dbd268b18059b6187f7cdec7c464be47a3deL718

      } else {
        # Keep trying
        lg("Not using cmake found at %s", path, .indent = "****")
        if (found_version > 0) {
          lg("Version >= %s required; found %s", version_required, found_version, .indent = "*****")
        } else {

should be found_version > "0". We must not have been running into this before because the "found_version" was always sufficient if found, and if it wasn't found, we were returning a numeric 0 which works in this comparison.

(To be clear, we need to fix both this and the return("0") above.)

.env Outdated
@@ -54,6 +54,7 @@ UBUNTU=22.04

# Default versions for various dependencies
CLANG_TOOLS=14
CMAKE=3.25.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible (and also would it be advisable...) to call this CMAKE_VERSION instead? One of the R failures I suspect might be due to us looking to CMAKE for a path to the cmake executable itself at

Sys.getenv("CMAKE"),
(though I don't have a full traceback I'm not 100% certain on that)

Copy link
Member

@jonkeane jonkeane Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah I didn't see the previous comments on the PR (sorry, I must have had a tab open and didn't refresh!) but I suspect this here is the answer to:

It would probably be more forward-looking to avoid the error entirely. Why does the function fail parsing the CMake version?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Dec 12, 2024
@github-actions github-actions bot removed the awaiting changes Awaiting changes label Dec 13, 2024
@github-actions github-actions bot added the awaiting change review Awaiting change review label Dec 13, 2024
@jonkeane
Copy link
Member

I've pushed the changes Neal suggested which should fix the ubuntu failure 🤞 (hope you don't mind, @raulcd !)

@jonkeane
Copy link
Member

I did a bit of digging on the windows front, and I suspect what's going on is the the MSYS2/mingw cmake is being preferred (I'm not familiar enough with that ecosystem to know if we can override it with something on the system, but that also seems fragile itself).

https://github.com/raulcd/arrow/blob/e5d521134db4bed8507572fece3b56eb4a1b9158/ci/scripts/r_windows_build.sh#L26-L30 has some info about how to test newer dependencies, I wonder if we (conditionally) use those pacman commands to insall a newer cmake than what's in the CRAN repo (https://cloud.r-project.org/bin/windows/Rtools/4.0/ucrt64/ has the version I'm seeing installed listed, so if we override that one in our builds that might be sufficient.

@nealrichardson
Copy link
Member

I did a bit of digging on the windows front, and I suspect what's going on is the the MSYS2/mingw cmake is being preferred (I'm not familiar enough with that ecosystem to know if we can override it with something on the system, but that also seems fragile itself).

https://github.com/raulcd/arrow/blob/e5d521134db4bed8507572fece3b56eb4a1b9158/ci/scripts/r_windows_build.sh#L26-L30 has some info about how to test newer dependencies, I wonder if we (conditionally) use those pacman commands to insall a newer cmake than what's in the CRAN repo (https://cloud.r-project.org/bin/windows/Rtools/4.0/ucrt64/ has the version I'm seeing installed listed, so if we override that one in our builds that might be sufficient.

Unfortunately, that won't help us. https://github.com/r-windows/rtools-packages/ is archived; CRAN has moved away from the toolchain that Jeroen was maintaining. We should delete that comment.

For newer cmake with older rtools, maybe we can get it from https://github.com/Kitware/CMake/releases/download/v3.31.2/cmake-3.31.2-windows-x86_64.zip or something? Or maybe we can install newer rtools for cmake but use the older rtools for compilers etc.?

@raulcd
Copy link
Member Author

raulcd commented Dec 17, 2024

I don't understand why the job is installing mingw-w64-ucrt-x86_64-cmake-3.21.3-1-any.pkg.tar.x when the remote package seems to be updated to a newer version: https://packages.msys2.org/packages/mingw-w64-ucrt-x86_64-cmake
And we seem to be updating to pull the newest packages:

+ pacman --noconfirm -Syy
:: Synchronizing package databases...
downloading mingw32.db...
downloading mingw64.db...
downloading ucrt64.db...
downloading mirrors.db...

edit:
Ok, I see this is using CRAN mirrors here: https://cloud.r-project.org/bin/windows/Rtools/4.0/ucrt64/

Probably naive question here, I suppose we can't update the mirrors here to point to msys2 and download the newer cmake from there before building Arrow as we would have the same issue when we try to publish to CRAN, right? @jonkeane @nealrichardson

@nealrichardson
Copy link
Member

Probably naive question here, I suppose we can't update the mirrors here to point to msys2 and download the newer cmake from there before building Arrow as we would have the same issue when we try to publish to CRAN, right? @jonkeane @nealrichardson

Honestly I don't know. But it's not relevant for CRAN because (a) CRAN currently only builds on R 4.3 and 4.4, both of which have new enough cmake, and (b) we don't require cmake on CRAN anyway because we build the libarrow C++ library in our CI and download it in the CRAN build--CRAN only compiles the R bindings, which do not require cmake.

Solving the cmake issue for R < 4.3 is only for our own purposes of building C++ libraries that are compatible with older versions of R on Windows. We support more versions of R than CRAN actively checks on, though it appears that in CI, we only check on the Windows current release version, and when we test old R versions, we do it on linux.

IIUC we're building libarrow with the rtools4.0 toolchain for maximum compatibility. If we want to continue doing that, we could try installing newer cmake somewhere before https://github.com/apache/arrow/blob/main/.github/workflows/r.yml#L293 and making sure it's on the PATH. Or, to bump up, we would change the rtools version in that job from 40 to 43 to use the rtools43 toolchain with newer cmake.

@pitrou
Copy link
Member

pitrou commented Dec 18, 2024

We support more versions of R than CRAN actively checks on

Do we still want to do that? How is our R support policy decided?

@nealrichardson
Copy link
Member

We support more versions of R than CRAN actively checks on

Do we still want to do that? How is our R support policy decided?

We generally follow the tidyverse version support policy, which is the most recent 5 versions. Many enterprise users of R don't upgrade versions eagerly so they can be stuck on older versions longer than CRAN's testing window.

If there is an easy way to get new enough cmake into the R windows-cpp job so that we can keep building with the R 4.0 toolchain, that would be ideal. I don't see why we couldn't do that.

@nealrichardson
Copy link
Member

One more reference to 3.16 to update: https://github.com/apache/arrow/blob/main/r/DESCRIPTION#L31

@raulcd
Copy link
Member Author

raulcd commented Dec 19, 2024

After a bunch of trial and error, I seem to be stuck with the following issue if I override the CMake version to be used to either a manually 3.25.0 installed or using the latest available (3.31) on msys2:

-- Found hdfs.h at: D:/a/arrow/arrow/cpp/thirdparty/hadoop/include/hdfs.h
CMake Error at C:/rtools40/ucrt64/lib/cmake/AWSSDK/AWSSDKConfig.cmake:119 (while):
  while() given incorrect arguments:

    "NOT" "TEMP_NAME" "STREQUAL"

  Unknown arguments specified
Call Stack (most recent call first):
  cmake_modules/FindAWSSDKAlt.cmake:36 (find_package)
  cmake_modules/ThirdpartyToolchain.cmake:294 (find_package)
  cmake_modules/ThirdpartyToolchain.cmake:5389 (resolve_dependency)
  CMakeLists.txt:546 (include)

I am trying some things, like using an updated aws-sdk-cpp version, but I am unsure where to go from there

@nealrichardson
Copy link
Member

The aws-sdk-cpp it's using is coming from https://github.com/r-windows/rtools-packages/tree/master/mingw-w64-aws-sdk-cpp

If you need to update it, I think you'll have to drop it from the dependencies that get downloaded in our PKGBUILD and make it dependency source=BUNDLED. Doesn't look like we're able to make any updates happen to that repo.

cc @assignUser, in case you wanted to start pushing on the MXE direction again as an alternative 🙃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants