Refactor setup scripts #10670

Open
wants to merge 1 commit into
base: main
from

czentgr
Collaborator

@czentgr czentgr commented Aug 5, 2024

Use a common file to download and install external dependencies.
Extract versions for each library.

This also addresses #10860
xsimd is removed from the brew dependencies and is instead installed using the install function. The issue is caused by the brew version of xsimd being newer than what Velox currently supports.
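
A minimal sketch of the idea described above, assuming a shared file that holds pinned versions and a helper that turns a version into a download URL. The helper name `github_tarball_url` and the placeholder version value are hypothetical and not taken from this PR; only the URL layout mirrors the `wget_and_untar` call visible later in the review.

```shell
# Hypothetical sketch, not the PR's actual files.
# Pinned version: the default is a placeholder, and the environment can override it.
FB_OS_VERSION="${FB_OS_VERSION:-v0000.00.00.00}"

# Print the release tarball URL for a GitHub repository and tag.
github_tarball_url() {
  local repo="$1"
  local tag="$2"
  echo "https://github.com/${repo}/archive/refs/tags/${tag}.tar.gz"
}

# Example: the URL a shared install function would download for mvfst.
github_tarball_url "facebook/mvfst" "${FB_OS_VERSION}"
```

Keeping the version in one overridable variable means every platform script resolves the same tag unless a developer deliberately exports a different one.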

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 5, 2024

netlify bot commented Aug 5, 2024

Deploy Preview for meta-velox canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | 6a4dc12 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/66d78fe53cc4c40008d4d3e1 |

@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch from 764aae4 to 35b2e65 Compare August 6, 2024 17:30
Collaborator

@majetideepak majetideepak left a comment

@czentgr Looks great! Some comments.

README.md Outdated
$ make
```

-Note that `setup-adapters.sh` supports MacOS and Ubuntu 20.04 or later.
+Note that the `install_adapters` command is available for the supports MacOS and
Collaborator

typo: supported

@@ -16,6 +16,8 @@ ARG image=quay.io/centos/centos:stream9
FROM $image

COPY scripts/setup-helper-functions.sh /
COPY scripts/setup-common.sh /
COPY scripts/setup-linux.sh /
Collaborator

copy the versions file?

Collaborator Author

Yup. Adding it.

SCRIPTDIR=$(dirname "${BASH_SOURCE[0]}")
source $SCRIPTDIR/setup-common.sh

function install_hdfs {
Collaborator

Why is this not in common?

Collaborator

The macos version below does not have hadoop installed. Everything else is the same. Hadoop is required for testing. Let's add this hadoop install directly to ubuntu/centos install_hdfs and remove setup-linux.sh
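
For comparison, here is a rough sketch of the alternative raised in the question above: keeping a single `install_hdfs`-style function in the common file and branching on the OS. This is not what the comment proposes (it suggests adding Hadoop directly to the Ubuntu/CentOS functions). The function and step names are hypothetical.

```shell
# Hypothetical sketch: step names are illustrative, not the PR's real functions.
# Print the HDFS-related install steps for an OS name as reported by `uname -s`.
hdfs_install_steps() {
  local os="${1:-$(uname -s)}"
  echo "hdfs_client"   # shared by all platforms
  if [ "$os" = "Linux" ]; then
    echo "hadoop"      # required for testing; not installed on macOS
  fi
}
```

Calling `hdfs_install_steps Darwin` lists only the shared step, while `hdfs_install_steps Linux` adds the Hadoop step.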

@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch from 35b2e65 to 8c44bdc Compare August 10, 2024 04:35
@@ -64,7 +60,7 @@ function install_velox_deps_from_dnf {
dnf_install libevent-devel \
openssl-devel re2-devel libzstd-devel lz4-devel double-conversion-devel \
libdwarf-devel elfutils-libelf-devel curl-devel libicu-devel bison flex \
-libsodium-devel zlib-devel
+libsodium-devel zlib-devel gmock gtest
Collaborator Author

We need the gmock-devel here. This pulls in gtest-devel too.

Collaborator

Since #10422 we now properly detect system gtest, so having that in the images would be nice anyway.

Collaborator Author

Yeah. Actually, building the Arrow dependency failed for me from a fresh system until I had gtest-devel installed. Arrow has testing turned on...

On a side note: pip isn't used to install regex and cmake-format anymore? Was that removed?
On a developer machine this would be needed to run make format-check / make format-fix. In the CI pipeline, the check runs in its own container.
Do we want to add this?

Collaborator

See #10668 for the Python changes. The CI container for the check job is mostly for clang-format, I think, because different versions can produce very different results.

Collaborator Author

Right, the venv was added for MacOS. So far the python versions on Linux don't cause a problem but we likely will need to expand it. I was just wondering why we install clang-format/regex on MacOS but not the Linux platforms and whether or not we should have it commonly done.

Collaborator

@assignUser assignUser Aug 13, 2024

Good point actually, not sure? They probably should? @majetideepak

Collaborator

We have a separate script for linux scripts/setup-check.sh. This was likely moved out to simplify building the check container.
Developers mainly use MacOS and they need the check set up. Linux is used for deployment and the checks are not required.
We can include setup-check.sh inside the linux scripts and install as well.
We need to ensure Linux and MacOS use the same clang-format version.
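
One possible way to keep the formatter version identical across platforms is to build every pip requirement from a single pinned value. The sketch below assumes a hypothetical helper `pinned_requirement` and a hypothetical `CLANG_FORMAT_VERSION` variable. Neither appears in this PR.

```shell
# Hypothetical helper: print a pip requirement pinned to an exact version.
# Fails when no version is given, so an unpinned install cannot slip through.
pinned_requirement() {
  local name="$1"
  local version="$2"
  if [ -z "$version" ]; then
    echo "missing version for ${name}" >&2
    return 1
  fi
  echo "${name}==${version}"
}

# Usage, with the version defined once in a shared place:
#   pip install "$(pinned_requirement clang-format "$CLANG_FORMAT_VERSION")"
```

Because both the macOS and Linux scripts would call the same helper with the same variable, a version bump happens in exactly one place.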

Collaborator Author

@czentgr czentgr Aug 15, 2024

> Linux is used for deployment and the checks are not required.

I think development on Linux is done. I suppose the developers did the setup themselves. There is an assumption that pip would install the same version on all platforms but it could be fixed.

Let me undo the change though and deal with this later.

Collaborator

@assignUser assignUser Aug 17, 2024

> Developers mainly use MacOS

eh, maybe (written on linux^^) but even then we want to make it easy to contribute so this should be included in the devsetup for linux as well

@czentgr czentgr closed this Aug 13, 2024
@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch from 8c44bdc to 7474bae Compare August 13, 2024 18:43
@czentgr czentgr reopened this Aug 13, 2024
@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch 2 times, most recently from a0eed92 to e80ea28 Compare August 13, 2024 22:19
Collaborator

@majetideepak majetideepak left a comment

Changes look good. @kgpai Any thoughts on this?

@majetideepak majetideepak marked this pull request as ready for review August 15, 2024 14:52
@majetideepak majetideepak changed the title [WIP] Refactor setup scripts Refactor setup scripts Aug 15, 2024
@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch 2 times, most recently from 94e6ace to 8d6d937 Compare August 15, 2024 18:26
@assignUser
Collaborator

Could you add the macos setup script as a trigger to the macos workflow so it gets run with these changes as well?

Overall a long overdue refactor 👏 nice job!

run_and_time install_s3
run_and_time install_gcs
run_and_time install_abfs
run_and_time install_hdfs
}

function install_velox_deps {
Collaborator Author

@majetideepak Question: we are calling install_fmt but also install it through brew earlier. Should we pick one or the other? I see it was added a while back in one of your PRs.

Collaborator

@majetideepak majetideepak Aug 20, 2024

Let's remove the brew install. I think fmt version should align with the other FB library versions.

Collaborator Author

Thanks will do. And now this has come up with boost. Brew keeps everything up to date and can cause issues on update. Now seen with boost 1.86.0 while Linux uses boost 1.84.0. Maybe for this we should also switch to the common install function?

Collaborator

Yes, let's use a common install function for boost as well.

@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch 3 times, most recently from 6eb7dfa to 33686b5 Compare August 20, 2024 21:06
@assignUser
Collaborator

CI should go green if you rebase

@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch from 33686b5 to 100065f Compare August 22, 2024 20:10
source scripts/setup-macos.sh
brew install $MACOS_BUILD_DEPS $MACOS_VELOX_DEPS

echo "OS used" ${{ matrix.os }}
Collaborator Author

TODO remove.

@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch from 100065f to ebc5cf6 Compare August 22, 2024 23:06
@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch from ebc5cf6 to c79ab70 Compare August 23, 2024 13:20
Collaborator

@majetideepak majetideepak left a comment

@czentgr Final set of comments.
Glad to see a bunch of duplication go away.
Thanks for working on this!

cmake_install re2 -DRE2_BUILD_TESTING=OFF
}

function install_gflags {
Collaborator

Do we need install_gflags and install_glog now that they are installed via the system package manager?

Collaborator Author

This is still built in Centos9 and doesn't come from the system install.
I checked and Centos9 has gflags 2.2.2 (from epel) which is the same version as it is currently built. Glog is also available as 0.3.5 but that is much outdated compared to 0.6.0 which is installed.
We could install gflags by the system and build glog.
The glog system install depends on gflags.

So I suppose we can move gflags to the system install and keep glog.
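
The reasoning above (system gflags 2.2.2 matches the built version, system glog 0.3.5 is older than the 0.6.0 that gets built) amounts to a version comparison. A small sketch of that decision, assuming `sort -V` is available. The function names are hypothetical and nothing like this is part of the PR.

```shell
# True when version $1 is at least version $2 (relies on `sort -V`).
version_at_least() {
  local have="$1"
  local want="$2"
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Print where a dependency should come from, given the system package
# version ($1) and the version the project requires ($2).
dependency_source() {
  if version_at_least "$1" "$2"; then
    echo "system"
  else
    echo "build"
  fi
}

dependency_source 2.2.2 2.2.2   # gflags: prints "system"
dependency_source 0.3.5 0.6.0   # glog: prints "build"
```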


function install_aws_deps {
local AWS_REPO_NAME="aws/aws-sdk-cpp"
local
Collaborator

remove empty local?

}

function install_minio {
local MINIO_ARCH=$1
Collaborator

nit: Maybe add defaults for macos arm (more devs use this platform).

function install_mvfst {
wget_and_untar https://github.com/facebook/mvfst/archive/refs/tags/${FB_OS_VERSION}.tar.gz mvfst
cmake_install mvfst -DBUILD_TESTS=OFF
local MINIO_ARCH=$MACHINE
Collaborator

MACHINE is defined in the setup-common.sh.
This if-else block here and inside other files can go to install_minio there.
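
A sketch of what a single platform lookup inside a shared `install_minio` could look like, so the per-file if-else blocks are no longer needed. The function name `minio_platform` is hypothetical, and the `os-arch` strings are an assumption modeled on how MinIO names its download directories. The defaults come from `uname`, which covers the macOS arm case mentioned earlier.

```shell
# Hypothetical helper: map OS and machine names to an "os-arch" platform string.
# Defaults to the current host when no arguments are given.
minio_platform() {
  local os="${1:-$(uname -s)}"
  local machine="${2:-$(uname -m)}"
  local os_part arch_part
  case "$os" in
    Linux)  os_part="linux" ;;
    Darwin) os_part="darwin" ;;
    *) echo "unsupported OS: ${os}" >&2; return 1 ;;
  esac
  case "$machine" in
    x86_64|amd64)  arch_part="amd64" ;;
    aarch64|arm64) arch_part="arm64" ;;
    *) echo "unsupported machine: ${machine}" >&2; return 1 ;;
  esac
  echo "${os_part}-${arch_part}"
}
```

For example, `minio_platform Darwin arm64` yields `darwin-arm64` and `minio_platform Linux x86_64` yields `linux-amd64`.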

@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch from c79ab70 to d5fe2be Compare August 26, 2024 14:07
@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch from d5fe2be to 8569270 Compare August 27, 2024 16:20

DEPENDENCY_DIR=${DEPENDENCY_DIR:-$(pwd)}
-MACOS_VELOX_DEPS="bison boost double-conversion flex fmt gflags glog googletest icu4c libevent libsodium lz4 lzo openssl protobuf@21 simdjson snappy thrift xz xsimd zstd"
+MACOS_VELOX_DEPS="bison double-conversion flex gflags glog googletest icu4c libevent libsodium lz4 lzo openssl protobuf@21 simdjson snappy thrift xz zstd"
Collaborator Author

Remove double-conversion. It is installed using the install function.

Collaborator Author

@majetideepak I tried to resolve this dual install.
The problem with removing it here is that this is needed for boost. And the CI pipeline has a path where the actual dependencies are not installed. Only the brew dependencies are installed and the rest comes from bundling.
double-conversion is not bundled, so if the setup script does not run the installation of the deps (remember we only turned it on for macOS 13), it'll fail.

So a couple of options:

  1. Add it to be bundled
  2. Remove install from being called and use version from brew (this just like with other dependencies means it can be updated at any time and cause issues).
  3. Have the CI pipeline always run the setup script for macOS.

@assignUser What do you think?

Collaborator

We could run the install scripts and cache the dependencies, that would give the best control over versions and have the smallest impact on ci times.

Collaborator Author

How could we do that? It wouldn't be via container? And being on different versions of MacOS it isn't like we have a common build step that needs to be executed only once.

Collaborator Author

Or we can do that later and just get the scripts in for now and optimize later?

Collaborator

With the cache or stash action (though for this, cache actually makes more sense). But yeah, it's something we have to change in the workflow, not the script. This PR is big enough as is, so we can do it as a follow-up!

Collaborator Author

@czentgr czentgr Sep 3, 2024

Let me then build the dependencies always. There is a current problem with fmt due to it being upgraded in homebrew (because of a ccache upgrade). We want to make changes to use the predefined and installed versions instead of using uncontrolled build deps from homebrew (which will always be found first because of the linkage). So we don't want to just get the macOS deps from homebrew. Let's see what happens.
This affects this PR as well.

Edit: using the bundled fmt should work for now.

@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch from 8569270 to d0e9c1f Compare August 29, 2024 15:08
Use a common file to download and install external dependencies.
Extract versions for each library.
@czentgr czentgr force-pushed the cz_refactor_setup_scripts branch from d0e9c1f to 6a4dc12 Compare September 3, 2024 22:38
@@ -78,7 +76,7 @@ jobs:

- name: Configure Build
env:
-folly_SOURCE: BUNDLED #brew folly does not have int128
+fmt_SOURCE: BUNDLED #picked up brew fmt (from ccache) is too new
Collaborator Author

Folly doesn't need to be force bundled because it is not coming from brew anymore but is installed by the setup script.
It uses fmt as bundled because the fmt version is too new. This needs to be undone once the build script picks up the installed version via the setup script (and not the one present in homebrew from some dependency).

@assignUser
Collaborator

assignUser commented Sep 4, 2024

I am not sure if this should be an addition to the scripts or the docs, but it's easy to avoid a system install of the dependencies (see the fmt version issue): set CMAKE_INSTALL_PREFIX=$VELOX_DEPS_PATH before running the script, and CMAKE_{MODULE|PREFIX}_PATH=$VELOX_DEPS_PATH before the Velox build to make the folder discoverable (where VELOX_DEPS_PATH is the path you want to install the deps to).
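
A small sketch of the suggestion above, assuming the setup script and the build both honor these environment variables as described in the comment. The wrapper name `with_deps_prefix` is hypothetical. For simplicity it sets both variables for every command, whereas the comment sets the install prefix for the script and the search path for the build.

```shell
# Hypothetical wrapper: run a command with the dependency prefix exported,
# so installs go to, and lookups come from, the same private folder.
with_deps_prefix() {
  local prefix="$1"
  shift
  env CMAKE_INSTALL_PREFIX="$prefix" CMAKE_PREFIX_PATH="$prefix" "$@"
}

# Usage (VELOX_DEPS_PATH is the folder you want the deps installed to):
#   with_deps_prefix "$VELOX_DEPS_PATH" ./scripts/setup-macos.sh
#   with_deps_prefix "$VELOX_DEPS_PATH" make
```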


stale bot commented Dec 3, 2024

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale bot added the stale label Dec 3, 2024
@czentgr
Collaborator Author

czentgr commented Dec 3, 2024

Will come back to it soon.

@stale stale bot removed the stale label Dec 3, 2024
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
4 participants