Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CI jobs #10

Merged
merged 80 commits into from
Nov 17, 2023
Merged

Conversation

brandon-b-miller
Copy link
Contributor

This PR adds a CI workflow that runs on each PR.

@pentschev
Copy link
Member

I think this has to be under .github/workflows/pr.yaml. Additionally the way CI is currently setup probably means you need other steps, for example a .github/workflows/build.yaml. Is the intent to make pynvjitlink a full conda-forge/wheels package?

@brandon-b-miller
Copy link
Contributor Author

I think this has to be under .github/workflows/pr.yaml. Additionally the way CI is currently setup probably means you need other steps, for example a .github/workflows/build.yaml. Is the intent to make pynvjitlink a full conda-forge/wheels package?

Yes, we should ultimately have both wheels and conda packages.

@pentschev
Copy link
Member

Ok, then we probably need to do the entire setup here. That means we will need to copy https://github.com/rapidsai/cudf/blob/branch-23.12/.github/copy-pr-bot.yaml and https://github.com/rapidsai/cudf/blob/branch-23.12/.github/ops-bot.yaml into this PR. We will then need to add conda recipes and wheels build scripts. I can probably help get started here but we will also need help from @bdice and @vyasr who are the experts in packaging.

For the conda recipe, the closest recipe to pynvjitlink I know of is perhaps UCX-Py, so what I would do is first copy its conda files and remove UCX dependencies, probably replace UCX with Numba. We will then need to add build and tests CI scripts, which I would do the same, copy from UCX-Py and replace relevant ucx-py with pynvjitlink. Wheels build and tests scripts are also in the same ci/ subdirectory and probably can get us a starting point, there are no recipes like in conda but we often require static linking, where @vyasr is definitely someone who we will need help from.

If you want, please ping me on Slack and I can try helping get moving here a bit more, but most definitely we will need more help from Bradley and Vyas later anyway.

@brandon-b-miller
Copy link
Contributor Author

@pentschev we've met once or twice and made some progress planning this out - will loop you in on slack.

@brandon-b-miller brandon-b-miller changed the title [TEST] Try adding a CI check Add CI jobs Nov 1, 2023
.github/workflows/pr.yaml Outdated Show resolved Hide resolved
.github/workflows/pr.yaml Outdated Show resolved Hide resolved

rapids-logger "Installing wheel"

pip install pynvjitlink-0.1.0-cp310-cp310-manylinux_2_35_x86_64.whl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to use an older CI image than what is pinned at rapidsai/ci-conda:latest because we want to use the oldest OS to ensure that wheels are compatible with an older manylinux tag. This should be manylinux_2_17 if at all possible. Try using a CentOS 7 image and see if that helps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that we're going to need a Python version matrix. This wheel only support Python 3.10. We need to ship pynvjitlink on Python 3.9, 3.10 for RAPIDS, and also 3.11 and 3.12 if we want to make external consumers from Numba happy. We don't have Python 3.11 / 3.12 Docker images yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmarkall Can you comment on the expected Python version support for this package?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to follow SPEC 0, so at the present time 3.10-3.12 support would be my aim. If 3.9 support is also needed for RAPIDS then that seems OK. If we can't initially provide wheels for 3.11 and 3.12 then I think that's also OK.

My reasoning is that for non-RAPIDS use cases, it will always be possible for users to build pynvjitlink themselves (unlike with cubinlinker), and we're covering the RAPIDS deployment scenarios in supporting 3.9 and 3.10 from the start. If there's no short-term route to subsequently adding 3.11 and 3.12, I'd like to look into how to get to supporting them eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there aren't ci-conda centos images spanning the range of python versions we need. Should we be installing the right version of python ourselves within the container or is there something else I should be doing instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add newer Python CI images, or just use Python versions from conda. Perhaps it will suffice to have a single build job that iterates over all the Python versions, since the builds should be fast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd lean towards having a separate job for each build, just so that if something goes wrong with just one of them one doesn't have to parse through the log to figure out which one.

Installing and using the correct python version via conda seems like it's a fairly easy, so I could investigate down that route. What do you mean by "add newer python CI images"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "add newer python CI images"?

Today, our CI images only cover Python 3.9 and 3.10. https://github.com/rapidsai/ci-imgs/blob/bf7d540855e7e94695a54e40080c3acb8681764a/matrix.yaml#L8-L10

Copy link
Contributor

@bdice bdice Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it will suffice to have a single build job that iterates over all the Python versions, since the builds should be fast.

I had in mind using a conda variant build like in ucxx, where the conda_build_config.yaml has a list of Python versions and conda build knows to build all of them. It can be quite convenient, especially since pynvjitlink may have a wider support matrix of Python versions than RAPIDS as a whole (and is already likely to be using a newer CUDA than we might be using in RAPIDS).

https://github.com/rapidsai/ucxx/blob/4464ca942d5ac71aabe4492e2a6d59da453f06dc/conda/recipes/ucxx/conda_build_config.yaml#L19-L21

ci/build_wheel.sh Outdated Show resolved Hide resolved
ci/build_wheel.sh Outdated Show resolved Hide resolved
ci/build_wheel.sh Outdated Show resolved Hide resolved
ci/build_wheel.sh Outdated Show resolved Hide resolved
ci/build_wheel.sh Outdated Show resolved Hide resolved
ci/build_wheel.sh Outdated Show resolved Hide resolved
ci/check_style.sh Show resolved Hide resolved
.github/workflows/pr.yaml Outdated Show resolved Hide resolved
@brandon-b-miller brandon-b-miller requested a review from a team as a code owner November 2, 2023 14:45
@@ -98,13 +104,15 @@ def test_add_cubin_with_fatbin_error(device_functions_fatbin):
nvjitlinker.add_cubin(device_functions_fatbin, name)


@pytest.mark.skip
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: not sure why we need to skip this in CI but it works for me locally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be xfailed in CI instead of skipped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still want to skip things due to #3

.github/workflows/pr.yaml Outdated Show resolved Hide resolved
python-version: ["3.9", "3.10"]
linux-version: ["centos7", "rockylinux8"]
container:
image: "rapidsai/citestwheel:cuda12.0.1-ubuntu20.04-py${{ matrix.python-version }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is linux-version: ["centos7", "rockylinux8"] being used? I would've expected it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been updated. To be honest I'm not 100% sure what should be here. I noticed that in cuDF we only test wheels on ubuntu, albeit testing on two versions, and only testing arm with ubuntu 20.04. This seems to line up with what's available for docker under rapidsai/citestwheel, so I switched it to mirror that here, let me know if that seems to line up with what we want.

strategy:
matrix:
python-version: ["3.9", "3.10"]
linux-version: ["centos7", "rockylinux8"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above about linux-version being unused.

CMakeLists.txt Outdated Show resolved Hide resolved
ci/build_wheel.sh Outdated Show resolved Hide resolved
ci/test_wheel.sh Outdated Show resolved Hide resolved
ci/test_wheel.sh Outdated Show resolved Hide resolved
ci/test_wheel.sh Outdated Show resolved Hide resolved
pynvjitlink/tests/test_pynvjitlink.py Outdated Show resolved Hide resolved
@@ -98,13 +104,15 @@ def test_add_cubin_with_fatbin_error(device_functions_fatbin):
nvjitlinker.add_cubin(device_functions_fatbin, name)


@pytest.mark.skip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be xfailed in CI instead of skipped?

@quasiben
Copy link
Member

@gmarkall and I met to fix up tests. I'll follow up in a separate PR to fix those tests. For now it seems like PR is ready for review then merge

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I think we can merge this and test the artifacts, if you agree?

conda list

rapids-logger "Build wheel"
export SCCACHE_S3_NO_CREDENTIALS=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the follow-up task here is to try using the aws-actions/configure-aws-credentials@v3 step that I linked above. It doesn't matter for this PR, though.

@bdice
Copy link
Contributor

bdice commented Nov 17, 2023

This adds CI to PRs -- but we will need a build.yaml to make this publish the wheels to a nightly index for testing (maybe triggered on tags?). @brandon-b-miller and @quasiben, are we ready for that? I think so?

Copy link
Contributor

@gmarkall gmarkall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to merge - there are some follow-ups that I think we need like test amendments (and figuring out why the library is linked against libcuda.so), but it would be good to get this in then address those in follow-up PRs.

@brandon-b-miller
Copy link
Contributor Author

This adds CI to PRs -- but we will need a build.yaml to make this publish the wheels to a nightly index for testing (maybe triggered on tags?). @brandon-b-miller and @quasiben, are we ready for that? I think so?

This seems like a logical next step to me

@brandon-b-miller brandon-b-miller merged commit c540358 into rapidsai:main Nov 17, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants