Add CI jobs #10
Conversation
I think this has to be under
Yes, we should ultimately have both wheels and conda packages.
Ok, then we probably need to do the entire setup here. That means we will need to copy https://github.com/rapidsai/cudf/blob/branch-23.12/.github/copy-pr-bot.yaml and https://github.com/rapidsai/cudf/blob/branch-23.12/.github/ops-bot.yaml into this PR. We will then need to add conda recipes and wheel build scripts. I can probably help get started here, but we will also need help from @bdice and @vyasr, who are the experts in packaging. For the conda recipe, the closest recipe to
If you want, please ping me on Slack and I can try to help get things moving here a bit more, but we will most definitely need more help from Bradley and Vyas later anyway.
@pentschev we've met once or twice and made some progress planning this out - will loop you in on Slack.
ci/build_wheel.sh (Outdated)

    rapids-logger "Installing wheel"
    pip install pynvjitlink-0.1.0-cp310-cp310-manylinux_2_35_x86_64.whl
We may need to use an older CI image than what is pinned at rapidsai/ci-conda:latest, because we want to use the oldest OS to ensure that wheels are compatible with an older manylinux tag. This should be manylinux_2_17 if at all possible. Try using a CentOS 7 image and see if that helps.
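To illustrate why the tag matters: a minimal sketch (not project code, filenames are just examples) showing how the manylinux tag encodes the minimum glibc a wheel requires, which is why building on a newer OS like the one in the current image produces a wheel that older distros cannot install.

```python
def platform_tag(wheel_filename: str) -> str:
    """Extract the platform tag from a wheel filename."""
    # Wheel filenames look like: name-version-pytag-abitag-platformtag.whl
    return wheel_filename[: -len(".whl")].rsplit("-", 1)[-1]

def manylinux_glibc(tag: str):
    """Return the (major, minor) glibc floor of a manylinux tag, else None."""
    parts = tag.split("_")
    if parts[0] != "manylinux":
        return None
    return (int(parts[1]), int(parts[2]))

new_wheel = "pynvjitlink-0.1.0-cp310-cp310-manylinux_2_35_x86_64.whl"
old_wheel = "pynvjitlink-0.1.0-cp310-cp310-manylinux_2_17_x86_64.whl"
print(manylinux_glibc(platform_tag(new_wheel)))  # (2, 35) -- needs glibc >= 2.35
print(manylinux_glibc(platform_tag(old_wheel)))  # (2, 17) -- fine on CentOS 7-era glibc
```

Building inside a CentOS 7 container keeps the baseline glibc old enough that auditwheel can tag the result manylinux_2_17.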
I just noticed that we're going to need a Python version matrix. This wheel only supports Python 3.10. We need to ship pynvjitlink on Python 3.9 and 3.10 for RAPIDS, and also 3.11 and 3.12 if we want to make external consumers of Numba happy. We don't have Python 3.11 / 3.12 Docker images yet.
@gmarkall Can you comment on the expected Python version support for this package?
I was hoping to follow SPEC 0, so at the present time 3.10-3.12 support would be my aim. If 3.9 support is also needed for RAPIDS then that seems OK. If we can't initially provide wheels for 3.11 and 3.12 then I think that's also OK.
My reasoning is that for non-RAPIDS use cases, it will always be possible for users to build pynvjitlink themselves (unlike with cubinlinker), and we're covering the RAPIDS deployment scenarios in supporting 3.9 and 3.10 from the start. If there's no short-term route to subsequently adding 3.11 and 3.12, I'd like to look into how to get to supporting them eventually.
It looks like there aren't ci-conda CentOS images spanning the range of Python versions we need. Should we be installing the right version of Python ourselves within the container, or is there something else I should be doing instead?
You can add newer Python CI images, or just use Python versions from conda. Perhaps it will suffice to have a single build job that iterates over all the Python versions, since the builds should be fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd lean towards having a separate job for each build, just so that if something goes wrong with one of them, we don't have to parse through the log to figure out which one.
Installing and using the correct Python version via conda seems fairly easy, so I could investigate that route. What do you mean by "add newer Python CI images"?
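For concreteness, a hypothetical sketch of the per-version matrix being discussed; the job name, script path, and image tag are placeholders, not this repo's actual workflow:

```yaml
wheel-build:
  strategy:
    matrix:
      python-version: ["3.9", "3.10", "3.11", "3.12"]
  runs-on: ubuntu-latest
  container:
    image: rapidsai/ci-conda:latest
  steps:
    - uses: actions/checkout@v4
    - run: ci/build_wheel.sh ${{ matrix.python-version }}
```

With a matrix like this, each Python version gets its own job and its own log, which is the failure-isolation benefit mentioned above.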
> What do you mean by "add newer python CI images"?
Today, our CI images only cover Python 3.9 and 3.10. https://github.com/rapidsai/ci-imgs/blob/bf7d540855e7e94695a54e40080c3acb8681764a/matrix.yaml#L8-L10
> Perhaps it will suffice to have a single build job that iterates over all the Python versions, since the builds should be fast.
I had in mind using a conda variant build like in ucxx, where the conda_build_config.yaml has a list of Python versions and conda build knows to build all of them. It can be quite convenient, especially since pynvjitlink may have a wider support matrix of Python versions than RAPIDS as a whole (and is already likely to be using a newer CUDA than we might be using in RAPIDS).
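As a sketch of the variant-build approach (the exact file contents are an assumption, not taken from ucxx), the conda_build_config.yaml would just list the Python variants and conda-build would produce one package per entry:

```yaml
# Hypothetical conda_build_config.yaml
python:
  - "3.9"
  - "3.10"
  - "3.11"
  - "3.12"
```

A single `conda build recipe/` invocation then loops over all four variants without any workflow-level matrix.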
Co-authored-by: Bradley Dice <[email protected]>
@@ -98,13 +104,15 @@ def test_add_cubin_with_fatbin_error(device_functions_fatbin):
        nvjitlinker.add_cubin(device_functions_fatbin, name)

    + @pytest.mark.skip
TODO: not sure why we need to skip this in CI but it works for me locally.
Maybe this should be xfailed in CI instead of skipped?
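A sketch of the xfail-in-CI-only idea; the `CI` environment variable check is an assumption about how the CI environment is detected, not this repo's convention:

```python
import os

import pytest

# Assumed convention: CI runners export CI=true (GitHub Actions does this).
IN_CI = os.environ.get("CI") == "true"

@pytest.mark.xfail(condition=IN_CI, reason="known CI-only failure, see issue #3", strict=False)
def test_example():
    # Runs and passes locally; in CI a failure is reported as xfail,
    # and an unexpected pass is reported as xpass rather than an error.
    assert True
```

Unlike a skip, an xfail still executes the test, so we'd notice (via xpass) if the CI-only failure ever goes away.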
I think we still want to skip things due to #3
.github/workflows/pr.yaml (Outdated)

    python-version: ["3.9", "3.10"]
    linux-version: ["centos7", "rockylinux8"]
    container:
      image: "rapidsai/citestwheel:cuda12.0.1-ubuntu20.04-py${{ matrix.python-version }}"
Is linux-version: ["centos7", "rockylinux8"] being used? I would've expected it here.
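That is, something along these lines (the image tag shape is a guess for illustration, not a known published tag):

```yaml
container:
  image: "rapidsai/citestwheel:cuda12.0.1-${{ matrix.linux-version }}-py${{ matrix.python-version }}"
```

As written, the matrix axis expands into jobs but never affects the container, so both linux-version values run on the same Ubuntu image.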
This has been updated. To be honest, I'm not 100% sure what should be here. I noticed that in cuDF we only test wheels on Ubuntu (albeit on two versions), and only test ARM with Ubuntu 20.04. This seems to line up with what's available on Docker under rapidsai/citestwheel, so I switched it to mirror that here; let me know if that lines up with what we want.
.github/workflows/pr.yaml (Outdated)

    strategy:
      matrix:
        python-version: ["3.9", "3.10"]
        linux-version: ["centos7", "rockylinux8"]
Same question as above about linux-version being unused.
Co-authored-by: Bradley Dice <[email protected]>
This reverts commit 78b0b41.
@gmarkall and I met to fix up tests. I'll follow up in a separate PR to fix those tests. For now it seems like the PR is ready for review and then merge.
This looks good to me. I think we can merge this and test the artifacts, if you agree?
    conda list

    rapids-logger "Build wheel"
    export SCCACHE_S3_NO_CREDENTIALS=1
I think the follow-up task here is to try using the aws-actions/configure-aws-credentials@v3 step that I linked above. It doesn't matter for this PR, though.
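For reference, a hypothetical version of that step; the secret name and region are placeholders, not values from this repo:

```yaml
- name: Configure AWS credentials for sccache
  uses: aws-actions/configure-aws-credentials@v3
  with:
    role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
    aws-region: us-east-2
```

With real credentials configured this way, the `SCCACHE_S3_NO_CREDENTIALS=1` workaround (read-only anonymous cache access) could be dropped so builds also write to the cache.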
This adds CI to PRs -- but we will need a
I think this is good to merge - there are some follow-ups that I think we need, like test amendments (and figuring out why the library is linked against libcuda.so), but it would be good to get this in and then address those in follow-up PRs.
This seems like a logical next step to me
This PR adds a CI workflow that runs on each PR.