Run NVTabular Single GPU tests for CUDA 11 and 12 on RAPIDS Runner #1829

Draft
wants to merge 83 commits into
base: main
3064bc4
Test Running NVTabular GPU tests with rapids runner
oliverholworthy Jun 2, 2023
1c93bb4
Update label for runner
oliverholworthy Jun 2, 2023
826168b
Remove pull_request trigger
oliverholworthy Jun 2, 2023
9df8ed6
Add ops-bot.yaml
oliverholworthy Jun 2, 2023
d67a362
Merge branch 'main' into test-rapids-base-image
oliverholworthy Jun 2, 2023
67030c0
Setup python
oliverholworthy Jun 2, 2023
ac8d5bd
remove sudo
oliverholworthy Jun 2, 2023
d680c08
add build-essential
oliverholworthy Jun 2, 2023
31d6afa
Update branch-name action version
oliverholworthy Jun 2, 2023
a9ec8d2
remove trailing slash
oliverholworthy Jun 2, 2023
475b96b
Add git to installed packages
oliverholworthy Jun 2, 2023
fef5b4e
Use devel image for nvvm
oliverholworthy Jun 2, 2023
8fb29a3
Combine install to one line
oliverholworthy Jun 2, 2023
8cf1bce
Add pip cache
oliverholworthy Jun 2, 2023
018b34a
Update deps for gpu tox env
oliverholworthy Jun 2, 2023
31f8fb4
Move package install earlier
oliverholworthy Jun 2, 2023
fd58189
disable cache
oliverholworthy Jun 2, 2023
dab79d2
Pass keyword argument for axis in dataframe any method
oliverholworthy Jun 5, 2023
12caac9
Use Distributed helper for client fixture
oliverholworthy Jun 5, 2023
b54daa4
Add nvidia-cudnn-cu11 to gpu test env
oliverholworthy Jun 5, 2023
930e035
Add posargs to gpu test env
oliverholworthy Jun 5, 2023
0e775a7
Add ops to NVT import
oliverholworthy Jun 5, 2023
89e980f
Use tmpdir for Categorify out_path in test_tf4rec
oliverholworthy Jun 5, 2023
19a59ea
Install libcudnn8 for Tensorflow support on GPU
oliverholworthy Jun 5, 2023
469fb7d
Get visible devices from env var if set
oliverholworthy Jun 5, 2023
e2fbc8e
Remove n_workers from Distributed in conftest
oliverholworthy Jun 5, 2023
5d4c6e7
install libcudnn8 for cuda 11.8
oliverholworthy Jun 5, 2023
f2e4ebc
Merge branch 'main' into test-rapids-base-image
oliverholworthy Jun 21, 2023
95a615a
Run tests outside of tox
oliverholworthy Jun 21, 2023
71dfc22
Correct install name
oliverholworthy Jun 21, 2023
78be2b6
Add call to pytest
oliverholworthy Jun 21, 2023
bc89308
Add marker for loader
oliverholworthy Jun 21, 2023
b3aebb7
Run loader tests separately
oliverholworthy Jun 21, 2023
7b45ebc
Remove newline from conftest.py
oliverholworthy Jun 21, 2023
73e0581
Add tensorflow marker
oliverholworthy Jun 21, 2023
5793541
Run test_tf_dataloader separately
oliverholworthy Jun 21, 2023
93a9932
install current project
oliverholworthy Jun 21, 2023
fcd1a43
run torch tests alongside the rest
oliverholworthy Jun 21, 2023
615ec7b
Reformat conftest.py
oliverholworthy Jun 21, 2023
a6e0d49
Add marker for ops
oliverholworthy Jun 21, 2023
bf40759
Set num thread env vars
oliverholworthy Jun 21, 2023
e00790b
Run tests in tox
oliverholworthy Jun 21, 2023
93fcbb1
Call stop on dataloader after each test
oliverholworthy Jun 22, 2023
15b1b69
Don't run torch tests in gpu tests
oliverholworthy Jun 22, 2023
b23d9ab
Stop dataloader in test_dataloader_schema test
oliverholworthy Jun 22, 2023
d32575b
Remove torch constraint from tests
oliverholworthy Jun 22, 2023
2a15374
Remove unnecessary dataloader stop commands from test_tf_dataloader.py
oliverholworthy Jun 22, 2023
92fd262
Remove commented commands from tox config
oliverholworthy Jun 22, 2023
4e8d7e2
Revert changes to passenv in test-gpu
oliverholworthy Jun 22, 2023
f6864fb
Move dependencies to deps section in test-gpu environment
oliverholworthy Jun 23, 2023
40554ec
Rename ENV var MERLIN_BRANCH to MERLIN_REF
oliverholworthy Jun 23, 2023
577fce1
Rename MERLIN_REF to MERLIN_BRANCH
oliverholworthy Jun 23, 2023
231d589
Add nvidia pypi to extra index url env var
oliverholworthy Jun 23, 2023
f5ecdc6
Remove git version from merlin-models test.txt
oliverholworthy Jun 23, 2023
c1de4d8
Add separate env for CUDA 11 and CUDA 12
oliverholworthy Jun 23, 2023
40a6062
Set protobuf implementation env var
oliverholworthy Jun 23, 2023
781b635
remove cudnn and protobuf package from install
oliverholworthy Jun 23, 2023
d7a304e
Update tox env name in gpu-tests.yml
oliverholworthy Jun 23, 2023
9100b5b
Replace np.bool with bool
oliverholworthy Jun 23, 2023
08619d8
Replace np.long with int in data_gen
oliverholworthy Jun 23, 2023
5602f68
Remove invalid comment from deps
oliverholworthy Jun 23, 2023
6ca1d5a
Run GPU tests with CUDA 12
oliverholworthy Jun 23, 2023
15d4ee2
Install libcudnn8 package for tensorflow GPU support
oliverholworthy Jun 23, 2023
98c3a3e
Remove unused numpy import from fill.py
oliverholworthy Jun 28, 2023
cede7d1
Use Python 3.9 for cu12 gpu test environment
oliverholworthy Jun 28, 2023
b92777e
Run compute before merge in test_embedding_cat_export_import
oliverholworthy Jun 29, 2023
a2d285e
Add tensorflow pytest mark to example notebook 1 and 2
oliverholworthy Jun 29, 2023
2f9db24
Debug: run only workflow test
oliverholworthy Jun 29, 2023
91f4de5
Debug: print dataframes in test_workflow
oliverholworthy Jun 29, 2023
afc0ce9
debug: print train_df
oliverholworthy Jun 29, 2023
4f4e746
Remove use of merge (results in non-deterministic row ordering)
oliverholworthy Jun 29, 2023
8220231
remove train_df
oliverholworthy Jun 29, 2023
4604f63
Restore posargs default in gpu tox envs
oliverholworthy Jun 29, 2023
ccfeda6
Restore multi-gpu test job and tox environment
oliverholworthy Jun 29, 2023
cc61029
Remove sitepackages=true from test-gpu envs
oliverholworthy Jun 29, 2023
917827b
Remove unnecessary env vars from gpu-tests tox envs
oliverholworthy Jun 29, 2023
d264997
remove blank line from gpu-tests.yml
oliverholworthy Jun 29, 2023
53522c2
Merge branch 'main' into test-rapids-base-image
oliverholworthy Jul 6, 2023
9c9f32f
Add job for running NVTabular tests with conda
oliverholworthy Jul 27, 2023
32e39c3
Use conda-incubator/setup-miniconda instead of setup-micromamba
oliverholworthy Jul 27, 2023
7d975db
Update ref for branch-name-pull-request
oliverholworthy Jul 27, 2023
fdb65b0
Update cuda dependencies to 12
oliverholworthy Jul 27, 2023
8bbd1ad
Add tox to conda test env
oliverholworthy Jul 27, 2023
101 changes: 96 additions & 5 deletions .github/workflows/gpu-tests.yml
@@ -3,18 +3,19 @@ name: GPU Tests
 on:
   workflow_dispatch:
   push:
-    branches: [main]
+    branches:
+      - main
+      - pull-request/*
     tags:
       - "v[0-9]+.[0-9]+.[0-9]+"
-  pull_request:
-    branches: [main]
-    types: [opened, synchronize, reopened, closed]

 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true

 jobs:
+  # Multi-GPU tests
+
   gpu-tests:
     runs-on: 2GPU

@@ -31,4 +32,94 @@ jobs:
             raw=$(git branch -r --contains ${{ github.ref_name }})
             branch=${raw/origin\/}
           fi
-          cd ${{ github.workspace }}; tox -e test-gpu -- $branch
+          cd ${{ github.workspace }}; MERLIN_BRANCH=$branch tox -e test-gpu
+
+  # Single GPU tests
+
+  gpu-tests-conda-cu12:
+    runs-on: linux-amd64-gpu-p100-latest-1
+    container:
+      image: nvidia/cuda:12.1.1-devel-ubuntu22.04
+      env:
+        NVIDIA_VISIBLE_DEVICES: ${{ env.NVIDIA_VISIBLE_DEVICES }}
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+      - name: Install Ubuntu packages
+        run: |
+          apt-get update -y
+          apt-get install -y git lsb-release
+      - uses: conda-incubator/setup-miniconda@v2
+        with:
+          miniforge-variant: Mambaforge
+          use-mamba: true
+          activate-environment: cu12-env
+          environment-file: conda/environments/test-cu12.yaml
+          python-version: "3.10"
+      - name: Get Branch name
+        id: get-branch-name
+        uses: NVIDIA-Merlin/.github/actions/branch-name@9f82e25a18e2b4a3f4350e9f287c2c31e906d89e
+      - name: Run tests
+        run: |
+          merlin_branch="${{ steps.get-branch-name.outputs.branch }}"
+          MERLIN_BRANCH=$merlin_branch tox -e test-gpu
+
+  gpu-tests-cu11:
+    runs-on: linux-amd64-gpu-p100-latest-1
+    container:
+      image: nvidia/cuda:11.8.0-devel-ubuntu22.04
+      env:
+        NVIDIA_VISIBLE_DEVICES: ${{ env.NVIDIA_VISIBLE_DEVICES }}
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+      - name: Install Ubuntu packages
+        run: |
+          apt-get update -y
+          # libcudnn8 installed for tensorflow GPU support
+          apt-get install -y git lsb-release 'libcudnn8=*cuda11.8'
+      - name: Set up Python 3.8
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.8
+      - name: Install and upgrade python packages
+        run: |
+          python -m pip install --upgrade pip tox
+      - name: Get Branch name
+        id: get-branch-name
+        uses: NVIDIA-Merlin/.github/actions/branch-name@9f82e25a18e2b4a3f4350e9f287c2c31e906d89e
+      - name: Run tests
+        run: |
+          merlin_branch="${{ steps.get-branch-name.outputs.branch }}"
+          RAPIDS_VERSION=23.04 MERLIN_BRANCH=$merlin_branch tox -e test-gpu-cu11
+
+  gpu-tests-cu12:
+    runs-on: linux-amd64-gpu-p100-latest-1
+    container:
+      image: nvidia/cuda:12.1.1-devel-ubuntu22.04
+      env:
+        NVIDIA_VISIBLE_DEVICES: ${{ env.NVIDIA_VISIBLE_DEVICES }}
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+      - name: Install Ubuntu packages
+        run: |
+          apt-get update -y
+          apt-get install -y git lsb-release
+      - name: Set up Python 3.9
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.9
+      - name: Install and upgrade python packages
+        run: |
+          python -m pip install --upgrade pip tox
+      - name: Get Branch name
+        id: get-branch-name
+        uses: NVIDIA-Merlin/.github/actions/branch-name@9f82e25a18e2b4a3f4350e9f287c2c31e906d89e
+      - name: Run tests
+        run: |
+          merlin_branch="${{ steps.get-branch-name.outputs.branch }}"
+          RAPIDS_VERSION=23.06 MERLIN_BRANCH=$merlin_branch tox -e test-gpu-cu12
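The two pip-based single-GPU jobs above differ only in container image, Python version, RAPIDS version, and tox environment. A minimal sketch that tabulates those values as they appear in the workflow and rebuilds the command line of each "Run tests" step. The `GpuJob` class, the `JOBS` table, and the `tox_command` helper are hypothetical names used for illustration and are not part of this PR:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuJob:
    """Settings copied from the gpu-tests.yml jobs shown in the diff."""

    image: str
    python: str
    rapids_version: str
    tox_env: str


# Values are taken from the workflow; the table itself is only illustrative.
JOBS = {
    "gpu-tests-cu11": GpuJob(
        "nvidia/cuda:11.8.0-devel-ubuntu22.04", "3.8", "23.04", "test-gpu-cu11"
    ),
    "gpu-tests-cu12": GpuJob(
        "nvidia/cuda:12.1.1-devel-ubuntu22.04", "3.9", "23.06", "test-gpu-cu12"
    ),
}


def tox_command(job_name: str, merlin_branch: str) -> str:
    """Hypothetical helper: rebuild the shell line used in the "Run tests" step."""
    job = JOBS[job_name]
    return (
        f"RAPIDS_VERSION={job.rapids_version} "
        f"MERLIN_BRANCH={merlin_branch} tox -e {job.tox_env}"
    )


if __name__ == "__main__":
    for name in JOBS:
        print(name, "->", tox_command(name, "main"))
```

The conda-based job is left out of the table because it sets no `RAPIDS_VERSION` and runs the existing `test-gpu` environment; its cudf and dask-cudf versions come from `conda/environments/test-cu12.yaml` instead.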
11 changes: 11 additions & 0 deletions conda/environments/test-cu12.yaml
@@ -0,0 +1,11 @@
+name: cu12-env
+channels:
+  - conda-forge
+  - rapidsai-nightly
+  - nvidia
+dependencies:
+  - cuda-version=12
+  - cuda-nvcc=12
+  - cudf=23.08
+  - dask-cudf=23.08
+  - tox=4
2 changes: 1 addition & 1 deletion requirements/test.txt
@@ -19,7 +19,7 @@ s3fs>=2021.4
 aiobotocore>=1.3.3

 # required for synthetic data `merlin.datasets` and notebook tests using merlin models
-merlin-models[tensorflow]@git+https://github.com/NVIDIA-Merlin/models.git
+merlin-models[tensorflow]

 # needed to run notebook tests
 nest-asyncio
1 change: 1 addition & 0 deletions tests/unit/examples/test_01-Getting-started.py
@@ -23,6 +23,7 @@
 nest_asyncio.apply()


+@pytest.mark.tensorflow
 def test_example_01_getting_started():
     with testbook(
         REPO_ROOT / "examples" / "01-Getting-started.ipynb",
1 change: 1 addition & 0 deletions tests/unit/examples/test_02-Advanced-NVTabular-workflow.py
@@ -27,6 +27,7 @@
 nest_asyncio.apply()


+@pytest.mark.tensorflow
 def test_example_02_advanced_workflow():
     with testbook(
         REPO_ROOT / "examples" / "02-Advanced-NVTabular-workflow.ipynb",
51 changes: 44 additions & 7 deletions tox.ini
@@ -39,14 +39,51 @@ sitepackages=true
 ; to install requirements.txt yet. As we get better at python environment isolation, we will
 ; need to add some back.
 deps =
-    pytest
-    pytest-cov
+    -rrequirements/test.txt
+    git+https://github.com/NVIDIA-Merlin/core.git@{env:MERLIN_BRANCH}
+    git+https://github.com/NVIDIA-Merlin/dataloader.git@{env:MERLIN_BRANCH}
+    git+https://github.com/NVIDIA-Merlin/models.git@{env:MERLIN_BRANCH}
 commands =
-    python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
-    python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/dataloader.git
-    python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/models.git
-    python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git@{posargs:main}
-    python -m pytest --cov-report term --cov merlin -rxs tests/unit
+    python -m pytest --cov-report term --cov merlin -rxs {posargs:tests/unit}
+
+[testenv:test-gpu-cu11]
+; Runs in: GitHub Actions
+; Runs GPU-based tests.
+setenv =
+    TF_GPU_ALLOCATOR=cuda_malloc_async
+    PIP_EXTRA_INDEX_URL=https://pypi.nvidia.com
+    PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
+passenv =
+    CUDA_VISIBLE_DEVICES
+deps =
+    -rrequirements/test.txt
+    git+https://github.com/NVIDIA-Merlin/core.git@{env:MERLIN_BRANCH}
+    git+https://github.com/NVIDIA-Merlin/dataloader.git@{env:MERLIN_BRANCH}
+    git+https://github.com/NVIDIA-Merlin/models.git@{env:MERLIN_BRANCH}
+    nvidia-cudnn-cu11==8.6.0.163
+    cudf-cu11=={env:RAPIDS_VERSION}
+    dask-cudf-cu11=={env:RAPIDS_VERSION}
+commands =
+    python -m pytest --cov-report term --cov merlin -rxs -s {posargs:tests/unit}
+
+[testenv:test-gpu-cu12]
+; Runs in: GitHub Actions
+; Runs GPU-based tests.
+setenv =
+    PIP_EXTRA_INDEX_URL=https://pypi.nvidia.com
+    PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
+passenv =
+    CUDA_VISIBLE_DEVICES
+deps =
+    -rrequirements/test.txt
+    git+https://github.com/NVIDIA-Merlin/core.git@{env:MERLIN_BRANCH}
+    git+https://github.com/NVIDIA-Merlin/dataloader.git@{env:MERLIN_BRANCH}
+    git+https://github.com/NVIDIA-Merlin/models.git@{env:MERLIN_BRANCH}
+    cudf-cu12=={env:RAPIDS_VERSION}
+    dask-cudf-cu12=={env:RAPIDS_VERSION}
+commands =
+    ; Latest TensorFlow PyPI package does not currently support CUDA 12
+    python -m pytest --cov-report term --cov merlin -rxs -s -m 'not tensorflow' {posargs:tests/unit}

 [testenv:test-merlin]
 ; Runs in: Internal Jenkins
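The tox.ini changes move the branch selection from positional arguments (`{posargs:main}`) to an environment variable (`{env:MERLIN_BRANCH}`), which frees `{posargs:tests/unit}` to select the test path. A rough sketch of how those two substitution forms expand. This is a simplified model for illustration, not tox's implementation; the `expand` function is a hypothetical name, and falling back to an empty string for an unset variable is an assumption that may not match every tox version:

```python
import re

# Matches {env:NAME}, {env:NAME:DEFAULT} and {posargs:DEFAULT} only.
_PATTERN = re.compile(r"\{(env|posargs):([^{}]*)\}")


def expand(line, environ, posargs):
    """Simplified model of the {env:...} and {posargs:...} substitutions."""

    def replace(match):
        kind, arg = match.group(1), match.group(2)
        if kind == "env":
            # {env:NAME:DEFAULT}; assumption: empty string when unset and no default
            name, _, default = arg.partition(":")
            return environ.get(name, default)
        # posargs: arguments given after "--" win, otherwise the default is used
        return " ".join(posargs) if posargs else arg

    return _PATTERN.sub(replace, line)


if __name__ == "__main__":
    dep = "git+https://github.com/NVIDIA-Merlin/core.git@{env:MERLIN_BRANCH}"
    cmd = "python -m pytest -rxs {posargs:tests/unit}"
    print(expand(dep, {"MERLIN_BRANCH": "main"}, []))
    print(expand(cmd, {}, []))
    print(expand(cmd, {}, ["tests/unit/ops"]))
```

Under this model, `MERLIN_BRANCH=main tox -e test-gpu-cu12 -- tests/unit/ops` would install the Merlin dependencies from `main` and restrict pytest to the given path, while omitting the trailing arguments falls back to `tests/unit`.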