[REVIEW] Implement Feature Request from #1077 on Left Padding #1126
Update docstrings for issue #1077. This touches the tensorflow and torch dataloader modules and the list_slice op module. The motivation for this is to improve readability. This commit is towards resolving issue #1077 on implementing left padding for sparse sequential features.
@gabrielspmoreira I am not able to add you as a reviewer for this pull request, but I would appreciate your help with the drafts of this pull request, if you are able to.
Merge branch 'main' into 1077-implement to remain up-to-date with the current main branch.
Implementation of left padding for issue #1077. This is based on a suggestion by @gabrielspmoreira. I am not exactly sure if this change will completely work, and this is untested due to current failing tests on main on this part of the codebase. But the motivation of this commit is to start a commit for comments, suggestions, and revisions on this issue's implementation.
Update #1077 implementation with some useful feedback from running pre-commit and linters. The motivation is to better pass the CI checks and code consistency.
Implement #1077 update with docstring and type hinting. Note that black adds spaces in the method signature type hinting for the `padding` argument. We add a docstring for `_build_sparse_tensor()`, as this is being modified in this issue's implementation. The motivation for this is improved codebase readability.
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit 295d4e2c1059fe268a6e76560efac27ecaf6f887, no merge conflicts. Running as SYSTEM Setting status of 295d4e2c1059fe268a6e76560efac27ecaf6f887 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3473/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse 295d4e2c1059fe268a6e76560efac27ecaf6f887^{commit} # timeout=10 Checking out Revision 295d4e2c1059fe268a6e76560efac27ecaf6f887 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 295d4e2c1059fe268a6e76560efac27ecaf6f887 # timeout=10 Commit message: "Merge branch 'main' into 1077-implement" > git rev-list --no-walk 81833705f65b1dfc3afea9c0e5b559437b294083 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins4414286201330859595.sh Installing NVTabular Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.0.4) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+65.g295d4e2 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc: fatal error: Terminated signal terminated program cc1plus compilation terminated. error: command 'x86_64-linux-gnu-gcc' failed with exit status 1 Terminated Build was aborted Aborted by admin Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins5381081211165006542.sh
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit 5166d57e5a38915e0124ebe6cfc403a4dc41371d, no merge conflicts. Running as SYSTEM Setting status of 5166d57e5a38915e0124ebe6cfc403a4dc41371d to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3499/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse 5166d57e5a38915e0124ebe6cfc403a4dc41371d^{commit} # timeout=10 Checking out Revision 5166d57e5a38915e0124ebe6cfc403a4dc41371d (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 5166d57e5a38915e0124ebe6cfc403a4dc41371d # timeout=10 Commit message: "Merge branch 'main' into 1077-implement" > git rev-list --no-walk bbf74327e67177bdb82fea187ba7aae8193b40d3 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins3193667818210240279.sh Installing NVTabular Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.0.4) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+75.g5166d57 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+75.g5166d57 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+75.g5166d57 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+75.g5166d57 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 
creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.6.0+75.g5166d57 is already the active version in easy-install.pth |
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit e55336c82fb5960a30451d6636500383aa4f20dc, no merge conflicts. Running as SYSTEM Setting status of e55336c82fb5960a30451d6636500383aa4f20dc to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3510/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse e55336c82fb5960a30451d6636500383aa4f20dc^{commit} # timeout=10 Checking out Revision e55336c82fb5960a30451d6636500383aa4f20dc (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f e55336c82fb5960a30451d6636500383aa4f20dc # timeout=10 Commit message: "Merge branch 'main' into 1077-implement" > git rev-list --no-walk c76f67b8049d053658ab327c8969199735341105 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins4448045457917728860.sh Installing NVTabular Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.0.4) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.ge55336c -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.ge55336c -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.ge55336c -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.ge55336c -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 
creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.6.0+78.ge55336c is already the active version in easy-install.pth |
It looks like this is only implemented for PyTorch; we will also need to handle it in TensorFlow.
We will also need a test for each of PyTorch and TensorFlow, covering both the left and right padding cases.
nvtabular/loader/torch.py
Outdated
@@ -174,8 +174,38 @@ def _get_sparse_tensor(self, values, indices, num_rows, seq_limit):
             sparse_tensor = sparse_tensor.to_dense()
         return sparse_tensor

-    def _build_sparse_tensor(self, values, offsets, diff_offsets, num_rows, seq_limit):
+    def _build_sparse_tensor(
+        self, values, offsets, diff_offsets, num_rows, seq_limit, padding: str = ""
How is this padding value supposed to be passed by the user? It seems this parameter is only set on a non-public method and isn't set anywhere.
Thank you Ben for your review. Yes, let me see how to have this option be user-accessible.
I plan to expose these options as user-accessible arguments in the signatures of the `TorchAsyncItr` class in torch.py and the `KerasSequenceLoader` class in tensorflow.py, if there are no objections to this.
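One way to picture this plan is a public constructor argument that is validated once and then read by the private padding code. The sketch below is a rough illustration only: `SketchLoader`, `to_dense`, and `_pad_row` are hypothetical names, not NVTabular classes or methods, and the default of `"right"` is an assumption based on the review question about default behavior.

```python
from typing import List, Sequence


class SketchLoader:
    """Hypothetical stand-in for a dataloader class; not NVTabular's API."""

    def __init__(self, seq_limit: int, padding: str = "right") -> None:
        # Validate the user-facing option once, at construction time.
        if padding not in ("left", "right"):
            raise ValueError(f"padding must be 'left' or 'right', got {padding!r}")
        self.seq_limit = seq_limit
        self.padding = padding

    def _pad_row(self, row: Sequence[int]) -> List[int]:
        # Private helper reads the option from the instance instead of
        # requiring every internal caller to pass it explicitly.
        row = list(row)[: self.seq_limit]
        fill = [0] * (self.seq_limit - len(row))
        return fill + row if self.padding == "left" else row + fill

    def to_dense(self, rows: Sequence[Sequence[int]]) -> List[List[int]]:
        return [self._pad_row(row) for row in rows]


loader = SketchLoader(seq_limit=3, padding="left")
dense = loader.to_dense([[7], [8, 9]])
print(dense)  # [[0, 0, 7], [0, 8, 9]]
```

Keeping the option on the instance means the private methods never need a `padding` parameter of their own, which is one possible answer to the question of how the value reaches them.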
@benfred Would you have any thoughts or advice on how to implement a user-facing interface for this option? Gabriel suggested modifying a couple of the private methods in this torch dataloader module. However, I do not see either of these private methods, `_build_sparse_tensor()` or `_get_indices()`, used anywhere in this `torch` module or elsewhere in the codebase. My guess is that he either wanted to call these private methods directly or was mistaken about where to implement this feature. Would you have any advice on whether to leave this implementation in the private methods, or where to expose the padding-side argument to users?
nvtabular/loader/torch.py
Outdated
if padding == "right":
    raise NotImplementedError
Should this fail here? Shouldn't we handle this by default?
Based on the original issue from Gabriel, I understood that right padding has already been implemented. The description of issue #1077 says, for instance: "The PyT and TF Dataloader support padding list (sparse) features to the right, which means that shorter list sequences will be completed with 0s in the right."
I did not want to duplicate it here, to avoid doing the same thing in multiple places in the codebase. Let me investigate further whether this has already been implemented, and hence should not be duplicated, or whether it makes sense to implement it here.
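To make the difference between the two padding sides concrete, here is a minimal sketch in plain Python. It assumes a list column stored as flat `values` plus row `offsets`, which mirrors the argument names in the diff above but is not NVTabular's implementation; `pad_ragged` and `pad_value` are hypothetical names, and truncating rows longer than `seq_limit` from the tail is also an assumption.

```python
from typing import List, Sequence


def pad_ragged(
    values: Sequence[int],
    offsets: Sequence[int],
    seq_limit: int,
    padding: str = "right",
    pad_value: int = 0,
) -> List[List[int]]:
    """Densify a ragged list column given flat values and row offsets.

    Hypothetical helper for illustration only. `offsets` holds the start
    index of each row plus a final entry equal to len(values).
    """
    if padding not in ("left", "right"):
        raise ValueError(f"padding must be 'left' or 'right', got {padding!r}")

    rows = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        # Assumption: rows longer than seq_limit keep their first items.
        row = list(values[start:end])[:seq_limit]
        fill = [pad_value] * (seq_limit - len(row))
        # Right padding appends zeros; left padding prepends them.
        rows.append(fill + row if padding == "left" else row + fill)
    return rows


values = [1, 2, 3, 4, 5, 6]
offsets = [0, 3, 4, 6]  # rows: [1, 2, 3], [4], [5, 6]

right = pad_ragged(values, offsets, seq_limit=4, padding="right")
left = pad_ragged(values, offsets, seq_limit=4, padding="left")
print(right)  # [[1, 2, 3, 0], [4, 0, 0, 0], [5, 6, 0, 0]]
print(left)   # [[0, 1, 2, 3], [0, 0, 0, 4], [0, 0, 5, 6]]
```

With left padding the most recent items of every row line up in the last column, which is the property sequential models usually want from this option.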
@benfred Would you know offhand, based on your knowledge of the codebase, whether right padding has already been implemented elsewhere in the repository?
rerun tests
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit e55336c82fb5960a30451d6636500383aa4f20dc, no merge conflicts. Running as SYSTEM Setting status of e55336c82fb5960a30451d6636500383aa4f20dc to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3545/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse e55336c82fb5960a30451d6636500383aa4f20dc^{commit} # timeout=10 Checking out Revision e55336c82fb5960a30451d6636500383aa4f20dc (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f e55336c82fb5960a30451d6636500383aa4f20dc # timeout=10 Commit message: "Merge branch 'main' into 1077-implement" > git rev-list --no-walk edc99b9a5193d96eaba869336daab46a3c41117b # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins4042292249932188195.sh Installing NVTabular Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.1.0) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.ge55336c -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.ge55336c -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.ge55336c -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.ge55336c -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 
creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.6.0+78.ge55336c is already the active version in easy-install.pth |
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit af8aa571efa9abd0690b929d080b7afa8a0fd08b, no merge conflicts. Running as SYSTEM Setting status of af8aa571efa9abd0690b929d080b7afa8a0fd08b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3546/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse af8aa571efa9abd0690b929d080b7afa8a0fd08b^{commit} # timeout=10 Checking out Revision af8aa571efa9abd0690b929d080b7afa8a0fd08b (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f af8aa571efa9abd0690b929d080b7afa8a0fd08b # timeout=10 Commit message: "Merge branch 'main' of github.com:NVIDIA/NVTabular into 1077-implement" > git rev-list --no-walk e55336c82fb5960a30451d6636500383aa4f20dc # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins3711260693971108098.sh Installing 
NVTabular Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.1.0) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+89.gaf8aa57 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+89.gaf8aa57 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+89.gaf8aa57 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+89.gaf8aa57 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared 
-Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.6.0+89.gaf8aa57 is already the active version in easy-install.pth |
Update tensorflow dataloader module docstring for docs syntax by using double colons instead of single colon.
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit 299d3564e61bd11f16c92289c7dcfcb9684b828f, no merge conflicts. Running as SYSTEM Setting status of 299d3564e61bd11f16c92289c7dcfcb9684b828f to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3547/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse 299d3564e61bd11f16c92289c7dcfcb9684b828f^{commit} # timeout=10 Checking out Revision 299d3564e61bd11f16c92289c7dcfcb9684b828f (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 299d3564e61bd11f16c92289c7dcfcb9684b828f # timeout=10 Commit message: "Update tensorflow module docstring for docs syntax" > git rev-list --no-walk af8aa571efa9abd0690b929d080b7afa8a0fd08b # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins1854282666025130803.sh Installing NVTabular Looking in 
indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.1.0) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC 
-DVERSION_INFO=0.6.0+90.g299d356 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+90.g299d356 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+90.g299d356 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+90.g299d356 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 
-Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.6.0+90.g299d356 is already the active version in easy-install.pth |
Expose the `pad_left` argument to the user by including it in the signatures of the TorchAsyncItr() and KerasSequenceLoader() classes, as well as their mutual parent class DataLoader(). The motivation is to allow user specification of left padding.
Skip test_distributed_multigpu() so that I can see a clean pytest output, since this test is failing locally for some mysterious reason.
Add docstring to _build_sparse_tensor() for the TF implementation.
Update docstring for a small spelling error.
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit 0c0ce69ad2442d915976c293d95a6fcf134f1879, no merge conflicts. Running as SYSTEM Setting status of 0c0ce69ad2442d915976c293d95a6fcf134f1879 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3568/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse 0c0ce69ad2442d915976c293d95a6fcf134f1879^{commit} # timeout=10 Checking out Revision 0c0ce69ad2442d915976c293d95a6fcf134f1879 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 0c0ce69ad2442d915976c293d95a6fcf134f1879 # timeout=10 Commit message: "Implement pad_left in _build_sparse_tensor TF" > git rev-list --no-walk d93f9c58d51f0c8a0eaffecc74fde75b12c9828d # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins2789675369852549167.sh Installing NVTabular Looking in 
indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.1.0) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC 
-DVERSION_INFO=0.6.0+101.g0c0ce69 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+101.g0c0ce69 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+101.g0c0ce69 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+101.g0c0ce69 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 
-Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.6.0+101.g0c0ce69 is already the active version in easy-install.pth |
Refactor torch dataloader pad_left and _build_sparse_tensor() method. The motivation is for improved readability and maintainability.
Update pytest decorator to not skip unrelated test.
Cleanup torch loader for improved readability.
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit 7944b2afc515b14f6e2df97e963d98ecbe63c19b, no merge conflicts. Running as SYSTEM Setting status of 7944b2afc515b14f6e2df97e963d98ecbe63c19b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3569/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse 7944b2afc515b14f6e2df97e963d98ecbe63c19b^{commit} # timeout=10 Checking out Revision 7944b2afc515b14f6e2df97e963d98ecbe63c19b (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 7944b2afc515b14f6e2df97e963d98ecbe63c19b # timeout=10 Commit message: "Merge branch 'main' of 1077-implement" > git rev-list --no-walk 0c0ce69ad2442d915976c293d95a6fcf134f1879 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins6750456438639171111.sh Installing NVTabular Looking in indexes: 
https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.1.0) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC 
-DVERSION_INFO=0.7.0+24.g7944b2a -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+24.g7944b2a -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+24.g7944b2a -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+24.g7944b2a -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 
-Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.7.0+24.g7944b2a is already the active version in easy-install.pth |
@benfred Could you review this PR, when you have a chance, please?
nvtabular/loader/tensorflow.py
Outdated
tensor = tf.RaggedTensor.from_tensor(ragged.to_tensor(shape=[None, seq_limit])).to_sparse()
if self.pad_left:
    max_len = max(max(len(row) for row in ragged), seq_limit)
    tensor = tf.stack([tf.pad(row, [[max_len - len(row), 0]]) for row in ragged], axis=0)
Is iterating through each row the only way to do this? You can work out all the padding amounts with array math and then pass the entire list of "pad_length" entries at once, so that it is not processing each row one at a time.
This is good to know. Let me see how to implement this approach you outline here.
I am currently having a hard time seeing a vectorized or otherwise more optimized approach to this code snippet. Working out the padding amounts per row is not difficult, but what to do with that tensor of row lengths is unclear to me. For instance, there is `tf.pad()`, but this only takes padding tensors of shape `[n, 2]`, where `n` is the rank of the original tensor, so we can only apply a constant amount of padding per dimension. For `torch` there is `torch.nn.functional.pad()`, which also only applies a constant amount of left or right padding per dimension. There is `tf.ragged`'s `to_tensor()` method, which implicitly pads or truncates according to a given shape, but it only pads on the right. So it is not clear how to build a more optimized approach from this variable padding tensor. Did you have some other `tensorflow` or `torch` methods in mind, or some other vectorized approach using the variable padding lengths that I am missing? I have tried some others, without success so far. In particular, composing `to_tensor()` with `tf.pad()` using left padding will not work. I can modify the run-length encoding of the ragged tensor here, but I would guess that this would also not be a vectorized operation.
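For readers following the thread, one possible reading of the "array math" suggestion is sketched below. It is only an illustration under stated assumptions, not the implementation in this PR: it uses NumPy rather than TensorFlow or Torch, the helper name `left_pad_dense` is hypothetical, the list column is assumed to arrive as flat `values` plus row `offsets`, and no row is assumed to be longer than `seq_limit`. The per-row pad lengths are turned into a boolean mask, and the flat values are written into the masked positions in one step.

```python
import numpy as np


def left_pad_dense(values, offsets, seq_limit, pad_value=0):
    """Hypothetical helper: left-pad a flattened list column without a Python loop.

    Assumes `offsets` has n_rows + 1 entries and no row exceeds `seq_limit`.
    """
    values = np.asarray(values)
    row_lengths = np.diff(np.asarray(offsets))  # number of elements in each row
    if row_lengths.size and row_lengths.max() > seq_limit:
        raise ValueError("this sketch assumes every row fits within seq_limit")

    pad_lengths = seq_limit - row_lengths  # variable left padding per row
    # mask[i, j] is True where row i holds a real value, i.e. j >= pad_lengths[i].
    mask = np.arange(seq_limit) >= pad_lengths[:, None]

    out = np.full((row_lengths.size, seq_limit), pad_value, dtype=values.dtype)
    # Boolean-mask assignment fills True cells in row-major order,
    # which matches the order of the flattened values.
    out[mask] = values
    return out


# Same rows as [[3, 1, 4, 1], [], [5, 9, 2], [6], []], stored as values + offsets.
padded = left_pad_dense([3, 1, 4, 1, 5, 9, 2, 6], [0, 4, 4, 7, 8, 8], seq_limit=4)
```

The mask does the job that a variable-width `pad` call would otherwise do, so the constant-padding restriction of `tf.pad()` and `torch.nn.functional.pad()` never comes into play. Whether an equivalent masked write is convenient in either framework is left open here.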
@jperez999 How does this current updated implementation of this TensorFlow code snippet without iterating through the rows of the ragged tensor address your feedback?
The `for` loops you are doing are the problem here, I think. That methodology is extremely slow: you have to go into each row and read the data, as opposed to processing the entire column at once.
digits = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
padded = tf.reverse(digits, [-1]).to_tensor(0)
final_tensor = tf.reverse(padded, [-1])
I think this gives you the padding behavior you want, and it is a bit faster: I am clocking this logic at 0.005383729934692383 and your logic at 0.026534557342529297.
@jperez999 Did you see the updated TensorFlow implementation that I have here that does not use `for` loops? It is at commit 01749f9.
The commit you linked there references conda environment files, with no code changes. As the PR stands, that `for` loop logic still exists for both torch and tensorflow.
That commit is a final merge of main into this branch. The changes are in the predecessors of this commit. They should be viewable here in the web UI or by doing a checkout of that commit. The `for` loop implementation for TensorFlow is an outdated change. I also tagged you with a comment on the code section below. Are you able to see these changes now?
In particular, in my web UI, the `tensorflow` code snippet you are referencing has a yellow "Outdated" box next to it.
indices = self._get_indices(offsets, diff_offsets)
if self.pad_left:
Again, this iteration logic is not the best way to cover this. Something like https://stackoverflow.com/questions/48686945/reshaping-a-tensor-with-padding-in-pytorch would be more efficient. `torch.nn.functional` has a padding function you could use to your advantage.
Likewise this is good to know. I will look into how to do this outlined approach for the Torch implementation here.
I have the same questions for your second comment as I wrote in reply to your first comment above. Would you have any guidance on this here?
I read over this Stack Overflow question along with all the replies. These approaches will not directly work for this torch implementation. The main obstacle is that methods like torch.nn.functional.pad are currently not supported on torch.sparse matrices. Do you see any further optimization beyond the O(n) linear-time Python code that I provided, such as a vectorized approach, given that we are dealing with torch.sparse matrices? I looked through the available methods for the torch.sparse class, and nothing seemed immediately relevant.
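One possible vectorized direction, sketched here under assumptions rather than taken from this PR: instead of padding an existing sparse tensor, shift the column indices before the sparse tensor is built. The function name `left_padded_sparse` and its arguments are hypothetical. It assumes `values` is the flattened list column, `offsets` holds the `n_rows + 1` row boundaries as int64, and no row is longer than `seq_limit`.

```python
import torch


def left_padded_sparse(values: torch.Tensor, offsets: torch.Tensor, seq_limit: int) -> torch.Tensor:
    """Build a left-padded sparse COO tensor without a Python loop.

    Hypothetical sketch; assumes every row length is <= seq_limit.
    """
    lengths = offsets[1:] - offsets[:-1]
    n_rows = lengths.shape[0]
    # Row index of every value, e.g. lengths [2, 1, 3] -> [0, 0, 1, 2, 2, 2].
    row_ids = torch.repeat_interleave(torch.arange(n_rows), lengths)
    # Position of every value inside its own row.
    pos_in_row = torch.arange(values.shape[0]) - offsets[:-1][row_ids]
    # Shift each row right so that its last value lands in the last column.
    col_ids = pos_in_row + (seq_limit - lengths)[row_ids]
    indices = torch.stack([row_ids, col_ids])
    return torch.sparse_coo_tensor(indices, values, size=(n_rows, seq_limit))


values = torch.tensor([1, 2, 3, 4, 5, 6])
offsets = torch.tensor([0, 2, 3, 6])  # rows: [1, 2], [3], [4, 5, 6]
sparse = left_padded_sparse(values, offsets, seq_limit=4)
dense = sparse.to_dense()
print(dense.tolist())  # [[0, 0, 1, 2], [0, 0, 0, 3], [0, 4, 5, 6]]
```

All tensors in this sketch live on the CPU; a GPU variant would need the `torch.arange` calls created on the same device as `offsets`.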
I still see the while loop here for the torch side; is that accurate?
This has not been updated yet, since I would like to hear your feedback on the tensorflow side first, and since this torch implementation deals with torch.sparse tensors instead of tf.RaggedTensor objects, the latter of which are easier to implement this for.
Do you mean the for loop? I see no while loop in this code block. Edit: I see this loop now.
Please see line 187.
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit c7ae873e94c419d3365d444355c463971d579d62, no merge conflicts. Running as SYSTEM Setting status of c7ae873e94c419d3365d444355c463971d579d62 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3570/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse c7ae873e94c419d3365d444355c463971d579d62^{commit} # timeout=10 Checking out Revision c7ae873e94c419d3365d444355c463971d579d62 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f c7ae873e94c419d3365d444355c463971d579d62 # timeout=10 Commit message: "Update docstring" > git rev-list --no-walk 7944b2afc515b14f6e2df97e963d98ecbe63c19b # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins435325596407785481.sh Installing NVTabular Looking in indexes: https://pypi.org/simple, 
https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.1.0) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+27.gc7ae873 -I./cpp/ 
-I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+27.gc7ae873 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+27.gc7ae873 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+27.gc7ae873 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions 
-Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.7.0+27.gc7ae873 is already the active version in easy-install.pth |
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit d86cec336d5a4ace559b037fabbfa469b1c84ae0, no merge conflicts. Running as SYSTEM Setting status of d86cec336d5a4ace559b037fabbfa469b1c84ae0 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3571/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse d86cec336d5a4ace559b037fabbfa469b1c84ae0^{commit} # timeout=10 Checking out Revision d86cec336d5a4ace559b037fabbfa469b1c84ae0 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f d86cec336d5a4ace559b037fabbfa469b1c84ae0 # timeout=10 Commit message: "Refactor torch dataloader pad_left and _build_spar" > git rev-list --no-walk c7ae873e94c419d3365d444355c463971d579d62 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins1981661871202935081.sh Installing NVTabular Looking in 
indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.1.0) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC 
-DVERSION_INFO=0.7.0+28.gd86cec3 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+28.gd86cec3 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+28.gd86cec3 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+28.gd86cec3 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 
-Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.7.0+28.gd86cec3 is already the active version in easy-install.pth |
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit d90e1dfe9db39a37e000f5c3f12526aa9a71fde4, no merge conflicts. Running as SYSTEM Setting status of d90e1dfe9db39a37e000f5c3f12526aa9a71fde4 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3572/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse d90e1dfe9db39a37e000f5c3f12526aa9a71fde4^{commit} # timeout=10 Checking out Revision d90e1dfe9db39a37e000f5c3f12526aa9a71fde4 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f d90e1dfe9db39a37e000f5c3f12526aa9a71fde4 # timeout=10 Commit message: "Update pytest decorator" > git rev-list --no-walk d86cec336d5a4ace559b037fabbfa469b1c84ae0 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins5646962055686345408.sh Installing NVTabular Looking in indexes: 
https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.1.0) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC 
-DVERSION_INFO=0.7.0+29.gd90e1df -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+29.gd90e1df -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+29.gd90e1df -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+29.gd90e1df -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 
-Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) nvtabular 0.7.0+29.gd90e1df is already the active version in easy-install.pth |
Click to view CI ResultsGitHub pull request NVIDIA-Merlin/NVTabular#1126 of commit b21c57dce7348c43107760bdc0c68b62f7f98709, no merge conflicts. Running as SYSTEM Setting status of b21c57dce7348c43107760bdc0c68b62f7f98709 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3573/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1126/*:refs/remotes/origin/pr/1126/* # timeout=10 > git rev-parse b21c57dce7348c43107760bdc0c68b62f7f98709^{commit} # timeout=10 Checking out Revision b21c57dce7348c43107760bdc0c68b62f7f98709 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f b21c57dce7348c43107760bdc0c68b62f7f98709 # timeout=10 Commit message: "Cleanup torch loader" > git rev-list --no-walk d90e1dfe9db39a37e000f5c3f12526aa9a71fde4 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins7736342282628777672.sh Installing NVTabular Looking in indexes: 
https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4) Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.1.0) Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0) Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1) running develop running egg_info creating nvtabular.egg-info writing nvtabular.egg-info/PKG-INFO writing dependency_links to nvtabular.egg-info/dependency_links.txt writing requirements to nvtabular.egg-info/requires.txt writing top-level names to nvtabular.egg-info/top_level.txt writing manifest file 'nvtabular.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching '*.h' under directory 'cpp' warning: no files found matching '*.cu' under directory 'cpp' warning: no files found matching '*.cuh' under directory 'cpp' adding license file 'LICENSE' writing manifest file 'nvtabular.egg-info/SOURCES.txt' running build_ext x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17 building 'nvtabular_cpp' extension creating build creating build/temp.linux-x86_64-3.8 creating build/temp.linux-x86_64-3.8/cpp creating build/temp.linux-x86_64-3.8/cpp/nvtabular creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC 
-DVERSION_INFO=0.7.0+30.gb21c57d -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+30.gb21c57d -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+30.gb21c57d -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0 x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.7.0+30.gb21c57d -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0 creating build/lib.linux-x86_64-3.8 x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 
-Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .) Adding nvtabular 0.7.0+30.gb21c57d to easy-install.pth file |
Implement pad_left with TF ops, to address code reviewer's concerns with the former Python for loop with TF ops construction.
Implement pad_left with TF ops, to address code reviewer's concerns with the former Python for loop with TF ops construction. This commit cleans up this implementation.
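As an illustration of what a loop-free left pad built only from TF ops can look like, here is a minimal sketch. It is not the code from these commits. The function name `left_pad_ragged` and the `seq_limit` parameter are hypothetical, and the sketch assumes a 2D `tf.RaggedTensor` whose rows are no longer than `seq_limit`. Padding positions are filled with zeros because `tf.scatter_nd` starts from a zero tensor.

```python
import tensorflow as tf


def left_pad_ragged(ragged: tf.RaggedTensor, seq_limit: int) -> tf.Tensor:
    """Left-pad a 2D ragged tensor to [n_rows, seq_limit] using TF ops only.

    Hypothetical sketch; assumes every row length is <= seq_limit.
    """
    lengths = ragged.row_lengths()    # int64, shape [n_rows]
    row_ids = ragged.value_rowids()   # int64, shape [n_values]
    n_values = tf.shape(ragged.flat_values, out_type=tf.int64)[0]
    # Position of every value inside its own row.
    pos_in_row = tf.range(n_values) - tf.gather(ragged.row_starts(), row_ids)
    # Shift each row right so that its last value lands in the last column.
    shift = tf.cast(seq_limit, tf.int64) - lengths
    col_ids = pos_in_row + tf.gather(shift, row_ids)
    indices = tf.stack([row_ids, col_ids], axis=1)
    shape = tf.stack([ragged.nrows(), tf.cast(seq_limit, tf.int64)])
    # scatter_nd writes the values into an all-zero dense tensor.
    return tf.scatter_nd(indices, ragged.flat_values, shape)


batch = tf.ragged.constant([[1, 2], [3], [4, 5, 6]])
left_padded = left_pad_ragged(batch, seq_limit=4)
print(left_padded.numpy().tolist())  # [[0, 0, 1, 2], [0, 0, 0, 3], [0, 4, 5, 6]]
```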
else:
    tensor = tf.concat([ragged, zeros], axis=1).to_tensor()

tensor = tf.RaggedTensor.from_tensor(tensor).to_sparse()
@jperez999 Are you able to see this?
I see it now... Can you time this compared to the example I gave... would like to see the timing on this execution.
This is good to hear. Yes one moment please. Is there a particular timing framework or method that you use, for consistency of the timings?
For this current code block using tf.concat(), running pytest --durations=0 -vv tests/unit/loader/test_tf_dataloader.py -k "test_sparse_tensor_" gives:
tests/unit/loader/test_tf_dataloader.py::test_sparse_tensor_left_padding[False] PASSED [ 50%]
tests/unit/loader/test_tf_dataloader.py::test_sparse_tensor_left_padding[True] PASSED [100%]
================================================================================================ slowest durations ================================================================================================
1.38s call tests/unit/loader/test_tf_dataloader.py::test_sparse_tensor_left_padding[False]
0.09s call tests/unit/loader/test_tf_dataloader.py::test_sparse_tensor_left_padding[True]
0.00s setup tests/unit/loader/test_tf_dataloader.py::test_sparse_tensor_left_padding[True]
0.00s setup tests/unit/loader/test_tf_dataloader.py::test_sparse_tensor_left_padding[False]
0.00s teardown tests/unit/loader/test_tf_dataloader.py::test_sparse_tensor_left_padding[False]
0.00s teardown tests/unit/loader/test_tf_dataloader.py::test_sparse_tensor_left_padding[True]
======================================================================================== 2 passed, 75 deselected in 3.18s =========================================================================================
Do you have a commit with the tf.reverse() code that I can check out and run this same timing on?
Yeah, I just meant grabbing the same mini tensor I create in my example (called digits) and running it through your scenario. The idea is to get a comparison to see which method is fastest, so that is the one we select.
ragged = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
### from here down your code ###
non_zero_entries_by_row = tf.math.reduce_sum(ragged / ragged, axis=1)
paddings = seq_limit - non_zero_entries_by_row.numpy()
# Make zeros ragged tensor to pad our data tensor with.
total_entries = ragged.shape[0] * seq_limit
non_zero_entries = tf.reduce_sum(ragged / ragged).numpy()
zeros_count = total_entries - non_zero_entries
zeros_values = tf.zeros(shape=(int(zeros_count)), dtype=tf.dtypes.int64)
zeros = tf.RaggedTensor.from_row_lengths(values=zeros_values, row_lengths=paddings)
# Concatenate zeros ragged tensor with our data tensor on either the left or the right,
# depending on either left_pad or not.
if self.pad_left:
    tensor = tf.concat([zeros, ragged], axis=1).to_tensor()
else:
    tensor = tf.concat([ragged, zeros], axis=1).to_tensor()
This is the code for which I reported the ~0.002 second execution time:
digits = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
max_len = max(max(len(row) for row in digits), 7)
tensor = tf.stack([tf.pad(row, [[max_len - len(row), 0]]) for row in digits], axis=0)
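For anyone reproducing this kind of comparison outside IPython, one option is the standard-library timeit module. The sketch below is only an illustration: pad_by_concat and pad_by_insert are hypothetical pure-Python stand-ins for the candidate implementations, not the TensorFlow code from this thread, and the loop counts are arbitrary assumptions.

```python
import timeit

# Same toy data as the "digits" example in this thread, as plain lists.
DIGITS = [[3, 1, 4, 1], [], [5, 9, 2], [6], []]
SEQ_LIMIT = 7  # assumed target width, matching the 7 used above


def pad_by_concat(rows=DIGITS, width=SEQ_LIMIT):
    """Hypothetical candidate 1: build the zero prefix and concatenate."""
    return [[0] * (width - len(row)) + row for row in rows]


def pad_by_insert(rows=DIGITS, width=SEQ_LIMIT):
    """Hypothetical candidate 2: insert zeros at the front one at a time."""
    out = []
    for row in rows:
        padded = list(row)
        while len(padded) < width:
            padded.insert(0, 0)
        out.append(padded)
    return out


def best_time_ms(func, number=1000, repeat=5):
    """Return the best per-call time in milliseconds over `repeat` batches."""
    batches = timeit.repeat(func, number=number, repeat=repeat)
    return min(batches) / number * 1e3


timings = {f.__name__: best_time_ms(f) for f in (pad_by_concat, pad_by_insert)}
for name, ms in timings.items():
    print(f"{name}: {ms:.4f} ms per call")
```

Running both candidates on the same input and with the same loop counts keeps the numbers comparable; checking that they return identical results first avoids timing two functions that do different things.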
I see, here are the timings I got.
import tensorflow as tf
"""
: %timeit foo()
1.24 ms ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
: %timeit bar()
24.6 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
"""
def foo():
    seq_limit = 5
    pad_left = True
    digits = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
    padded = tf.reverse(digits, [-1]).to_tensor(0)
    tensor = tf.reverse(padded, [-1])
    paddings = tf.constant([[0, 0], [2, 0]])
    final_tensor = tf.pad(tensor, paddings)
    return final_tensor
def bar():
    seq_limit = 5
    pad_left = True
    ragged = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
    # ragged = tf.RaggedTensor.from_row_lengths(values=values, row_lengths=diff_offsets)
    # Get vector of padding lengths using tf ops like reduce_sum.
    non_zero_entries_by_row = tf.math.reduce_sum(ragged / ragged, axis=1)
    paddings = seq_limit - non_zero_entries_by_row.numpy()
    # Make zeros ragged tensor to pad our data tensor with.
    total_entries = ragged.shape[0] * seq_limit
    non_zero_entries = tf.reduce_sum(ragged / ragged).numpy()
    zeros_count = total_entries - non_zero_entries
    zeros_values = tf.zeros(shape=(int(zeros_count)), dtype=tf.dtypes.int32)
    zeros = tf.RaggedTensor.from_row_lengths(values=zeros_values, row_lengths=paddings)
    # Concatenate zeros ragged tensor with our data tensor on either the left or the right,
    # depending on either left_pad or not.
    if pad_left:
        tensor = tf.concat([zeros, ragged], axis=1).to_tensor()
    else:
        tensor = tf.concat([ragged, zeros], axis=1).to_tensor()
    return tensor
foo() is faster, so I implemented this approach and pushed a commit along these lines. Are you able to see this implementation using tf.reverse()?
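To make the reverse-based idea behind foo() easier to follow, here is a framework-free sketch of the same trick on plain Python lists: reverse each row, pad on the right, then reverse again, which leaves the zeros on the left. The function name pad_left_via_reverse and its parameters are hypothetical and chosen only for illustration; this is not the code in the PR, and it widens the output instead of truncating when a row is longer than seq_limit.

```python
def pad_left_via_reverse(rows, seq_limit, pad_value=0):
    """Left-pad every row using the reverse / right-pad / reverse idea.

    Assumption for this sketch: the output width is the larger of
    seq_limit and the longest row, so no values are ever dropped.
    """
    width = max([seq_limit] + [len(row) for row in rows])
    padded_rows = []
    for row in rows:
        reversed_row = list(reversed(row))             # step 1: reverse the row
        right_padded = reversed_row + [pad_value] * (width - len(row))  # step 2: pad right
        padded_rows.append(list(reversed(right_padded)))  # step 3: reverse back
    return padded_rows


digits = [[3, 1, 4, 1], [], [5, 9, 2], [6], []]
left_padded = pad_left_via_reverse(digits, seq_limit=5)
for row in left_padded:
    print(row)
```

The appeal of this formulation is that right padding is what dense conversion usually provides by default, so left padding needs only two extra reversals and no per-row bookkeeping of padding lengths.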
yes I see the change... Just need to update torch version now...
nvtabular/loader/tensorflow.py
Outdated
ragged = tf.RaggedTensor.from_row_lengths(values=values, row_lengths=diff_offsets)
tensor = tf.RaggedTensor.from_tensor(ragged.to_tensor(shape=[None, seq_limit])).to_sparse()

# Get vector of padding lengths using tf ops like reduce_sum.
@jperez999 Here is the beginning of the code block.
Update tensorflow dataloader implementation for speed optimization. This implements a suggested revision by @jperez999 for issue #1077.
…into 1077-implement
Update pad_left TF unit tests to make name consistent with other sparse tensor test and to collect print statements.
Update pad_left code for TF sparse tensors to properly handle the default pad right case.
"""Process column by increasing blocks for use in left padding."""
col = col.tolist()
prev, curr = 0, 0
while curr < len(col):
this is the while loop I am talking about @lesnikow
This is good to know, thank you. I have not implemented anything optimized over this yet. I would like to hear your feedback on the tensorflow side first. This torch implementation also operates on torch.sparse tensors, and torch.nn.functional.pad and torch.flip are not implemented for torch.sparse tensors. Would you have any guidance on how to proceed for this torch.sparse case?
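One possible direction for the sparse case, offered only as a sketch and not as the approach taken in this PR, is to skip padding and flipping entirely and compute the column index of each stored value directly: for a row of length n, left padding to seq_limit shifts every column by seq_limit - n. The function left_padded_coo and its arguments are hypothetical; the values and offsets layout is assumed to match the flat values plus row offsets representation discussed in this thread. The resulting index lists could then be passed to a constructor such as torch.sparse_coo_tensor.

```python
def left_padded_coo(values, offsets, seq_limit):
    """Return (row_indices, col_indices, kept_values) for a left-padded layout.

    values:  flat list of all list-column entries
    offsets: row boundaries into `values`, of length num_rows + 1
    Assumption for this sketch: no row is longer than seq_limit.
    """
    row_indices, col_indices, kept_values = [], [], []
    for row in range(len(offsets) - 1):
        start, end = offsets[row], offsets[row + 1]
        length = end - start
        if length > seq_limit:
            raise ValueError(f"row {row} has {length} entries, more than seq_limit")
        shift = seq_limit - length  # number of implicit zeros on the left
        for pos in range(length):
            row_indices.append(row)
            col_indices.append(shift + pos)
            kept_values.append(values[start + pos])
    return row_indices, col_indices, kept_values


# Same toy data as the ragged example [[3, 1, 4, 1], [], [5, 9, 2], [6], []].
values = [3, 1, 4, 1, 5, 9, 2, 6]
offsets = [0, 4, 4, 7, 8, 8]
rows, cols, vals = left_padded_coo(values, offsets, seq_limit=5)
print(list(zip(rows, cols, vals)))
```

The Python loop here is only for readability; the same shift can be expressed with vectorized tensor operations over the offsets, which would avoid the per-element while loop flagged in the review.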
This pull request is for resolving issue NVIDIA-Merlin/dataloader#128 on implementing left padding for sparse sequential features.
The changes made, or to be made, in this PR include the implementation of left padding in the torch and tensorflow dataloader modules, any needed changes to user-facing methods, unit test(s) for this change, and documentation updates, both those related to this fix and those I noticed while reading and working on the dataloader modules.