Added a page documenting scikit-learn for Aurora #573

Open · wants to merge 2 commits into main
Conversation

BethanyL

No description provided.


@felker felker left a comment


Just an observation: we don't have a scikit-learn page in any other Machine section. Are there architecture/platform-specific performance notes we want to convey for scikit-learn users on Polaris, Sophia, etc.? Or is it all straightforward with cuML integration on NVIDIA platforms? What about scaling?


## Environment Setup

Intel Extension for Scikit-learn is already pre-installed on Aurora, available in the `frameworks`

Suggested change
Intel Extension for Scikit-learn is already pre-installed on Aurora, available in the `frameworks`
Intel Extension for scikit-learn is already pre-installed on Aurora, available in the `frameworks`

I would be consistent with not capitalizing scikit-learn


It's aligned with official scikit-learn-intelex documentation: https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html

```
from sklearnex import patch_sklearn
patch_sklearn()
```
It is important to note that this needs to happen before importing scikit-learn. To explicitly only patch certain estimators, you can import particular functions from sklearnex instead of sklearn, like this:

Suggested change
It is important to note that this needs to happen before importing scikit-learn. To explicitly only patch certain estimators, you can import particular functions from sklearnex instead of sklearn, like this:
It is important to note that this needs to happen before importing scikit-learn. To explicitly only patch certain estimators, you can import particular functions from `sklearnex` instead of `sklearn`, like this:

use inline code formatting when discussing the actual module imports, etc.

```
from sklearnex.neighbors import NearestNeighbors
```
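For readers of this thread, a runnable sketch of the patching flow discussed above (hedged: the `try`/`except` fallback and the `KMeans` example are illustrative additions, not part of the PR's page):

```python
import numpy as np

# Patch scikit-learn with the Intel-optimized oneDAL versions if sklearnex
# is available; otherwise fall back to stock scikit-learn.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()  # must run BEFORE any `sklearn` imports
except ImportError:
    pass

from sklearn.cluster import KMeans

X = np.random.RandomState(0).rand(100, 2)
model = KMeans(n_clusters=3, n_init=10).fit(X)
print(model.cluster_centers_.shape)  # (3, 2)
```

The patched and unpatched paths expose the same estimator API, which is why the import order is the only thing the user has to get right.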

### GPU Acceleration
Intel Extension for Scikit-learn can execute algorithms on the GPU via the [dpctl](https://intelpython.github.io/dpctl/latest/index.html) package, which should be included in the frameworks module. (If not, see the [Python page](../python.md)) dpctl implements oneAPI concepts like queues and devices.

Suggested change
Intel Extension for Scikit-learn can execute algorithms on the GPU via the [dpctl](https://intelpython.github.io/dpctl/latest/index.html) package, which should be included in the frameworks module. (If not, see the [Python page](../python.md)) dpctl implements oneAPI concepts like queues and devices.
Intel Extension for scikit-learn can execute algorithms on the GPU via the [dpctl](https://intelpython.github.io/dpctl/latest/index.html) package, which should be included in the frameworks module. (If not, see the [Python page](../python.md)) dpctl implements oneAPI concepts like queues and devices.

As described in more detail in Intel's documentation [here](https://uxlfoundation.github.io/scikit-learn-intelex/latest/oneapi-gpu.html), there are two ways to run on the GPU.

1. Pass the input data to the algorithm as `dpctl.tensor.usm_ndarray`. Then the algorithm will run on the same device as the data and return the result as a `usm_ndarray` on the same device.
2. Configure Intel Extension for Scikit-learn, for example, by setting a context: `sklearnex.config_context`.

Suggested change
2. Configure Intel Extension for Scikit-learn, for example, by setting a context: `sklearnex.config_context`.
2. Configure Intel Extension for scikit-learn, for example, by setting a context: `sklearnex.config_context`.

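A hedged sketch of the two offload paths just described (it assumes `dpctl` and `sklearnex` are installed, as in the Aurora frameworks module; the `KMeans` choice and the guard are illustrative and the whole block is skipped where no GPU stack is present):

```python
import numpy as np

X = np.random.rand(1000, 2).astype(np.float32)

try:
    import dpctl
    import dpctl.tensor as dpt
    from sklearnex import config_context
    from sklearnex.cluster import KMeans

    # Path 1: place the data on the GPU via a SYCL queue;
    # the compute follows the data.
    q = dpctl.SyclQueue("gpu")
    X_gpu = dpt.asarray(X, usm_type="device", sycl_queue=q)
    labels = KMeans(n_clusters=3).fit(X_gpu).labels_

    # Path 2: keep NumPy data, request GPU offload via a config context.
    with config_context(target_offload="gpu:0"):
        labels = KMeans(n_clusters=3).fit(X).labels_
    status = "ran on GPU"
except Exception:
    status = "skipped (no sklearnex/dpctl GPU available here)"

print(status)
```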

### Distributed Mode
To distribute an sklearnex algorithm across multiple GPUs, we need several ingredients, which we will demonstrate in an example below. We recommend using the MPI backend rather than the CCL backend since it is more tested on Aurora.

Suggested change
To distribute an sklearnex algorithm across multiple GPUs, we need several ingredients, which we will demonstrate in an example below. We recommend using the MPI backend rather than the CCL backend since it is more tested on Aurora.
To distribute an `sklearnex` algorithm across multiple GPUs, we need several ingredients demonstrated in an example below. We recommend using the MPI backend rather than the CCL backend since it is more tested on Aurora.

"more tested" phrasing sounds awkward to me. Maybe "since it is tested more thoroughly on Aurora."

Comment on lines +61 to +66
if dpctl.has_gpu_devices():
    q = dpctl.SyclQueue("gpu")
else:
    raise RuntimeError(
        "GPU devices unavailable."
    )


Also for the sake of a concise demo:

Suggested change
if dpctl.has_gpu_devices():
    q = dpctl.SyclQueue("gpu")
else:
    raise RuntimeError(
        "GPU devices unavailable."
    )
# Create a GPU SYCL queue to store data on device
q = dpctl.SyclQueue("gpu")

module use /soft/modulefiles
module load frameworks

export NUMEXPR_NUM_THREADS=64


Is this an Aurora-specific issue? Because AFAIK it's not required for sklearnex SPMD usage.


# Launch the script
mpiexec -np ${NRANKS} -ppn ${NRANKS_PER_NODE} \
--cpu-bind ${CPU_BIND} gpu_tile_compact.sh \


I also run with the hostfile flag and `PBS_NODEFILE` arg. Not sure if it's necessary or covered by `gpu_tile_compact.sh`.



scikit-learn (abbreviated "sklearn") is built for CPUs. However, Intel(R) Extension for Scikit-learn (abbreviated "sklearnex") is a free Python package that speeds up scikit-learn on Intel CPUs & GPUs. For more information, see the [scikit-learn-intelex GitHub page](https://github.com/uxlfoundation/scikit-learn-intelex), [the documentation](https://uxlfoundation.github.io/scikit-learn-intelex/latest/index.html), or [Intel's website](https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html#gs.b2f4sw).

The acceleration is by patching: replacing stock scikit-learn algorithms with the versions from Intel(R) oneAPI Data Analytics Library (oneDAL).


Suggested change
The acceleration is by patching: replacing stock scikit-learn algorithms with the versions from Intel(R) oneAPI Data Analytics Library (oneDAL).
The accelerated interfaces are available via patching: replacing stock scikit-learn algorithms with versions that utilize Intel(R) oneAPI Data Analytics Library (oneDAL).

```
from sklearnex import patch_sklearn
patch_sklearn()
```
It is important to note that this needs to happen before importing scikit-learn. To explicitly only patch certain estimators, you can import particular functions from sklearnex instead of sklearn, like this:


One other comment on patching: it's not necessary to run distributed GPU algos. Patching allows a user to import and use the CPU estimator the exact same way they would from scikit-learn, or make use of single-GPU interfaces more easily, but our SPMD algos are available in a different path and thus are unrelated to the patching. If the main goal of this documentation is to support users who will be running these algos on multiple GPUs, it may be worth reducing emphasis on patching details.

Comment on lines +42 to +46
1. Create an MPI communicator using mpi4py if you need to use the rank. (mpi4py is also included in the frameworks module.)
2. Check for GPU devices.
3. Use dpctl to create a SYCL queue (connection to the GPU devices you choose).
4. Using dpctl and your queue, move your data to the GPU devices.
5. Run the algorithm on that data. The compute will happen where the data is. The algorithm should be from `sklearnex.spmd`.


Steps 1 and 2 are not actually required and could be removed for simplicity. mpi4py is a required dependency, but no lines of code with mpi4py are actually necessary to run our SPMD estimators; however, in most cases a user would use this for arranging their data across various devices.

If these are removed, a comment could be added to the code snippet where the communicator is created to indicate that it is being used for the data setup.
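The five steps quoted above can be sketched as follows (a hedged, hypothetical sketch: the `BasicStatistics` estimator path follows the sklearnex SPMD examples, and the guard lets the script degrade gracefully when run outside an `mpiexec` launch or without the GPU stack):

```python
import numpy as np

try:
    from mpi4py import MPI                       # step 1: MPI communicator
    import dpctl
    import dpctl.tensor as dpt
    from sklearnex.spmd.basic_statistics import BasicStatistics

    rank = MPI.COMM_WORLD.Get_rank()
    if not dpctl.has_gpu_devices():              # step 2: check for GPUs
        raise RuntimeError("GPU devices unavailable.")
    q = dpctl.SyclQueue("gpu")                   # step 3: SYCL queue
    X = np.random.RandomState(rank).rand(1000, 4)
    X_gpu = dpt.asarray(X, sycl_queue=q)         # step 4: move data to device
    result = BasicStatistics().fit(X_gpu)        # step 5: sklearnex.spmd algorithm
    status = "ok"
except Exception:
    status = "skipped"

print(status)
```

Launched with `mpiexec` as in the job script discussed above, each rank moves its own shard to its assigned GPU tile and the SPMD estimator handles the cross-rank reduction.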
