Kernel PCA: Python and Benchmarking Code #5988

Open · 3 commits to be merged into base: branch-24.10

@tomasjoh tomasjoh commented Jul 27, 2024

Description

This PR relies on the C++ implementation from #5987.
It adds Python and benchmarking code for Kernel PCA. This implementation of Kernel PCA supports fit(), transform(), and fit_transform().

Feature request: #1317
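
For readers unfamiliar with the method, the three calls listed above can be illustrated with a small CPU reference in NumPy. This is only a sketch under stated assumptions, not the code in this PR: the class name `KernelPCASketch`, the `rbf_kernel` helper, the choice of an RBF kernel, and the `gamma` parameter are all hypothetical and picked for illustration. It follows the standard Kernel PCA recipe (center the kernel matrix, eigendecompose it, project onto the leading eigenvectors).

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = (A * A).sum(axis=1)[:, None] + (B * B).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelPCASketch:
    """Hypothetical CPU reference; NOT the cuML class added by this PR."""

    def __init__(self, n_components=2, gamma=1.0):
        self.n_components = n_components
        self.gamma = gamma

    def _center(self, K):
        # Center kernel rows using statistics of the training kernel matrix.
        row_mean = K.mean(axis=1)[:, None]
        return K - self._col_mean[None, :] - row_mean + self._all_mean

    def fit(self, X):
        X = np.asarray(X, dtype=np.float64)
        K = rbf_kernel(X, X, self.gamma)
        self._col_mean = K.mean(axis=0)
        self._all_mean = K.mean()
        Kc = self._center(K)
        # eigh returns eigenvalues in ascending order; keep the largest ones.
        w, v = np.linalg.eigh(Kc)
        idx = np.argsort(w)[::-1][: self.n_components]
        self.eigenvalues_ = w[idx]
        self.eigenvectors_ = v[:, idx]
        self.X_fit_ = X
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=np.float64)
        Kc = self._center(rbf_kernel(X, self.X_fit_, self.gamma))
        # Project onto eigenvectors scaled by 1 / sqrt(eigenvalue).
        return Kc @ (self.eigenvectors_ / np.sqrt(self.eigenvalues_))

    def fit_transform(self, X):
        self.fit(X)
        # For the training data the projection is eigenvector * sqrt(eigenvalue).
        return self.eigenvectors_ * np.sqrt(self.eigenvalues_)


# Example usage on random data (values chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
model = KernelPCASketch(n_components=2, gamma=0.5)
Z = model.fit_transform(X)
```

In this sketch, `fit_transform(X)` and `fit(X)` followed by `transform(X)` agree up to floating-point error on the training data, which is a convenient sanity check when comparing a GPU implementation against a CPU reference.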

Tests and benchmarks were performed on an EC2 g4dn.xlarge instance with CUDA 12.2.

Environment details:
 ***git***
 commit ade61faff6ac261028bc9b8bbca8b7e67be00d16 (HEAD -> fea-kernel-pca, fork/fea-kernel-pca)
 Author: Tomas Johannesson <[email protected]>
 Date:   Tue Jul 23 22:01:19 2024 -0500

 syntax fix
 ***git submodules***

 ***OS Information***
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=22.04
 DISTRIB_CODENAME=jammy
 DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"
 PRETTY_NAME="Ubuntu 22.04.4 LTS"
 NAME="Ubuntu"
 VERSION_ID="22.04"
 VERSION="22.04.4 LTS (Jammy Jellyfish)"
 VERSION_CODENAME=jammy
 ID=ubuntu
 ID_LIKE=debian
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 UBUNTU_CODENAME=jammy
 Linux ip-172-31-36-86 6.5.0-1020-aws #20~22.04.1-Ubuntu SMP Wed May  1 16:10:50 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

 ***GPU Information***
 Thu Jul 25 01:28:58 2024
 +---------------------------------------------------------------------------------------+
 | NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
 |-----------------------------------------+----------------------+----------------------+
 | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
 |                                         |                      |               MIG M. |
 |=========================================+======================+======================|
 |   0  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
 | N/A   32C    P8              14W /  70W |      2MiB / 15360MiB |      0%      Default |
 |                                         |                      |                  N/A |
 +-----------------------------------------+----------------------+----------------------+

 +---------------------------------------------------------------------------------------+
 | Processes:                                                                            |
 |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
 |        ID   ID                                                             Usage      |
 |=======================================================================================|
 |  No running processes found                                                           |
 +---------------------------------------------------------------------------------------+

 ***CPU***
 Architecture:                       x86_64
 CPU op-mode(s):                     32-bit, 64-bit
 Address sizes:                      46 bits physical, 48 bits virtual
 Byte Order:                         Little Endian
 CPU(s):                             4
 On-line CPU(s) list:                0-3
 Vendor ID:                          GenuineIntel
 Model name:                         Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
 CPU family:                         6
 Model:                              85
 Thread(s) per core:                 2
 Core(s) per socket:                 2
 Socket(s):                          1
 Stepping:                           7
 BogoMIPS:                           4999.99
 Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni
 Hypervisor vendor:                  KVM
 Virtualization type:                full
 L1d cache:                          64 KiB (2 instances)
 L1i cache:                          64 KiB (2 instances)
 L2 cache:                           2 MiB (2 instances)
 L3 cache:                           35.8 MiB (1 instance)
 NUMA node(s):                       1
 NUMA node0 CPU(s):                  0-3
 Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
 Vulnerability Itlb multihit:        KVM: Mitigation: VMX unsupported
 Vulnerability L1tf:                 Mitigation; PTE Inversion
 Vulnerability Mds:                  Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
 Vulnerability Meltdown:             Mitigation; PTI
 Vulnerability Mmio stale data:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
 Vulnerability Retbleed:             Vulnerable
 Vulnerability Spec rstack overflow: Not affected
 Vulnerability Spec store bypass:    Vulnerable
 Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
 Vulnerability Spectre v2:           Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline
 Vulnerability Srbds:                Not affected
 Vulnerability Tsx async abort:      Not affected

 ***CMake***
 /home/ubuntu/miniconda3/envs/cuml_dev/bin/cmake
 cmake version 3.29.6

 CMake suite maintained and supported by Kitware (kitware.com/cmake).

 ***g++***
 /home/ubuntu/miniconda3/envs/cuml_dev/bin/g++
 g++ (conda-forge gcc 11.4.0-12) 11.4.0
 Copyright (C) 2021 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


 ***nvcc***
 /home/ubuntu/miniconda3/envs/cuml_dev/bin/nvcc
 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2023 NVIDIA Corporation
 Built on Tue_Aug_15_22:02:13_PDT_2023
 Cuda compilation tools, release 12.2, V12.2.140
 Build cuda_12.2.r12.2/compiler.33191640_0

 ***Python***
 /home/ubuntu/miniconda3/envs/cuml_dev/bin/python
 Python 3.11.9

 ***Environment Variables***
 PATH                            : /home/ubuntu/miniconda3/envs/cuml_dev/bin:/home/ubuntu/miniconda3/condabin:/opt/amazon/openmpi/bin:/opt/amazon/efa/bin:/usr/local/cuda-12.1/bin:/usr/local/cuda-12.1/include:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
 LD_LIBRARY_PATH                 : /opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda-12.1/lib:/usr/local/cuda-12.1/lib64:/usr/local/cuda-12.1:/usr/local/cuda-12.1/targets/x86_64-linux/lib/:/usr/local/cuda-12.1/extras/CUPTI/lib64:/usr/local/lib:/usr/lib
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    : /home/ubuntu/miniconda3/envs/cuml_dev
 PYTHON_PATH                     :

 ***conda packages***
 /home/ubuntu/miniconda3/condabin/conda
 # packages in environment at /home/ubuntu/miniconda3/envs/cuml_dev:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                 conda_forge    conda-forge
 _openmp_mutex             4.5                       2_gnu    conda-forge
 _sysroot_linux-64_curr_repodata_hack 3                   h69a702a_14    conda-forge
 accessible-pygments       0.0.5              pyhd8ed1ab_0    conda-forge
 alabaster                 0.7.16             pyhd8ed1ab_0    conda-forge
 anyio                     4.4.0              pyhd8ed1ab_0    conda-forge
 argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge
 argon2-cffi-bindings      21.2.0          py311h459d7ec_4    conda-forge
 arrow                     1.3.0              pyhd8ed1ab_0    conda-forge
 asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
 async-lru                 2.0.4              pyhd8ed1ab_0    conda-forge
 atk-1.0                   2.38.0               h04ea711_2    conda-forge
 attrs                     23.2.0             pyh71513ae_0    conda-forge
 aws-c-auth                0.7.22               h9137712_5    conda-forge
 aws-c-cal                 0.6.15               h88a6e22_0    conda-forge
 aws-c-common              0.9.19               h4ab18f5_0    conda-forge
 aws-c-compression         0.2.18               h83b837d_6    conda-forge
 aws-c-event-stream        0.4.2               h0cbf018_13    conda-forge
 aws-c-http                0.8.2                h360477d_2    conda-forge
 aws-c-io                  0.14.9               h2d549f9_2    conda-forge
 aws-c-mqtt                0.10.4               hf85b563_6    conda-forge
 aws-c-s3                  0.5.10               h679ed35_3    conda-forge
 aws-c-sdkutils            0.1.16               h83b837d_2    conda-forge
 aws-checksums             0.1.18               h83b837d_6    conda-forge
 aws-crt-cpp               0.26.12              h8bc9c4d_0    conda-forge
 aws-sdk-cpp               1.11.329             hf74b5d1_5    conda-forge
 azure-core-cpp            1.12.0               h830ed8b_0    conda-forge
 azure-identity-cpp        1.8.0                hdb0d106_1    conda-forge
 azure-storage-blobs-cpp   12.11.0              ha67cba7_1    conda-forge
 azure-storage-common-cpp  12.6.0               he3f277c_1    conda-forge
 azure-storage-files-datalake-cpp 12.10.0              h29b5301_1    conda-forge
 babel                     2.14.0             pyhd8ed1ab_0    conda-forge
 backports.zoneinfo        0.2.1           py311h38be061_8    conda-forge
 beautifulsoup4            4.12.3             pyha770c72_0    conda-forge
 binutils                  2.40                 h4852527_7    conda-forge
 binutils_impl_linux-64    2.40                 ha1999f0_7    conda-forge
 binutils_linux-64         2.40                 hb3c18ed_4    conda-forge
 bleach                    6.1.0              pyhd8ed1ab_0    conda-forge
 bokeh                     3.4.1              pyhd8ed1ab_0    conda-forge
 brotli                    1.1.0                hd590300_1    conda-forge
 brotli-bin                1.1.0                hd590300_1    conda-forge
 brotli-python             1.1.0           py311hb755f60_1    conda-forge
 bzip2                     1.0.8                h5eee18b_6
 c-ares                    1.28.1               hd590300_0    conda-forge
 c-compiler                1.5.2                h0b41bf4_0    conda-forge
 ca-certificates           2024.6.2             hbcca054_0    conda-forge
 cached-property           1.5.2                hd8ed1ab_1    conda-forge
 cached_property           1.5.2              pyha770c72_1    conda-forge
 cachetools                5.3.3              pyhd8ed1ab_0    conda-forge
 cairo                     1.18.0               hbb29018_2    conda-forge
 certifi                   2024.6.2           pyhd8ed1ab_0    conda-forge
 cffi                      1.16.0          py311hb3a22ac_0    conda-forge
 charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
 click                     8.1.7           unix_pyh707e725_0    conda-forge
 cloudpickle               3.0.0              pyhd8ed1ab_0    conda-forge
 cmake                     3.29.6               hcafd917_0    conda-forge
 colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
 comm                      0.2.2              pyhd8ed1ab_0    conda-forge
 commonmark                0.9.1                      py_0    conda-forge
 contourpy                 1.2.1           py311h9547e67_0    conda-forge
 coverage                  7.5.4           py311h331c9d8_0    conda-forge
 cuda-cccl_linux-64        12.2.140             ha770c72_0    conda-forge
 cuda-crt-dev_linux-64     12.2.140             ha770c72_1    conda-forge
 cuda-crt-tools            12.2.140             ha770c72_1    conda-forge
 cuda-cudart               12.2.140             hd3aeb46_0    conda-forge
 cuda-cudart-dev           12.2.140             hd3aeb46_0    conda-forge
 cuda-cudart-dev_linux-64  12.2.140             h59595ed_0    conda-forge
 cuda-cudart-static        12.2.140             hd3aeb46_0    conda-forge
 cuda-cudart-static_linux-64 12.2.140             h59595ed_0    conda-forge
 cuda-cudart_linux-64      12.2.140             h59595ed_0    conda-forge
 cuda-driver-dev_linux-64  12.2.140             h59595ed_0    conda-forge
 cuda-nvcc                 12.2.140             hcdd1206_0    conda-forge
 cuda-nvcc-dev_linux-64    12.2.140             ha770c72_1    conda-forge
 cuda-nvcc-impl            12.2.140             hd3aeb46_1    conda-forge
 cuda-nvcc-tools           12.2.140             hd3aeb46_1    conda-forge
 cuda-nvcc_linux-64        12.2.140             h8a487aa_0    conda-forge
 cuda-nvrtc                12.2.140             hd3aeb46_0    conda-forge
 cuda-nvvm-dev_linux-64    12.2.140             ha770c72_1    conda-forge
 cuda-nvvm-impl            12.2.140             h59595ed_1    conda-forge
 cuda-nvvm-tools           12.2.140             h59595ed_1    conda-forge
 cuda-profiler-api         12.2.140             ha770c72_0    conda-forge
 cuda-python               12.5.0          py311h817de4b_0    conda-forge
 cuda-version              12.2                 he2b69de_3    conda-forge
 cudf                      24.08.00a189    cuda12_py311_240623_gf536e30172_189    rapidsai-nightly
 cuml                      24.8.0                   pypi_0    pypi
 cupy                      13.2.0          py311he5a987b_0    conda-forge
 cupy-core                 13.2.0          py311h3bdf873_0    conda-forge
 cxx-compiler              1.5.2                hf52228f_0    conda-forge
 cycler                    0.12.1             pyhd8ed1ab_0    conda-forge
 cython                    3.0.10          py311hb755f60_0    conda-forge
 cytoolz                   0.12.3          py311h459d7ec_0    conda-forge
 dask                      2024.5.1           pyhd8ed1ab_0    conda-forge
 dask-core                 2024.5.1           pyhd8ed1ab_0    conda-forge
 dask-cuda                 24.08.00a6      py311_240623_g098109a_6    rapidsai-nightly
 dask-cudf                 24.08.00a189    cuda12_py311_240623_gf536e30172_189    rapidsai-nightly
 dask-expr                 1.1.1              pyhd8ed1ab_1    conda-forge
 dask-glm                  0.3.0                    pypi_0    pypi
 dask-ml                   2024.4.4           pyhd8ed1ab_0    conda-forge
 debugpy                   1.8.1           py311hb755f60_0    conda-forge
 decopatch                 1.4.10             pyhd8ed1ab_0    conda-forge
 decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
 defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
 distributed               2024.5.1           pyhd8ed1ab_0    conda-forge
 distributed-ucxx          0.39.00a        py3.11_240623_g1e6d80c_3    rapidsai-nightly
 dlpack                    0.8                  h59595ed_3    conda-forge
 docutils                  0.19            py311h38be061_1    conda-forge
 doxygen                   1.9.1                hb166930_1    conda-forge
 entrypoints               0.4                pyhd8ed1ab_0    conda-forge
 exceptiongroup            1.2.0              pyhd8ed1ab_2    conda-forge
 execnet                   2.1.1              pyhd8ed1ab_0    conda-forge
 executing                 2.0.1              pyhd8ed1ab_0    conda-forge
 expat                     2.6.2                h59595ed_0    conda-forge
 fastrlock                 0.8.2           py311hb755f60_2    conda-forge
 fmt                       10.2.1               h00ab1b0_0    conda-forge
 font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
 font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
 font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
 font-ttf-ubuntu           0.83                 h77eed37_2    conda-forge
 fontconfig                2.14.2               h14ed4e7_0    conda-forge
 fonts-conda-ecosystem     1                             0    conda-forge
 fonts-conda-forge         1                             0    conda-forge
 fonttools                 4.53.0          py311h331c9d8_0    conda-forge
 fqdn                      1.5.1              pyhd8ed1ab_0    conda-forge
 freetype                  2.12.1               h267a509_2    conda-forge
 fribidi                   1.0.10               h36c2ea0_0    conda-forge
 fsspec                    2024.6.0           pyhff2d567_0    conda-forge
 future                    1.0.0              pyhd8ed1ab_0    conda-forge
 gcc                       11.4.0              h602e360_12    conda-forge
 gcc_impl_linux-64         11.4.0              h00c12a0_12    conda-forge
 gcc_linux-64              11.4.0               ha077dfb_4    conda-forge
 gdk-pixbuf                2.42.12              hb9ae30d_0    conda-forge
 gflags                    2.2.2             he1b5a44_1004    conda-forge
 giflib                    5.2.2                hd590300_0    conda-forge
 glog                      0.7.1                hbabe93e_0    conda-forge
 graphite2                 1.3.13            h59595ed_1003    conda-forge
 graphviz                  11.0.0               hc68bbd7_0    conda-forge
 gtk2                      2.24.33              h280cfa0_4    conda-forge
 gts                       0.7.6                h977cf35_4    conda-forge
 gxx                       11.4.0              h602e360_12    conda-forge
 gxx_impl_linux-64         11.4.0              h634f3ee_12    conda-forge
 gxx_linux-64              11.4.0               h35bfe5d_4    conda-forge
 h11                       0.14.0             pyhd8ed1ab_0    conda-forge
 h2                        4.1.0              pyhd8ed1ab_0    conda-forge
 harfbuzz                  8.5.0                hfac3d4d_0    conda-forge
 hdbscan                   0.8.30          py311h1f0f07a_0    conda-forge
 hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
 httpcore                  1.0.5              pyhd8ed1ab_0    conda-forge
 httpx                     0.27.0             pyhd8ed1ab_0    conda-forge
 hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
 hypothesis                6.103.2            pyha770c72_0    conda-forge
 icu                       73.2                 h59595ed_0    conda-forge
 idna                      3.7                pyhd8ed1ab_0    conda-forge
 imagesize                 1.4.1              pyhd8ed1ab_0    conda-forge
 importlib-metadata        7.2.0              pyha770c72_0    conda-forge
 importlib-resources       6.4.0              pyhd8ed1ab_0    conda-forge
 importlib_metadata        7.2.0                hd8ed1ab_0    conda-forge
 importlib_resources       6.4.0              pyhd8ed1ab_0    conda-forge
 iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
 ipykernel                 6.29.4             pyh3099207_0    conda-forge
 ipython                   8.25.0             pyh707e725_0    conda-forge
 isoduration               20.11.0            pyhd8ed1ab_0    conda-forge
 jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
 jinja2                    3.1.4              pyhd8ed1ab_0    conda-forge
 joblib                    1.4.2              pyhd8ed1ab_0    conda-forge
 json5                     0.9.25             pyhd8ed1ab_0    conda-forge
 jsonpointer               3.0.0           py311h38be061_0    conda-forge
 jsonschema                4.22.0             pyhd8ed1ab_0    conda-forge
 jsonschema-specifications 2023.12.1          pyhd8ed1ab_0    conda-forge
 jsonschema-with-format-nongpl 4.22.0             pyhd8ed1ab_0    conda-forge
 jupyter-lsp               2.2.5              pyhd8ed1ab_0    conda-forge
 jupyter_client            8.6.2              pyhd8ed1ab_0    conda-forge
 jupyter_core              5.7.2           py311h38be061_0    conda-forge
 jupyter_events            0.10.0             pyhd8ed1ab_0    conda-forge
 jupyter_server            2.14.1             pyhd8ed1ab_0    conda-forge
 jupyter_server_terminals  0.5.3              pyhd8ed1ab_0    conda-forge
 jupyterlab                4.2.3              pyhd8ed1ab_0    conda-forge
 jupyterlab_pygments       0.3.0              pyhd8ed1ab_1    conda-forge
 jupyterlab_server         2.27.2             pyhd8ed1ab_0    conda-forge
 kernel-headers_linux-64   3.10.0              h4a8ded7_14    conda-forge
 keyutils                  1.6.1                h166bdaf_0    conda-forge
 kiwisolver                1.4.5           py311h9547e67_1    conda-forge
 krb5                      1.21.2               h659d440_0    conda-forge
 lcms2                     2.16                 hb7c19ff_0    conda-forge
 ld_impl_linux-64          2.40                 hf3520f5_7    conda-forge
 lerc                      4.0.0                h27087fc_0    conda-forge
 libabseil                 20240116.2      cxx17_h59595ed_0    conda-forge
 libarrow                  16.1.0          h4a673ee_10_cpu    conda-forge
 libarrow-acero            16.1.0          hac33072_10_cpu    conda-forge
 libarrow-dataset          16.1.0          hac33072_10_cpu    conda-forge
 libarrow-substrait        16.1.0          h7e0c224_10_cpu    conda-forge
 libblas                   3.9.0           22_linux64_openblas    conda-forge
 libbrotlicommon           1.1.0                hd590300_1    conda-forge
 libbrotlidec              1.1.0                hd590300_1    conda-forge
 libbrotlienc              1.1.0                hd590300_1    conda-forge
 libcblas                  3.9.0           22_linux64_openblas    conda-forge
 libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
 libcublas                 12.2.5.6             hd3aeb46_0    conda-forge
 libcublas-dev             12.2.5.6             hd3aeb46_0    conda-forge
 libcudf                   24.08.00a189    cuda12_240623_gf536e30172_189    rapidsai-nightly
 libcufft                  11.0.8.103           hd3aeb46_0    conda-forge
 libcufft-dev              11.0.8.103           hd3aeb46_0    conda-forge
 libcufile                 1.7.2.10             hd3aeb46_0    conda-forge
 libcufile-dev             1.7.2.10             hd3aeb46_0    conda-forge
 libcumlprims              24.08.00a       cuda12_240623_g10e088a_1    rapidsai-nightly
 libcurand                 10.3.3.141           hd3aeb46_0    conda-forge
 libcurand-dev             10.3.3.141           hd3aeb46_0    conda-forge
 libcurl                   8.8.0                hca28451_0    conda-forge
 libcusolver               11.5.2.141           hd3aeb46_0    conda-forge
 libcusolver-dev           11.5.2.141           hd3aeb46_0    conda-forge
 libcusparse               12.1.2.141           hd3aeb46_0    conda-forge
 libcusparse-dev           12.1.2.141           hd3aeb46_0    conda-forge
 libdeflate                1.20                 hd590300_0    conda-forge
 libedit                   3.1.20191231         he28a2e2_2    conda-forge
 libev                     4.33                 hd590300_2    conda-forge
 libevent                  2.1.12               hf998b51_1    conda-forge
 libexpat                  2.6.2                h59595ed_0    conda-forge
 libffi                    3.4.4                h6a678d5_1
 libgcc-devel_linux-64     11.4.0             h8f596e0_112    conda-forge
 libgcc-ng                 13.2.0              h77fa898_12    conda-forge
 libgd                     2.3.3                h119a65a_9    conda-forge
 libgfortran-ng            13.2.0              h69a702a_12    conda-forge
 libgfortran5              13.2.0              h3d2ce59_12    conda-forge
 libglib                   2.80.2               h8a4344b_1    conda-forge
 libgomp                   13.2.0              h77fa898_12    conda-forge
 libgoogle-cloud           2.25.0               h2736e30_0    conda-forge
 libgoogle-cloud-storage   2.25.0               h3d9a0c8_0    conda-forge
 libgrpc                   1.62.2               h15f2491_0    conda-forge
 libhwloc                  2.10.0          default_h5622ce7_1001    conda-forge
 libiconv                  1.17                 hd590300_2    conda-forge
 libjpeg-turbo             3.0.0                hd590300_1    conda-forge
 libkvikio                 24.08.00a       cuda12_240623_g3cc6678_10    rapidsai-nightly
 liblapack                 3.9.0           22_linux64_openblas    conda-forge
 libllvm14                 14.0.6               hcd5def8_4    conda-forge
 libnghttp2                1.58.0               h47da74e_1    conda-forge
 libnl                     3.9.0                hd590300_0    conda-forge
 libnsl                    2.0.1                hd590300_0    conda-forge
 libnvjitlink              12.2.140             hd3aeb46_0    conda-forge
 libopenblas               0.3.27          pthreads_h413a1c8_0    conda-forge
 libparquet                16.1.0          h6a7eafb_10_cpu    conda-forge
 libpng                    1.6.43               h2797004_0    conda-forge
 libprotobuf               4.25.3               h08a7969_0    conda-forge
 libraft                   24.08.00a33     cuda12_240623_gb86a5f90_33    rapidsai-nightly
 libraft-headers           24.08.00a33     cuda12_240623_gb86a5f90_33    rapidsai-nightly
 libraft-headers-only      24.08.00a33     cuda12_240623_gb86a5f90_33    rapidsai-nightly
 libre2-11                 2023.09.01           h5a48ba9_2    conda-forge
 librmm                    24.08.00a17     cuda12_240623_gf2d07976_17    rapidsai-nightly
 librsvg                   2.58.1               hadf69e7_0    conda-forge
 libsanitizer              11.4.0              h5763a12_12    conda-forge
 libsodium                 1.0.18               h36c2ea0_1    conda-forge
 libsqlite                 3.46.0               hde9e2c9_0    conda-forge
 libssh2                   1.11.0               h0841786_0    conda-forge
 libstdcxx-devel_linux-64  11.4.0             h8f596e0_112    conda-forge
 libstdcxx-ng              13.2.0              hc0a3c3a_12    conda-forge
 libthrift                 0.19.0               hb90f79a_1    conda-forge
 libtiff                   4.6.0                h1dd3fc0_3    conda-forge
 libucxx                   0.39.00a        cuda12_240623_g1e6d80c_3    rapidsai-nightly
 libutf8proc               2.8.0                h166bdaf_0    conda-forge
 libuuid                   2.38.1               h0b41bf4_0    conda-forge
 libuv                     1.48.0               hd590300_0    conda-forge
 libwebp                   1.4.0                h2c329e2_0    conda-forge
 libwebp-base              1.4.0                hd590300_0    conda-forge
 libxcb                    1.16                 hd590300_0    conda-forge
 libxcrypt                 4.4.36               hd590300_1    conda-forge
 libxml2                   2.12.7               hc051c1a_1    conda-forge
 libzlib                   1.3.1                h4ab18f5_1    conda-forge
 llvmlite                  0.43.0          py311hbde99c3_0    conda-forge
 locket                    1.0.0              pyhd8ed1ab_0    conda-forge
 lz4                       4.3.3           py311h38e4bf4_0    conda-forge
 lz4-c                     1.9.4                hcb278e6_0    conda-forge
 makefun                   1.15.2             pyhd8ed1ab_0    conda-forge
 markdown                  3.6                pyhd8ed1ab_0    conda-forge
 markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
 markupsafe                2.1.5           py311h459d7ec_0    conda-forge
 matplotlib-base           3.8.4           py311ha4ca890_2    conda-forge
 matplotlib-inline         0.1.7              pyhd8ed1ab_0    conda-forge
 mdurl                     0.1.2              pyhd8ed1ab_0    conda-forge
 mistune                   3.0.2              pyhd8ed1ab_0    conda-forge
 msgpack-python            1.0.8           py311h52f7536_0    conda-forge
 multipledispatch          0.6.0                      py_0    conda-forge
 munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
 nbclient                  0.10.0             pyhd8ed1ab_0    conda-forge
 nbconvert                 7.16.4               hd8ed1ab_1    conda-forge
 nbconvert-core            7.16.4             pyhd8ed1ab_1    conda-forge
 nbconvert-pandoc          7.16.4               hd8ed1ab_1    conda-forge
 nbformat                  5.10.4             pyhd8ed1ab_0    conda-forge
 nbsphinx                  0.9.4              pyhd8ed1ab_0    conda-forge
 nccl                      2.22.3.1             hbc370b7_0    conda-forge
 ncurses                   6.5                  h59595ed_0    conda-forge
 nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge
 ninja                     1.12.1               h297d8ca_0    conda-forge
 nltk                      3.8.1              pyhd8ed1ab_0    conda-forge
 notebook                  7.2.1              pyhd8ed1ab_0    conda-forge
 notebook-shim             0.2.4              pyhd8ed1ab_0    conda-forge
 numba                     0.60.0          py311h4bc866e_0    conda-forge
 numpy                     1.26.4          py311h64a7726_0    conda-forge
 numpydoc                  1.6.0              pyhd8ed1ab_0    conda-forge
 nvcomp                    3.0.6                h10b603f_0    conda-forge
 nvtx                      0.2.10          py311h459d7ec_0    conda-forge
 openjpeg                  2.5.2                h488ebb8_0    conda-forge
 openssl                   3.3.1                h4ab18f5_1    conda-forge
 orc                       2.0.1                h17fec99_1    conda-forge
 overrides                 7.7.0              pyhd8ed1ab_0    conda-forge
 packaging                 24.1               pyhd8ed1ab_0    conda-forge
 pandas                    2.2.2           py311h14de704_1    conda-forge
 pandoc                    3.2                  ha770c72_0    conda-forge
 pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
 pango                     1.54.0               h84a9a3c_0    conda-forge
 parso                     0.8.4              pyhd8ed1ab_0    conda-forge
 partd                     1.4.2              pyhd8ed1ab_0    conda-forge
 pathspec                  0.12.1             pyhd8ed1ab_0    conda-forge
 patsy                     0.5.6              pyhd8ed1ab_0    conda-forge
 pcre2                     10.44                h0f59acf_0    conda-forge
 pexpect                   4.9.0              pyhd8ed1ab_0    conda-forge
 pickleshare               0.7.5                   py_1003    conda-forge
 pillow                    10.3.0          py311h82a398c_1    conda-forge
 pip                       24.0               pyhd8ed1ab_0    conda-forge
 pixman                    0.43.2               h59595ed_0    conda-forge
 pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
 platformdirs              4.2.2              pyhd8ed1ab_0    conda-forge
 pluggy                    1.5.0              pyhd8ed1ab_0    conda-forge
 prometheus_client         0.20.0             pyhd8ed1ab_0    conda-forge
 prompt-toolkit            3.0.47             pyha770c72_0    conda-forge
 psutil                    5.9.8           py311h459d7ec_0    conda-forge
 pthread-stubs             0.4               h36c2ea0_1001    conda-forge
 ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
 pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
 py-cpuinfo                9.0.0              pyhd8ed1ab_0    conda-forge
 pyarrow                   16.1.0          py311hbd00459_3    conda-forge
 pyarrow-core              16.1.0          py311h8c3dac4_3_cpu    conda-forge
 pyarrow-hotfix            0.6                pyhd8ed1ab_0    conda-forge
 pycparser                 2.22               pyhd8ed1ab_0    conda-forge
 pydata-sphinx-theme       0.15.3             pyhd8ed1ab_0    conda-forge
 pygments                  2.18.0             pyhd8ed1ab_0    conda-forge
 pylibraft                 24.08.00a33     cuda12_py311_240623_gb86a5f90_33    rapidsai-nightly
 pynndescent               0.5.8              pyh1a96a4e_0    conda-forge
 pynvjitlink               0.2.4           py311hd269673_0    rapidsai
 pynvml                    11.4.1             pyhd8ed1ab_0    conda-forge
 pyparsing                 3.1.2              pyhd8ed1ab_0    conda-forge
 pysocks                   1.7.1              pyha2e5f31_6    conda-forge
 pytest                    7.4.4              pyhd8ed1ab_0    conda-forge
 pytest-benchmark          4.0.0              pyhd8ed1ab_0    conda-forge
 pytest-cases              3.8.5              pyhd8ed1ab_0    conda-forge
 pytest-cov                5.0.0              pyhd8ed1ab_0    conda-forge
 pytest-xdist              3.6.1              pyhd8ed1ab_0    conda-forge
 python                    3.11.9          hb806964_0_cpython    conda-forge
 python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
 python-fastjsonschema     2.20.0             pyhd8ed1ab_0    conda-forge
 python-json-logger        2.0.7              pyhd8ed1ab_0    conda-forge
 python-tzdata             2024.1             pyhd8ed1ab_0    conda-forge
 python_abi                3.11                    4_cp311    conda-forge
 pytz                      2024.1             pyhd8ed1ab_0    conda-forge
 pyyaml                    6.0.1           py311h459d7ec_1    conda-forge
 pyzmq                     26.0.3          py311h08a0b41_0    conda-forge
 raft-dask                 24.08.00a33     cuda12_py311_240623_gb86a5f90_33    rapidsai-nightly
 rapids-build-backend      0.3.1                      py_0    rapidsai-nightly
 rapids-dask-dependency    24.08.00a4                 py_0    rapidsai-nightly
 rapids-dependency-file-generator 1.13.11                    py_0    rapidsai
 rdma-core                 51.1                 he02047a_0    conda-forge
 re2                       2023.09.01           h7f4b329_2    conda-forge
 readline                  8.2                  h5eee18b_0
 recommonmark              0.7.1              pyhd8ed1ab_0    conda-forge
 referencing               0.35.1             pyhd8ed1ab_0    conda-forge
 regex                     2024.5.15       py311h331c9d8_0    conda-forge
 requests                  2.32.3             pyhd8ed1ab_0    conda-forge
 rfc3339-validator         0.1.4              pyhd8ed1ab_0    conda-forge
 rfc3986-validator         0.1.1              pyh9f0ad1d_0    conda-forge
 rhash                     1.4.4                hd590300_0    conda-forge
 rich                      13.7.1             pyhd8ed1ab_0    conda-forge
 rmm                       24.08.00a17     cuda12_py311_240623_gf2d07976_17    rapidsai-nightly
 rpds-py                   0.18.1          py311h5ecf98a_0    conda-forge
 s2n                       1.4.16               he19d79f_0    conda-forge
 scikit-build-core         0.9.6              pyh4af843d_0    conda-forge
 scikit-learn              1.5.0           py311he08f58d_1    conda-forge
 scipy                     1.13.1          py311h517d4fd_0    conda-forge
 seaborn                   0.13.2               hd8ed1ab_2    conda-forge
 seaborn-base              0.13.2             pyhd8ed1ab_2    conda-forge
 send2trash                1.8.3              pyh0d859eb_0    conda-forge
 setuptools                69.5.1          py311h06a4308_0
 six                       1.16.0             pyh6c4a22f_0    conda-forge
 snappy                    1.2.0                hdb0a2a9_1    conda-forge
 sniffio                   1.3.1              pyhd8ed1ab_0    conda-forge
 snowballstemmer           2.2.0              pyhd8ed1ab_0    conda-forge
 sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
 soupsieve                 2.5                pyhd8ed1ab_1    conda-forge
 sparse                    0.15.4             pyhd8ed1ab_0    conda-forge
 spdlog                    1.12.0               hd2e6256_2    conda-forge
 sphinx                    5.3.0              pyhd8ed1ab_0    conda-forge
 sphinx-copybutton         0.5.2              pyhd8ed1ab_0    conda-forge
 sphinx-markdown-tables    0.0.17             pyh6c4a22f_0    conda-forge
 sphinxcontrib-applehelp   1.0.8              pyhd8ed1ab_0    conda-forge
 sphinxcontrib-devhelp     1.0.6              pyhd8ed1ab_0    conda-forge
 sphinxcontrib-htmlhelp    2.0.5              pyhd8ed1ab_0    conda-forge
 sphinxcontrib-jsmath      1.0.1              pyhd8ed1ab_0    conda-forge
 sphinxcontrib-qthelp      1.0.7              pyhd8ed1ab_0    conda-forge
 sphinxcontrib-serializinghtml 1.1.10             pyhd8ed1ab_0    conda-forge
 sqlite                    3.46.0               h6d4b2fc_0    conda-forge
 stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
 statsmodels               0.14.2          py311h18e1886_0    conda-forge
 sysroot_linux-64          2.17                h4a8ded7_14    conda-forge
 tabulate                  0.9.0              pyhd8ed1ab_1    conda-forge
 tbb                       2021.12.0            h297d8ca_1    conda-forge
 tblib                     3.0.0              pyhd8ed1ab_0    conda-forge
 terminado                 0.18.1             pyh0d859eb_0    conda-forge
 threadpoolctl             3.5.0              pyhc1e730c_0    conda-forge
 tinycss2                  1.3.0              pyhd8ed1ab_0    conda-forge
 tk                        8.6.13          noxft_h4845f30_101    conda-forge
 toml                      0.10.2             pyhd8ed1ab_0    conda-forge
 tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
 tomlkit                   0.12.5             pyha770c72_0    conda-forge
 toolz                     0.12.1             pyhd8ed1ab_0    conda-forge
 tornado                   6.4.1           py311h331c9d8_0    conda-forge
 tqdm                      4.66.4             pyhd8ed1ab_0    conda-forge
 traitlets                 5.14.3             pyhd8ed1ab_0    conda-forge
 treelite                  4.2.1           py311he8f9275_0    conda-forge
 types-python-dateutil     2.9.0.20240316     pyhd8ed1ab_0    conda-forge
 typing-extensions         4.12.2               hd8ed1ab_0    conda-forge
 typing_extensions         4.12.2             pyha770c72_0    conda-forge
 typing_utils              0.1.0              pyhd8ed1ab_0    conda-forge
 tzdata                    2024a                h04d1e81_0
 ucx                       1.15.0               hda83522_8    conda-forge
 ucx-py                    0.39.00a3       py311_240623_g42c03ef_3    rapidsai-nightly
 ucxx                      0.39.00a        cuda12_py3.11_240623_g1e6d80c_3    rapidsai-nightly
 umap-learn                0.5.3           py311h38be061_1    conda-forge
 uri-template              1.3.0              pyhd8ed1ab_0    conda-forge
 urllib3                   2.2.2              pyhd8ed1ab_0    conda-forge
 wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
 webcolors                 24.6.0             pyhd8ed1ab_0    conda-forge
 webencodings              0.5.1              pyhd8ed1ab_2    conda-forge
 websocket-client          1.8.0              pyhd8ed1ab_0    conda-forge
 wheel                     0.43.0          py311h06a4308_0
 xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
 xorg-libice               1.1.1                hd590300_0    conda-forge
 xorg-libsm                1.2.4                h7391055_0    conda-forge
 xorg-libx11               1.8.9                hb711507_1    conda-forge
 xorg-libxau               1.0.11               hd590300_0    conda-forge
 xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
 xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
 xorg-libxrender           0.9.11               hd590300_0    conda-forge
 xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
 xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
 xorg-xproto               7.0.31            h7f98852_1007    conda-forge
 xyzservices               2024.6.0           pyhd8ed1ab_0    conda-forge
 xz                        5.4.6                h5eee18b_1
 yaml                      0.2.5                h7f98852_2    conda-forge
 zeromq                    4.3.5                h75354e8_4    conda-forge
 zict                      3.0.0              pyhd8ed1ab_0    conda-forge
 zipp                      3.19.2             pyhd8ed1ab_0    conda-forge
 zlib                      1.3.1                h4ab18f5_1    conda-forge
 zstd                      1.5.6                ha6fb4c9_0    conda-forge

Notes for Reviewers

The API deviates from scikit-learn by not supporting these parameters: fit_inverse_transform, random_state, n_jobs, max_iter. If a user tries to set one of them, a NotImplementedError is raised.
The Criteria of Done mentions making the class pickleable, with a test in cuml/tests/test_pickle.py. I couldn't find a PCA reference for this. I would appreciate pointers if additional work is needed.
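
For reviewers who want a concrete picture of the first note, here is a minimal sketch of one way such a guard could look. It is not the code from this PR: the helper name `check_unsupported` and the rule "raise only when the value differs from scikit-learn's default" are assumptions made for illustration. The defaults listed are scikit-learn's KernelPCA defaults.

```python
# Hypothetical sketch of rejecting unsupported scikit-learn options.
# `check_unsupported` is an illustrative name, not part of cuML.
_UNSUPPORTED_DEFAULTS = {
    "fit_inverse_transform": False,
    "random_state": None,
    "n_jobs": None,
    "max_iter": None,
}


def check_unsupported(**kwargs):
    """Raise NotImplementedError if an unsupported option is set.

    Assumption: passing the scikit-learn default value is accepted,
    so existing scikit-learn code that spells out defaults still runs.
    """
    for name, default in _UNSUPPORTED_DEFAULTS.items():
        value = kwargs.get(name, default)
        if value != default:
            raise NotImplementedError(
                f"{name}={value!r} is not supported by this KernelPCA"
            )


# Defaults pass silently; a non-default value raises.
check_unsupported(random_state=None)
try:
    check_unsupported(n_jobs=4)
except NotImplementedError as err:
    print(err)
```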

Benchmarks

From notebooks/tools/cuml_benchmarks.ipynb
Screenshot 2024-07-24 at 21 14 01

Benchmark output
KernelPCA (n_samples=400, n_features=1000) [cpu=0.030254602432250977, gpu=0.009102821350097656, speedup=3.323651126244107]
KernelPCA (n_samples=400, n_features=10000) [cpu=0.04749751091003418, gpu=0.021541595458984375, speedup=2.204920754382858]
KernelPCA (n_samples=800, n_features=1000) [cpu=0.1384749412536621, gpu=0.01768970489501953, speedup=7.827996118389131]
KernelPCA (n_samples=800, n_features=10000) [cpu=0.1672663688659668, gpu=0.06093120574951172, speedup=2.745167550985272]
KernelPCA (n_samples=1600, n_features=1000) [cpu=0.6741712093353271, gpu=0.06428027153015137, speedup=10.487995667832545]
KernelPCA (n_samples=1600, n_features=10000) [cpu=0.7588849067687988, gpu=0.16866517066955566, speedup=4.499357536332062]
KernelPCA (n_samples=3200, n_features=1000) [cpu=3.6060729026794434, gpu=0.3316190242767334, speedup=10.874143636796315]
KernelPCA (n_samples=3200, n_features=10000) [cpu=4.016584157943726, gpu=0.5563812255859375, speedup=7.219122380906673]
KernelPCA (n_samples=6400, n_features=1000) [cpu=23.54018998146057, gpu=1.6623897552490234, speedup=14.160451787634054]
KernelPCA (n_samples=6400, n_features=10000) [cpu=24.682478427886963, gpu=2.2100284099578857, speedup=11.168398703235363]
KernelPCA (n_samples=12800, n_features=1000) [cpu=170.09102034568787, gpu=10.402626276016235, speedup=16.350776797378664]
KernelPCA (n_samples=12800, n_features=10000) [cpu=177.6021535396576, gpu=11.879239320755005, speedup=14.950633516521307]
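
The speedup column appears to be the CPU time divided by the GPU time. A small sketch that parses lines in the format above and recomputes the ratio is shown here. The helper `parse_line` and the regular expression are assumptions for illustration and are not part of the benchmark package.

```python
import re

# Matches the benchmark output format shown above.
LINE_RE = re.compile(
    r"KernelPCA \(n_samples=(\d+), n_features=(\d+)\) "
    r"\[cpu=([\d.]+), gpu=([\d.]+), speedup=([\d.]+)\]"
)


def parse_line(line):
    """Parse one benchmark line into a dict of numeric fields."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized benchmark line: {line!r}")
    return {
        "n_samples": int(match.group(1)),
        "n_features": int(match.group(2)),
        "cpu": float(match.group(3)),
        "gpu": float(match.group(4)),
        "speedup": float(match.group(5)),
    }


# Two lines copied verbatim from the output above.
SAMPLE_LINES = [
    "KernelPCA (n_samples=400, n_features=1000) [cpu=0.030254602432250977, gpu=0.009102821350097656, speedup=3.323651126244107]",
    "KernelPCA (n_samples=12800, n_features=10000) [cpu=177.6021535396576, gpu=11.879239320755005, speedup=14.950633516521307]",
]

for line in SAMPLE_LINES:
    row = parse_line(line)
    ratio = row["cpu"] / row["gpu"]
    print(row["n_samples"], row["n_features"], f"cpu/gpu = {ratio:.2f}x")
```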

We see an even greater speedup when we set n_components = n_samples. Setting n_components to n_samples is the same as default behavior, except zero eigenvalues aren't removed.
Screenshot 2024-08-09 at 17 28 11

Manual tests

Kernel PCA with RBF kernel

code
from sklearn.decomposition import PCA as skPCA, KernelPCA as skKernelPCA
from sklearn import datasets
from cuml.decomposition import PCA as cuPCA
from cuml.experimental.decomposition import KernelPCA as cuKernelPCA
import numpy as np
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
iris = load_iris()
X = iris.data
y = iris.target
sk_pca = skPCA(n_components=3)
X_sk_pca = sk_pca.fit_transform(X)
cu_pca = cuPCA(n_components=3)
X_cu_pca = cu_pca.fit_transform(X)
sk_kpca = skKernelPCA(n_components=3, kernel='rbf')
X_sk_kpca = sk_kpca.fit_transform(X)
cu_kpca = cuKernelPCA(n_components=3, kernel='rbf')
X_cu_kpca = cu_kpca.fit_transform(X)
# Plot the results
fig = plt.figure(figsize=(24, 12))

ax1 = fig.add_subplot(231, projection='3d')
for target in np.unique(y):
    ax1.scatter(X[y == target, 0], X[y == target, 1], X[y == target, 2], label=iris.target_names[target])
ax1.set_title('Original Data (First Three Features)')
ax1.set_xlabel('Feature 1')
ax1.set_ylabel('Feature 2')
ax1.set_zlabel('Feature 3')
ax1.legend()

ax2 = fig.add_subplot(232, projection='3d')
for target in np.unique(y):
    ax2.scatter(X_sk_pca[y == target, 0], X_sk_pca[y == target, 1], X_sk_pca[y == target, 2], label=iris.target_names[target])
ax2.set_title('SK PCA (3 Components)')
ax2.set_xlabel('Principal Component 1')
ax2.set_ylabel('Principal Component 2')
ax2.set_zlabel('Principal Component 3')
ax2.legend()

ax3 = fig.add_subplot(233, projection='3d')
for target in np.unique(y):
    ax3.scatter(X_cu_pca[y == target, 0], X_cu_pca[y == target, 1], X_cu_pca[y == target, 2], label=iris.target_names[target])
ax3.set_title('cuML PCA (3 Components)')
ax3.set_xlabel('Principal Component 1')
ax3.set_ylabel('Principal Component 2')
ax3.set_zlabel('Principal Component 3')
ax3.legend()

ax4 = fig.add_subplot(234, projection='3d')
for target in np.unique(y):
    ax4.scatter(X_sk_kpca[y == target, 0], X_sk_kpca[y == target, 1], X_sk_kpca[y == target, 2], label=iris.target_names[target])
ax4.set_title('SK KernelPCA with RBF Kernel (3 Components)')
ax4.set_xlabel('Principal Component 1')
ax4.set_ylabel('Principal Component 2')
ax4.set_zlabel('Principal Component 3')
ax4.legend()

ax5 = fig.add_subplot(235, projection='3d')
for target in np.unique(y):
    ax5.scatter(X_cu_kpca[y == target, 0], X_cu_kpca[y == target, 1], X_cu_kpca[y == target, 2], label=iris.target_names[target])
ax5.set_title('cuML KernelPCA with RBF Kernel (3 Components)')
ax5.set_xlabel('Principal Component 1')
ax5.set_ylabel('Principal Component 2')
ax5.set_zlabel('Principal Component 3')
ax5.legend()

plt.show()

image

Kernel PCA with poly kernel

code
from sklearn.datasets import make_classification
def plot_3d_projection(X, y, title, elev=30, azim=30):
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(111, projection='3d')
    for target in np.unique(y):
        ax.scatter(X[y == target, 0], X[y == target, 1], X[y == target, 2], label=str(target))
    ax.set_title(title)
    ax.set_xlabel('Component 1')
    ax.set_ylabel('Component 2')
    ax.set_zlabel('Component 3')
    ax.legend()
    ax.view_init(elev=elev, azim=azim)  # Set viewpoint
    plt.show()
X, y = make_classification(n_features=3, n_informative=3, n_redundant=0, n_clusters_per_class=1, n_classes=3)
plot_3d_projection(X, y, 'Original Data', elev=30, azim=60)
poly_kpca = cuKernelPCA(n_components=3, kernel='poly', degree=5, gamma=2, coef0=2)
X_poly_kpca = poly_kpca.fit_transform(X)
plot_3d_projection(X_poly_kpca, y, 'cuML KernelPCA with Poly Kernel', elev=30, azim=120)
Screenshot 2024-07-24 at 21 38 20
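
For readers unfamiliar with the degree, gamma, and coef0 parameters used above, here is a pure-Python sketch of the polynomial and RBF kernel functions as scikit-learn defines them. It assumes the cuML implementation follows the same definitions, which is the stated drop-in goal but is not verified here. The function names are illustrative only.

```python
import math


def poly_kernel(x, y, degree, gamma, coef0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Same hyperparameters as the poly example above, on two toy points.
a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]
print(poly_kernel(a, b, degree=5, gamma=2, coef0=2))  # orthogonal points: coef0 ** degree
print(rbf_kernel(a, a, gamma=10))  # identical points give exp(0)
```

Kernel PCA then eigendecomposes the centered matrix of these pairwise kernel values instead of the feature covariance matrix.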

Projecting testing data

This case is copied from the scikit-learn example, except that it uses cuML's PCA and KernelPCA.

code
import matplotlib.pyplot as plt
from cuml.decomposition import PCA as cuPCA
from cuml.experimental.decomposition import KernelPCA as cuKernelPCA
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=1_000, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
pca = cuPCA(n_components=2)
kernel_pca = cuKernelPCA(
    n_components=None, kernel="rbf", gamma=10, alpha=0.1
)

X_test_pca = pca.fit(X_train).transform(X_test)
X_test_kernel_pca = kernel_pca.fit(X_train).transform(X_test)
fig, (orig_data_ax, pca_proj_ax, kernel_pca_proj_ax) = plt.subplots(
    ncols=3, figsize=(14, 4)
)

orig_data_ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test)
orig_data_ax.set_ylabel("Feature #1")
orig_data_ax.set_xlabel("Feature #0")
orig_data_ax.set_title("Testing data")

pca_proj_ax.scatter(X_test_pca[:, 0], X_test_pca[:, 1], c=y_test)
pca_proj_ax.set_ylabel("Principal component #1")
pca_proj_ax.set_xlabel("Principal component #0")
pca_proj_ax.set_title("Projection of testing data\n using PCA")

kernel_pca_proj_ax.scatter(X_test_kernel_pca[:, 0], X_test_kernel_pca[:, 1], c=y_test)
kernel_pca_proj_ax.set_ylabel("Principal component #1")
kernel_pca_proj_ax.set_xlabel("Principal component #0")
_ = kernel_pca_proj_ax.set_title("Projection of testing data\n using KernelPCA")


image

Definition of Done Criteria Checklist

Python Checklist

Design

  • Python class is as "near drop-in replacement" for Scikit-learn (or relevant industry standard) API as possible. This means parameters have the same names as Scikit-learn, and where differences exist, they are clearly documented in docstrings.
  • Initial PR with the API design if there are going to be significant differences with reference APIs, or lack of a reference API, to have a discussion about it.
  • Python class is pickleable and a test has been added to cuml/tests/test_pickle.py
  • APIs use input_to_cuml_array to accept flexible inputs and check their datatypes and use cumlArray.to_output() to return configurable outputs.
  • Any internal parameters or array-based instance variables use CumlArray

Testing

  • Pytests for wrapper functionality against Scikit-learn using relevant datasets
  • Stress tests against reasonable inputs (e.g short-wide, tall-narrow, different numerical precision)
  • Pytests for pickle capability
  • Pytests to evaluate correctness against Scikit-learn on a variety of datasets
  • Add algorithm to benchmarks package in python/cuml/benchmarks/algorithms.py and benchmarks notebook in python/cuml/notebooks/tools/cuml_benchmarks.ipynb
  • PyTests that run in the "unit"-level marker should be quick to execute and should, in general, not significantly increase end-to-end test execution.

Unit test results

Python Test Results
(cuml_dev) ubuntu@ip-172-31-36-86:~/cuml/python$ pytest ./cuml/tests/test_kpca.py
==================================================== test session starts =====================================================
platform linux -- Python 3.11.9, pytest-7.4.4, pluggy-1.5.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/ubuntu/cuml/python
configfile: pyproject.toml
plugins: xdist-3.6.1, hypothesis-6.103.2, benchmark-4.0.0, cases-3.8.5, anyio-4.4.0, cov-5.0.0
collected 165 items

cuml/tests/test_kpca.py ....ssssssss....ssssssss....ssssssss....ssssssss....ssssssss....ssssssss....ssssssss....ssssss [ 56%]
ss....ssssssss....ssssssss....ssssssss....ssssssss..................... [100%]

=============================================== 69 passed, 96 skipped in 4.92s ===============================================

@tomasjoh tomasjoh requested review from a team as code owners July 27, 2024 20:07

copy-pr-bot bot commented Jul 27, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added Cython / Python Cython or Python issue CMake labels Jul 27, 2024
@tomasjoh tomasjoh changed the title from "Kernel PCAPython and benchmarking" to "Kernel PCA: Python and Benchmarking Code" Jul 27, 2024
@tomasjoh tomasjoh mentioned this pull request Jul 27, 2024
22 tasks
def _build_params(self, n_rows, n_cols):
    IF GPUBUILD == 1:
        cdef paramsKPCA *params = new paramsKPCA()
        params.n_components = min(self.n_components_ or n_rows, n_rows)
Sklearn reference for setting n_components
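
To make the behavior of the commented line easier to review, here is a plain-Python sketch of the same expression. The function name `resolve_n_components` is illustrative and does not exist in the PR.

```python
def resolve_n_components(n_components, n_rows):
    """Mirror of `min(n_components or n_rows, n_rows)` from the diff.

    - None (unset) falls back to n_rows.
    - Values larger than n_rows are clamped to n_rows.
    - Because `or` tests truthiness, 0 is also treated as unset.
    """
    return min(n_components or n_rows, n_rows)


print(resolve_n_components(None, 150))  # 150
print(resolve_n_components(3, 150))     # 3
print(resolve_n_components(500, 150))   # 150
print(resolve_n_components(0, 150))     # 150
```

The last case is worth noting in review: an explicit 0 silently becomes n_rows rather than raising.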

@tomasjoh tomasjoh changed the base branch from branch-24.08 to branch-24.10 August 7, 2024 17:07