Use MKL for verification #63
Conversation
* Migrate cute sgemm_nt_1 to sycl
* Revert "Migrate cute sgemm_nt_1 to sycl"
* Revert "Revert "Migrate cute sgemm_nt_1 to sycl""
* Add cutlass example
* Update examples/14_ampere_tf32_tensorop_gemm/CMakeLists.txt
* Update examples/14_ampere_tf32_tensorop_gemm/ampere_tf32_tensorop_gemm_cute.cpp
* Update examples/14_ampere_tf32_tensorop_gemm/ampere_tf32_tensorop_gemm_cute.cu
* Update examples/cute/tutorial/CMakeLists.txt
* Update include/cute/arch/copy_sm80.hpp
* Update include/cute/config.hpp
* Update include/cute/util/debug.hpp
* Update include/cutlass/detail/helper_macros.hpp
* Address comments
* Remove sgemm_nt_1 PoC
* Fix build issues
* Fix code style format
* Remove ENABLE_NVPTX flag
* Cosmetic
* Updating README-sycl.md to capture the 3.5 modifications (codeplaysoftware#16)
* Remove the sgemm_nt_1_sycl PoC (codeplaysoftware#15)
* Applying the comments
* Adding CuTe example in SYCL
* Migrate Cute components to SYCL
* Add cmake configuration
* Update README-sycl.md: fix CUDA version
* Fix typo in macro
* Revert "Updating README-sycl.md to capture the 3.5 modifications (codeplaysoftware#16)" (codeplaysoftware#17); this reverts commit a726bd3

Co-authored-by: Mehdi Goli <[email protected]>
Co-authored-by: aacostadiaz <[email protected]>
Co-authored-by: Alejandro Acosta <[email protected]>
Add XE MMA/copy atom
* Removes the CUDA toolkit dependency when SYCL is enabled. (Co-authored-by: Mehdi Goli <[email protected]>, Atharva Dubey <[email protected]>)
* Fix some issues when trying to build and run the 14_ampere example for SYCL. (Co-authored-by: Muhammad Tanvir <[email protected]>)
* Enables CuTe tests via the SYCL path. (Co-authored-by: Mehdi Goli <[email protected]>, aacostadiaz <[email protected]>)
Add an Intel PVC pipeline to compute GEMM.
Add example for Intel PVC GEMM
Co-authored-by: Alejandro Acosta <[email protected]>
LGTM
```
@@ -42,6 +42,10 @@
 #include "cutlass/util/device_memory.h"
 #include "helper.h"

+#if defined(CUTLASS_ENABLE_MKL)
+# include <oneapi/mkl.hpp>
```
You are building the oneMKL interface for an Nvidia GPU, but you are linking it against the closed-source MKL for Intel even when it is on Nvidia? What is your mechanism for correct linking, to make sure that oneMKL does not run on the Intel CPU instead of calling cuBLAS behind the scenes to reach the GPU? If you are always testing it on the Intel CPU, there is no need to have this:

```
-DENABLE_MKLGPU_BACKEND=${ENABLE_MKLGPU_BACKEND}
-DENABLE_CUBLAS_BACKEND=${ENABLE_CUBLAS_BACKEND}
```
> What is your mechanism to have a correct linking and make sure that the oneMKL does not run on Intel CPU?

From what I understand of the oneMKL interface, dispatch follows the queue: whether it is a GPU or a CPU queue, and, for a GPU queue, which backend is selected depends on the device. Since our queue will never be a CPU queue, it will never go to the CPU BLAS side of things.
oneMKL does not have all the backends unless you provide them; for example, if you don't give a correct path to the backend, there is no GPU or Nvidia backend. The oneMKL interface is a wrapper over three closed-source actual implementations: the oneMKL product, cuBLAS, and rocBLAS. If a backend is not available, it does not link to the correct one. The closed-source oneMKL product is only for Intel CPU and GPU, so when you are running on an Nvidia GPU you don't need the oneMKL product backend at all; similarly, when running on an Intel GPU you don't need the cuBLAS backend at all.
> If the backend is not available it does not link to the correct one. The oneMKL closed-source product is only for Intel CPU and GPU, so when you are running on Nvidia GPU you don't need the oneMKL product backend at all; similarly, when running on Intel GPU you don't need the cuBLAS backend at all.

Only one backend is being built. The CPU backend is explicitly turned off, and either the cuBLAS or the MKL_GPU (closed-source oneMKL) backend is built here, in accordance with the SYCL target being passed.
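As a concrete illustration of building a single backend, a configure line for an Nvidia-only build might look roughly like the following. This is a sketch, not the PR's actual build invocation; the flag names are taken from the oneMKL interface project (the first two also appear verbatim earlier in this thread).

```shell
# Sketch: configure the oneMKL interface library so that only the
# cuBLAS backend is built (CPU and Intel GPU backends disabled).
cmake .. \
  -DENABLE_MKLCPU_BACKEND=OFF \
  -DENABLE_MKLGPU_BACKEND=OFF \
  -DENABLE_CUBLAS_BACKEND=ON
```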
```
#if defined(CUTLASS_ENABLE_MKL)
  auto d_CRef = syclcompat::malloc<TC>(h_C.size());
  auto d_hRef = std::vector<TC>(h_C.size());
  auto queue = syclcompat::get_default_queue();
```
What is the default queue here, and how can you make sure it is an Nvidia GPU queue? This test is set up specifically for an Ampere device and needs to be compiled only for that device, because the architecture is defined at compile time. How do you make sure you get the correct queue on all systems when you use the default here?
> This test that you are running is specifically set for an Ampere device, and needs to be compiled only for that device due to the definition of the architecture at compile time.

This example only builds if SYCL_NVIDIA_TARGET is defined (here).

> How can you make sure the queue is an Nvidia GPU queue?

Since we do not explicitly create a queue, or pass any sort of device selection to syclcompat, I would expect the device selector to be set when invoking the binary, assuming multiple backends on the platform. The reason I use the default_queue from syclcompat is that it is used throughout, for memory allocations etc.
The selector should be passed somehow, probably through SYCLcompat; the SYCL default queue does not guarantee selection of any particular device, and you won't have any control when multiple GPUs are available.
You can create a queue using the GPU device selector and then use syclcompat::set_default_queue. After that, you can add a check like the one in the CUDA example:

```
cudaDeviceProp props;
cudaError_t error = cudaGetDeviceProperties(&props, 0);
if (error != cudaSuccess) {
  std::cerr << "cudaGetDeviceProperties() returned an error: "
            << cudaGetErrorString(error) << std::endl;
  return -1;
}
if (!((props.major * 10 + props.minor) >= 80)) {
  std::cerr << "Ampere Tensor Core operations must be run on a machine with compute capability at least 80."
            << std::endl;
  notSupported = true;
}
if (notSupported) {
  // Returning zero so this test passes on older Toolkits. Its actions are no-op.
  return 0;
}
```
Co-authored-by: Mehdi Goli <[email protected]>
Adds support for using oneMKL for verification, and verifies example 14.