SYCL: Migrate away from deprecated ggml_tensor->backend #10840

qnixsynapse · 2024-12-15T14:09:40Z

This should have been done way earlier as ggml_tensor->backend has been deprecated for a very long time.

I have some doubts about parts of this, so I have added comments that I will remove after discussing them with the collaborators.
So far, backend test ops (for single GPU) are passing with this change.

cc: @airMeng @NeoZhangJianyu @abhilash1910 @Rbiessy

Also, this integrates the debug logs with GGML_LOG and removes the backend-specific logging system.
With the new log, debugging will be easier, for example:

call ggml_sycl_rms_norm                         // LLAMA model's attention part, currently uses eager attention
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_rope                                   
call ggml_sycl_rope done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_rope                                   
[SYCL]: call ggml_sycl_mul_mat
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_soft_max
call ggml_sycl_soft_max done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_dup
call ggml_sycl_dup done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_add
call ggml_sycl_add done
------------------------------------------------------
call ggml_sycl_rms_norm                                                          // LLAMA Feed forward MLP SWIGLU
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_silu
call ggml_sycl_silu done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_mul
call ggml_sycl_mul done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_add
call ggml_sycl_add done

I wish there were a way to control the verbosity of the debug log levels through environment variables rather than by passing command-line arguments. This is reverted for now.
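The paired "call ... / call ... done" lines in the trace above can be produced by logging on entry to and exit from each op. The following is only a minimal sketch of that pattern and not the ggml-sycl code: `demo_call_trace` and `demo_silu` are hypothetical names, and the log is collected in a `std::vector` instead of being sent to a real logger so that the behaviour is easy to inspect.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical RAII tracer (not part of ggml): records "call <name>" when it
// is constructed and "call <name> done" when it goes out of scope.
class demo_call_trace {
public:
    demo_call_trace(const char * name, std::vector<std::string> & sink)
        : name_(name), sink_(sink) {
        assert(name != nullptr);
        sink_.push_back("call " + name_);
    }
    ~demo_call_trace() { sink_.push_back("call " + name_ + " done"); }

    demo_call_trace(const demo_call_trace &) = delete;
    demo_call_trace & operator=(const demo_call_trace &) = delete;

private:
    std::string                name_;
    std::vector<std::string> & sink_;
};

// Hypothetical op used only to demonstrate the tracer: SiLU on one value.
static float demo_silu(float x, std::vector<std::string> & log) {
    demo_call_trace trace(__func__, log);  // logs entry now, exit on return
    return x / (1.0f + std::exp(-x));
}
```

Calling `demo_silu` once appends two entries to the log, one before and one after the computation, which mirrors the shape of the trace quoted above.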

Rbiessy · 2024-12-17T14:25:28Z

Thanks for the PR. FYI most of us are on holiday so we may not be able to review until next month. If this is not urgent please give us some time to review it.

abhilash1910 · 2024-12-17T18:03:55Z

ggml/src/ggml-sycl/ggml-sycl.cpp

@@ -146,27 +145,11 @@ void ggml_backend_sycl_print_sycl_devices() {
    }
 }

-static inline int get_sycl_env(const char *env_name, int default_val) {


If we are removing this, is there any other addition where sycl env is called for logging?

Same comment.
I suggest keeping this function.

According to PR #9709, the backend-specific logging system was supposed to be replaced with a common logging system. I do agree, though, that this will enable all debug logs when the --log-verbose command-line argument is passed, or when it is enabled by default as in test-backend-ops.

cc @slaren I think the best option here is to enable them only when something like GGML_BACKEND_DEBUG=1 is set in the environment.

Edit: I am restoring GGML_SYCL_DEBUG implementation for now.

Sounds good. For platform benchmarking purposes, it makes sense to keep the platform-specific debug logging on.

NeoZhangJianyu

Thank you for the improvement.
But I don't think it brings a benefit to SYCL backend users:

1. GGML_SYCL_DEBUG is used to debug performance and errors in the SYCL backend only.
It can be enabled through an environment variable at run time, without rebuilding the source code.
It looks like GGML_LOG_DEBUG requires rebuilding the source code after defining a macro.
That limits online troubleshooting.

2. GGML_SYCL_DEBUG is more powerful than GGML_LOG_DEBUG.
We shouldn't reduce functionality just to unify the code.
Customers should come first. :)

3. Replacing it with the common GGML_LOG_DEBUG log function will mix the logs of the common code and the SYCL backend.
In performance tests, there will be more log output.
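The run-time switch discussed in this thread, where an environment variable turns debug output on without a rebuild, can be sketched as below. This is an illustration under stated assumptions and not the ggml-sycl implementation: `read_int_env`, `demo_debug_level`, `DEMO_DEBUG` and the `DEMO_BACKEND_DEBUG` variable are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not the ggml-sycl function): read an integer from an
// environment variable. Returns default_val when the variable is unset or
// its value does not begin with a decimal integer.
static inline int read_int_env(const char * env_name, int default_val) {
    assert(env_name != nullptr);
    const char * value = std::getenv(env_name);
    if (value == nullptr) {
        return default_val;
    }
    char * end    = nullptr;
    long   parsed = std::strtol(value, &end, 10);
    if (end == value) {
        return default_val;  // no digits were consumed
    }
    return static_cast<int>(parsed);
}

// The level is read once, on first use, and cached for the process lifetime.
static int demo_debug_level() {
    static const int level = read_int_env("DEMO_BACKEND_DEBUG", 0);
    return level;
}

// Hypothetical debug macro: prints to stderr only when the level is non-zero.
#define DEMO_DEBUG(...)                        \
    do {                                       \
        if (demo_debug_level() > 0) {          \
            std::fprintf(stderr, __VA_ARGS__); \
        }                                      \
    } while (0)
```

With this shape, running a binary with `DEMO_BACKEND_DEBUG=1` in the environment enables the messages, and leaving the variable unset keeps them silent, with no recompilation in either case.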

This reverts commit 2607b7d. Let's keep the current SYCL specific logging mechanism for now

NeoZhangJianyu · 2024-12-18T02:54:29Z

ggml/src/ggml-sycl/common.hpp

@@ -163,8 +161,7 @@ inline dpct::err0 ggml_sycl_set_device(const int device) try {
  int current_device_id;
  SYCL_CHECK(CHECK_TRY_ERROR(current_device_id = get_current_device_id()));

-  // GGML_SYCL_DEBUG("ggml_sycl_set_device device_id=%d,
-  // current_device_id=%d\n", device, current_device);
+  GGML_LOG_DEBUG("ggml_sycl_set_device device_id=%d,current_device_id=%d\n", device, current_device_id);


No need to restore this code.
This log would appear many times during inference.
I suggest leaving it commented out by default.

NeoZhangJianyu · 2024-12-18T02:56:35Z

ggml/src/ggml-sycl/ggml-sycl.cpp

-            id = get_current_device_id()));
-        // GGML_SYCL_DEBUG("current device index %d\n", id);
-        src_ptr = (char *) extra->data_device[id];
+    } else if (ggml_backend_buffer_is_sycl(src->buffer) || ggml_backend_buffer_is_sycl_split(src->buffer)) {


Replacing an int comparison with a string comparison will reduce performance.
Is it necessary?

Agree with this.

As you already know, the previous backend path, which used an int, has been deprecated for a long time:

llama.cpp/ggml/include/ggml.h

Line 590 in 152610e

GGML_DEPRECATED(enum ggml_backend_type backend, "use the buffer type to find the storage location of the tensor");

I reused the existing buffer implementation.
Also, I didn't notice any noticeable slowdowns. If you do, please share the results.
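For readers following the performance question, here is a minimal sketch of how a buffer-based check can identify where a tensor lives without reading a deprecated integer field. All types and functions below (`demo_buffer_type`, `demo_buffer`, `demo_tensor`, `demo_buffer_is_sycl`, and so on) are hypothetical, simplified stand-ins and not the ggml definitions. The sketch also assumes the buffer type is identified by comparing a function pointer, which is one way to keep the check as cheap as an integer comparison; whether the actual helpers work this way is not stated in this thread.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for backend buffer structures.
struct demo_buffer_type {
    const char * (*get_name)();  // each buffer type has its own name function
};

struct demo_buffer {
    const demo_buffer_type * buft;  // the type this buffer was allocated from
};

struct demo_tensor {
    const demo_buffer * buffer;  // where the tensor's data is stored
};

static const char * demo_sycl_name()       { return "SYCL"; }
static const char * demo_sycl_split_name() { return "SYCL_Split"; }

// Identify the buffer type by function-pointer identity: a single pointer
// comparison, with no string comparison involved.
static bool demo_buffer_is_sycl(const demo_buffer * buf) {
    assert(buf != nullptr && buf->buft != nullptr);
    return buf->buft->get_name == demo_sycl_name;
}

static bool demo_buffer_is_sycl_split(const demo_buffer * buf) {
    assert(buf != nullptr && buf->buft != nullptr);
    return buf->buft->get_name == demo_sycl_split_name;
}

// Replacement for a check on a per-tensor backend enum: ask the buffer.
static bool demo_tensor_on_device(const demo_tensor & t) {
    return t.buffer != nullptr &&
           (demo_buffer_is_sycl(t.buffer) || demo_buffer_is_sycl_split(t.buffer));
}
```

Under these assumptions the check costs one or two pointer comparisons per call, so a measurable slowdown would more likely come from elsewhere; benchmark results remain the way to settle it.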

Rbiessy

I have no concern with the changes here but I'll let @NeoZhangJianyu merge it if he's happy with the recent changes.

qnixsynapse · 2024-12-20T10:25:24Z

@Rbiessy Thank you. It would be nice to get test results on a multi-GPU/RPC setup before merging, since I wasn't able to run those tests myself due to lack of hardware.

Rbiessy · 2024-12-20T10:28:13Z

We also don't test multi-GPU or RPC setups at Codeplay so far. I tested this with a few models on an A100 and found no regression.

qnixsynapse added 5 commits December 15, 2024 11:45

Migrate to tensor->buffer for checking backend buffer type: 1

35bff17

SYCL: common.cpp try to migrate away from tensor->backend

da40c42

SYCL: fix assertions and add proper comments

f8603b0

SYCL: remove extra space

0662a86

SYCL: Add back static to ggml_backend_buffer_is_sycl_split function

5ed4403

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels Dec 15, 2024

qnixsynapse marked this pull request as draft December 15, 2024 14:10

qnixsynapse added 2 commits December 15, 2024 21:02

SYCL: Add pragma directive to suppress warning spam

19ce4b6

SYCL: Integrate debug logs with GGML_LOG and other fixes

2607b7d

qnixsynapse changed the title ~~SYCL: Migrate away from deprecated ggml_tensor->backend and use ggml_tensor->buffer for checking buffer type~~ SYCL: Migrate away from deprecated ggml_tensor->backend & ggml debug log integration Dec 16, 2024

qnixsynapse marked this pull request as ready for review December 16, 2024 06:17

slaren approved these changes Dec 17, 2024

abhilash1910 reviewed Dec 17, 2024

NeoZhangJianyu reviewed Dec 18, 2024

qnixsynapse added 3 commits December 18, 2024 09:11

Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes"

eeb0475

This reverts commit 2607b7d. Let's keep the current SYCL specific logging mechanism for now

SYCL: Use GGML_SYCL_DEBUG after reverting

82ce602

SYCL: reg_get_proc_address func, update to the current func signature

a20dde3

qnixsynapse changed the title ~~SYCL: Migrate away from deprecated ggml_tensor->backend & ggml debug log integration~~ SYCL: Migrate away from deprecated ggml_tensor->backend Dec 18, 2024

NeoZhangJianyu approved these changes Dec 18, 2024

SYCL: Refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d

6be041a

Rbiessy approved these changes Dec 20, 2024

NeoZhangJianyu merged commit eb5c3dc into ggerganov:master Dec 20, 2024
48 checks passed

qnixsynapse deleted the buffer_migrate branch December 21, 2024 04:50

rgerganov mentioned this pull request Dec 21, 2024

rpc-server : add support for the SYCL backend #10934

Open
