SYCL: Migrate away from deprecated ggml_tensor->backend #10840

Merged (11 commits) on Dec 20, 2024

Contributor

@qnixsynapse qnixsynapse commented Dec 15, 2024

This should have been done much earlier, as ggml_tensor->backend has been deprecated for a very long time.

I have some doubts about parts of this change, so I have added comments that I will remove after discussing them with the collaborators.
So far, test-backend-ops passes on a single GPU with this change.

cc: @airMeng @NeoZhangJianyu @abhilash1910 @Rbiessy

This PR also integrates the debug logs with GGML_LOG and removes the backend-specific logging system.
The new log makes debugging easier, for example:

call ggml_sycl_rms_norm                         // LLAMA model's attention part, currently uses eager attention
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_rope                                   
call ggml_sycl_rope done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_rope                                   
[SYCL]: call ggml_sycl_mul_mat
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_soft_max
call ggml_sycl_soft_max done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_dup
call ggml_sycl_dup done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_add
call ggml_sycl_add done
------------------------------------------------------
call ggml_sycl_rms_norm                                                          // LLAMA Feed forward MLP SWIGLU
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_silu
call ggml_sycl_silu done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_mul
call ggml_sycl_mul done
[SYCL]: call ggml_sycl_mul_mat
call ggml_sycl_add
call ggml_sycl_add done

I wish the verbosity of the debug log could be controlled through environment variables rather than command-line arguments. This logging change has been reverted for now.

@github-actions github-actions bot added the labels ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) Dec 15, 2024
@qnixsynapse qnixsynapse marked this pull request as draft December 15, 2024 14:10
@qnixsynapse qnixsynapse changed the title from "SYCL: Migrate away from deprecated ggml_tensor->backend and use ggml_tensor->buffer for checking buffer type" to "SYCL: Migrate away from deprecated ggml_tensor->backend & ggml debug log integration" Dec 16, 2024
@qnixsynapse qnixsynapse marked this pull request as ready for review December 16, 2024 06:17
@Rbiessy
Collaborator

Rbiessy commented Dec 17, 2024

Thanks for the PR. FYI most of us are on holiday so we may not be able to review until next month. If this is not urgent please give us some time to review it.

@@ -146,27 +145,11 @@ void ggml_backend_sycl_print_sycl_devices() {
}
}

static inline int get_sycl_env(const char *env_name, int default_val) {
Collaborator

If we remove this, is there another place where the SYCL environment variable is read for logging?

Collaborator

Same comment.
I suggest keeping this function.

Contributor Author

@qnixsynapse qnixsynapse Dec 18, 2024

According to PR #9709, the backend-specific logging system was supposed to be replaced with a common logging system. I agree, though, that this enables all debug logs whenever the --log-verbose command-line argument is passed, and they are enabled by default in test-backend-ops.

cc @slaren: I think the best option here is to enable them only when something like GGML_BACKEND_DEBUG=1 is set in the environment.

Edit: I am restoring the GGML_SYCL_DEBUG implementation for now.
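
For readers following the thread, here is a minimal sketch of how an integer setting can be read from the environment with a fallback default. The signature mirrors `get_sycl_env` from the diff hunk above, but the name `read_int_env` and the body are assumptions for illustration, not the code from this PR.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not the PR's implementation): read a non-negative
// integer from an environment variable, falling back to default_val when
// the variable is unset or does not start with a number.
static int read_int_env(const char * env_name, int default_val) {
    const char * value = std::getenv(env_name);
    unsigned parsed = 0;
    if (value != nullptr && std::sscanf(value, " %u", &parsed) == 1) {
        return static_cast<int>(parsed);
    }
    return default_val;
}
```

With a helper like this, a backend could evaluate something like `read_int_env("GGML_SYCL_DEBUG", 0)` once at start-up and keep the result in a flag, so debug output can be switched on at run time without rebuilding.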

Collaborator

Sounds good. For platform benchmarking purposes, it makes sense to have the platform-specific debug on.

Collaborator

@NeoZhangJianyu NeoZhangJianyu left a comment

Thank you for the improvement.
But I don't think it brings a benefit to SYCL backend users:

1. GGML_SYCL_DEBUG is used to debug performance and errors in the SYCL backend only. It can be enabled through an environment variable at run time, without rebuilding the source code. It looks like GGML_LOG_DEBUG requires rebuilding the source code after defining a macro, which limits online troubleshooting.

2. GGML_SYCL_DEBUG is more powerful than GGML_LOG_DEBUG. We can't reduce functionality just to unify the code. Customers should come first. :)

3. Replacing it with the common GGML_LOG_DEBUG function will mix the logs of the common code and the SYCL backend, so performance tests will produce more log output.
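
The two concerns in this review (run-time control and keeping backend logs distinguishable from common logs) can be sketched roughly as below. All names here (`g_demo_backend_debug`, `demo_backend_debug_log`, the `[DEMO]` prefix) are hypothetical and chosen for illustration; this is not the GGML_SYCL_DEBUG or GGML_LOG_DEBUG implementation.

```cpp
#include <cstdarg>
#include <cstdio>

// Hypothetical run-time switch; a real backend could set it once at
// start-up from an environment variable.
static int g_demo_backend_debug = 0;

// Prints to stderr with a backend prefix when the switch is on.
// Returns the number of characters written, or 0 when logging is disabled.
static int demo_backend_debug_log(const char * fmt, ...) {
    if (!g_demo_backend_debug) {
        return 0;
    }
    int written = std::fprintf(stderr, "[DEMO] ");
    va_list args;
    va_start(args, fmt);
    written += std::vfprintf(stderr, fmt, args);
    va_end(args);
    return written;
}
```

Because the switch is an ordinary variable checked on each call, the disabled path costs one branch, and the prefix lets backend lines be filtered out of a mixed log.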

@qnixsynapse qnixsynapse changed the title from "SYCL: Migrate away from deprecated ggml_tensor->backend & ggml debug log integration" to "SYCL: Migrate away from deprecated ggml_tensor->backend" Dec 18, 2024
@@ -163,8 +161,7 @@ inline dpct::err0 ggml_sycl_set_device(const int device) try {
int current_device_id;
SYCL_CHECK(CHECK_TRY_ERROR(current_device_id = get_current_device_id()));

// GGML_SYCL_DEBUG("ggml_sycl_set_device device_id=%d,
// current_device_id=%d\n", device, current_device);
GGML_LOG_DEBUG("ggml_sycl_set_device device_id=%d,current_device_id=%d\n", device, current_device_id);
Collaborator

No need to restore this code.
This log appears many times during inference.
I suggest keeping it commented out by default.

id = get_current_device_id()));
// GGML_SYCL_DEBUG("current device index %d\n", id);
src_ptr = (char *) extra->data_device[id];
} else if (ggml_backend_buffer_is_sycl(src->buffer) || ggml_backend_buffer_is_sycl_split(src->buffer)) {
Collaborator

Replacing an int comparison with a string comparison will reduce performance.
Is it necessary?

Collaborator

Agree with this.

Contributor Author

@qnixsynapse qnixsynapse Dec 18, 2024

As you already know, the previous backend field, which used an int, has been deprecated for a long time:

GGML_DEPRECATED(enum ggml_backend_type backend, "use the buffer type to find the storage location of the tensor");

I reused the existing buffer implementation.
Also, I didn't notice any noticeable slowdowns. If you do, please share the results.
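
To make the performance question concrete, here is a rough sketch of two ways a "is this buffer on the device?" check can be written once the storage location comes from the buffer type rather than an enum field. The types and functions below (`demo_buffer_type`, `demo_buffer`, `demo_buffer_is_device`) are hypothetical stand-ins, not ggml's definitions, and the sketch makes no claim about which form `ggml_backend_buffer_is_sycl` actually uses.

```cpp
#include <cstring>

// Hypothetical stand-ins for illustration only; not ggml's real types.
struct demo_buffer_type {
    const char * (*get_name)();
};

struct demo_buffer {
    const demo_buffer_type * buft;
};

static const char * demo_device_buffer_name() { return "DEVICE"; }
static const char * demo_host_buffer_name()   { return "HOST"; }

// Identity check by function pointer: a single pointer comparison,
// comparable in cost to comparing an integer enum.
static bool demo_buffer_is_device(const demo_buffer * b) {
    return b != nullptr && b->buft->get_name == demo_device_buffer_name;
}

// Equivalent check by name: also correct, but scans the string on each call.
static bool demo_buffer_is_device_by_name(const demo_buffer * b) {
    return b != nullptr && std::strcmp(b->buft->get_name(), "DEVICE") == 0;
}
```

Under these assumptions, the pointer-based form avoids any string scan, while the name-based form pays a short `strcmp` per call; measuring both on a real workload would settle whether the difference matters.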

Collaborator

@Rbiessy Rbiessy left a comment

I have no concern with the changes here but I'll let @NeoZhangJianyu merge it if he's happy with the recent changes.

@qnixsynapse
Contributor Author

@Rbiessy Thank you. It would be nice to get test results on a multi-GPU/RPC setup before merging, since I wasn't able to do that myself due to lack of hardware.

@Rbiessy
Collaborator

Rbiessy commented Dec 20, 2024

We also don't test multi-GPU or RPC setups at Codeplay so far. I tested this with a few models on an A100 and found no regressions.

@NeoZhangJianyu NeoZhangJianyu merged commit eb5c3dc into ggerganov:master Dec 20, 2024
48 checks passed
@qnixsynapse qnixsynapse deleted the buffer_migrate branch December 21, 2024 04:50
5 participants