Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sycl: always set the main device after initialization #7909

Closed
wants to merge 2 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 7 additions & 11 deletions ggml-sycl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3307,15 +3307,18 @@ class sycl_gpu_mgr {

void detect_sycl_gpu_list_with_max_cu() try {
int device_count = dpct::dev_mgr::instance().device_count();
sycl::backend backend;

for (int id = 0; id < device_count; id++) {
sycl::device device = dpct::dev_mgr::instance().get_device(id);
if (!device.is_gpu())
continue;
dpct::device_info prop;
dpct::get_device_info(prop, device);
if (max_compute_units < prop.get_max_compute_units())
if (max_compute_units < prop.get_max_compute_units()) {
max_compute_units = prop.get_max_compute_units();
backend = device.get_backend();
}
}

for (int id = 0; id < device_count; id++) {
Expand All @@ -3325,7 +3328,7 @@ class sycl_gpu_mgr {
dpct::device_info prop;
dpct::get_device_info(prop, device);
if (max_compute_units == prop.get_max_compute_units() &&
is_ext_oneapi_device(device)) {
backend == device.get_backend()) {
bashbaug marked this conversation as resolved.
Show resolved Hide resolved
gpus.push_back(id);
devices.push_back(device);
work_group_size = prop.get_max_work_group_size();
Expand Down Expand Up @@ -3357,15 +3360,6 @@ class sycl_gpu_mgr {
}
GGML_ASSERT(false);
}

bool is_ext_oneapi_device(const sycl::device &dev) {
sycl::backend dev_backend = dev.get_backend();
if (dev_backend == sycl::backend::ext_oneapi_level_zero ||
dev_backend == sycl::backend::ext_oneapi_cuda ||
dev_backend == sycl::backend::ext_oneapi_hip)
return true;
return false;
}
};

static sycl_gpu_mgr *g_sycl_gpu_mgr = NULL;
Expand Down Expand Up @@ -17400,6 +17394,7 @@ GGML_API GGML_CALL void ggml_backend_sycl_set_single_device_mode(int main_gpu_id
g_sycl_gpu_mgr = new sycl_gpu_mgr(main_gpu_id);
g_ggml_sycl_backend_gpu_mode = SYCL_SINGLE_GPU_MODE;
ggml_init_by_gpus(g_sycl_gpu_mgr->get_gpu_count());
ggml_sycl_set_main_device(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In single mode, the main device ID is set by the parameter of cmd line.
So, set it as 0, will disable the parameter: --main-gpu in fact.
So rm it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about ggml_sycl_set_main_device(main_gpu_id)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused by this initially also, but I think zero is the only safe and correct initial value. Here's why:

There are two sets of devices we can get and iterate through: The first is the set of devices returned by dpct::dev_mgr::instance().get_device(). This is the set of all devices in the system, and main_gpu_id is an index into this set. The second is the set of devices stored in sycl_gpu_mgr. This is essentially a "filtered" set of devices we've chosen to use, and it can be indexed from zero to sycl_gpu_mgr->get_gpu_count().

In the case where we choose a main GPU on the command line, the filtering will be performed when we create the sycl_gpu_mgr above.

g_sycl_gpu_mgr = new sycl_gpu_mgr(main_gpu_id);

After the filtering occurs, the only valid index to pass to ggml_sycl_set_main_device() is index zero, because there is only one device in the sycl_gpu_mgr.

g_ggml_backend_sycl_buffer_type_initialized = false;
}

Expand All @@ -17419,6 +17414,7 @@ GGML_API GGML_CALL void ggml_backend_sycl_set_mul_device_mode() {
g_sycl_gpu_mgr = new sycl_gpu_mgr();
g_ggml_sycl_backend_gpu_mode = SYCL_MUL_GPU_MODE;
ggml_init_by_gpus(g_sycl_gpu_mgr->get_gpu_count());
ggml_sycl_set_main_device(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this breaks multi-GPU semantics, @NeoZhangJianyu can you try this on a multi-GPU env?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I convinced myself that this would be OK even in a multi-GPU environment, though admittedly I haven't tested this myself so it'd be great to confirm this works.

My thinking is: we're eventually going to set the main device via some other codepath, such as via ggml_backend_sycl_init (probably through llama_new_context_with_model). We may even set the main device multiple times. This is all fine; we just need some valid initial value, so if we happen to lookup the SYCL queue and hence the SYCL context, say to allocate host USM when loading a model, we have a valid value to perform the lookup.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In mulitple mode, set main gpu is not needed. #0 gpu is always default main gpu.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this isn't the case:

static int g_main_device = -1;

We could change the initial value of g_main_device from -1 to 0, but we'd probably also want to change some other initial values to stay in sync, say for g_main_device_id. It seems safer to me to just call ggml_sycl_set_main_device(0) instead, but let me know what you prefer.

g_ggml_backend_sycl_buffer_type_initialized = false;
}

Expand Down
Loading