
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) cherry-pick b549a1bbefb2f1fbb8b558bac1f2ae7967e60964 #1

Merged

2 commits merged into master on Jul 13, 2024

Conversation

@arthw (Owner) commented on Jul 13, 2024

[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (ggerganov#8266)
* fix group_norm ut

* split softmax

* fix softmax

* add concat support condition

* revert debug code

* move QK_WARP_SIZE to presets.hpp

Fixes issues in the above PR:

* fix a nullptr in norm() that caused a crash on iGPU
* use WARP_32_SIZE in place of QK_WARP_SIZE
* optimize dmmv.cpp for iGPU
* add sycl_hw.cpp to detect hardware info
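
Below is a minimal, standalone sketch (not the PR's actual sycl_hw.cpp) of how hardware info could be detected through the standard SYCL device-info API to choose between a warp size of 16 and 32; the helper name `pick_warp_size` is hypothetical.

```
// Hedged sketch: query the sub-group sizes a SYCL device supports and prefer
// a 32-wide warp when available. Not the actual ggml-sycl implementation.
#include <sycl/sycl.hpp>

#include <algorithm>
#include <cstdio>

static int pick_warp_size(const sycl::device & dev) {
    // sub_group_sizes lists the sub-group (warp) widths the device supports,
    // e.g. {8, 16, 32} on many Intel GPUs.
    const auto sizes = dev.get_info<sycl::info::device::sub_group_sizes>();
    if (std::find(sizes.begin(), sizes.end(), std::size_t(32)) != sizes.end()) {
        return 32;
    }
    // Fall back to the largest supported width (16 on devices without 32-wide sub-groups).
    return sizes.empty() ? 16 : (int) *std::max_element(sizes.begin(), sizes.end());
}

int main() {
    sycl::device dev{sycl::default_selector_v};
    std::printf("device: %s, warp size: %d\n",
                dev.get_info<sycl::info::device::name>().c_str(), pick_warp_size(dev));
    return 0;
}
```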

@arthw changed the title from "cherry-pick b549a1bbefb2f1fbb8b558bac1f2ae7967e60964 [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)" to "[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) cherry-pick b549a1bbefb2f1fbb8b558bac1f2ae7967e60964" on Jul 13, 2024
@arthw merged commit aeaed61 into master on Jul 13, 2024
53 checks passed
arthw pushed a commit that referenced this pull request Aug 7, 2024
* [example] batched-bench "segmentation fault"

When `llama-batched-bench` is invoked _without_ setting `-npl`, "number
of parallel prompts", it segfaults.

The segfault is caused by invoking `max_element()` on a zero-length
vector, `n_pl`.

This commit addresses that by first checking whether the number of
parallel prompts is zero; if so, the maximum sequence size is set to 1,
otherwise it is set to the result of `max_element()` as before.

This fixes the following crash when running `lldb build/bin/llama-batched-bench -- -m models/Meta-Llama-3-8B.gguf`:

```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010000366c llama-batched-bench`main(argc=3, argv=0x000000016fdff268) at batched-bench.cpp:72:28
   69  	    llama_context_params ctx_params = llama_context_params_from_gpt_params(params);
   70
   71  	    // ensure enough sequences are available
-> 72  	    ctx_params.n_seq_max = *std::max_element(n_pl.begin(), n_pl.end());
```
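
A minimal sketch of the guard described above (not the exact PR diff), assuming `n_pl` holds the parsed `-npl` values:

```
// Hedged sketch: only dereference max_element() when n_pl is non-empty,
// otherwise fall back to a single sequence, as described in the commit message.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int32_t> n_pl;   // empty when -npl is not passed on the command line

    // *std::max_element(begin, end) on an empty range dereferences end(),
    // which is the reported EXC_BAD_ACCESS / segfault.
    const uint32_t n_seq_max = n_pl.empty()
        ? 1
        : (uint32_t) *std::max_element(n_pl.begin(), n_pl.end());

    std::printf("n_seq_max = %u\n", n_seq_max);
    return 0;
}
```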

* Update examples/batched-bench/batched-bench.cpp

Co-authored-by: compilade <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: compilade <[email protected]>