-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Graph] Update design doc for copy queue #362
base: sycl
Are you sure you want to change the base?
Conversation
7739b83
to
d5773eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicked the design doc but LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an E2E test for a linear graph that interleaves memory and kernel commands. I'd like to verify that the interaction between our in-order command-list optimization and compute/copy command-lists works correctly.
I have added a test locally but actually the way it is right now the copy command list optimization will only happen for out-of-order command buffers. It might make sense to leave that as it is for now and work on combining these two optimizations if worth it as a further piece of work? |
73ba45c
to
4a47e54
Compare
eb67857
to
020a5ff
Compare
…4143) When passing along multiple targets in the form of -fsycl-targets=intel_gpu_dg1,intel_gpu_pvc, the number of the device compilations was n*n as opposed to just n. Due to how we were handling duplicate entries for toolchain generation, the different names used even though they had the same target triple (spir64_gen) we being considered as unique, causing the multiple entries. This is the second attempt to push this one in, updated the sycl-offload-new-driver.c test to reflect ordering issues encountered.
…l targets (intel#14102) clang-linker-wrapper is not target-specific. i.e. it is not called for a single target device. It is called only once. Currently, clang-linker-wrapper is called only with device images with spir64 targets. So, the existing approach to capture the first target triple in the list of triples and use it for gathering sycl-device-library files is valid. As we plan to add support for more targets (AOT), we need to gather sycl-device-libraries for all targets. This PR addresses this change. Also, the triple should not be passed to the linker wrapper. The linker wrapper should get the triples from device images. Thanks --------- Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: jinge90 <[email protected]>
) For testing oneapi-src/unified-runtime#1282 --------- Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
…ntel#14138) Required for specific use-cases in SYCLomatic. --------- Signed-off-by: Alberto Cabrera <[email protected]>
The `Graph/UnsupportedDevice/device_query.cpp` test asserts that L0 devices will never have full graph support. This is not the case, depending on the L0 device and driver version full graphs support is possible. Update the test to remove asserting on this, as diving into these details is out of the scope of the test. This was previously decided when discussion how to check the OpenCL backend for similar possible variances in aspect support.
`cuda_dev_kit` is not set properly in [test-e2e/lit.cfg.py](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/lit.cfg.py) due to invalid CUDA paths. Fixing the paths showed errors in [14115](intel#14115) and [14116](intel#14116) which are XFAILed. The patch fixes the failure of [cuda_queue_priority.cpp](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Plugin/cuda_queue_priority.cpp) on Windows / CUDA.
oneapi-src/unified-runtime#1128 --------- Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
This PR adds math `extend_v*4` operators (18 in total) along with unit-tests for signed and unsigned int32 cases. *Some changes overlap with the previous `extend_v*2` PR intel#13953 and thus should be reviewed/merged first. --------- Co-authored-by: Alberto Cabrera Pérez <[email protected]> Co-authored-by: Joe Todd <[email protected]> Co-authored-by: Yihan Wang <[email protected]>
…l#13807) It was propagated to `getOrCreateAllocaForReq` when creating a new record, but then no commands are expected to be enqueued there since the first alloca for a record cannot exceed its leaf limit or be linked to another alloca.
Running the test on Windows failed due to missing support of `ls`. Replacing `ls` with `cat` made the test pass on Windows.
…l#14120) - `InorderQueue/in_order_get_property.cpp` -> Use non-deprecated `sycl::exception`, add check for errc to ensure we are still catching the correct exception - `InorderQueue/in_order_kernels.cpp` -> Use group `get_group_id` function instead of deprecated `get_id` - `InorderQueue/in_order_usm_implicit.cpp` -> Use queue `mem_advice` function that uses `int` instead of `pi_mem_advice`
Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
oneapi-src/unified-runtime#1711 --------- Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
Currently Level Zero plugin uses loader and headers fetched by the Level Zero adapter (LevelZeroLoader-Headers, LevelZeroLoader targets). Currently downloaded loader code is not used, only headers are used for xpti. So, get headers location from LevelZeroLoader-Headers target instead and remove unnecessary code.
Add group_key_value_sorter sorters and sort_key_value_over_group APIs based on https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_group_sort.asciidoc extension. This PR was split out from larger PR: intel#13713 Co-authored-by: "Andrei Fedorov [[email protected]](mailto:[email protected])" Co-authored-by: "Romanov Vlad [[email protected]](mailto:[email protected])"
…ntel#14129) A single basic file to compile and run to test functionality of --offload-new-driver --------- Co-authored-by: Marcos Maronas <[email protected]>
…d ConvertFToBFloat16INTEL (intel#14085) This PR adds vector overloads of `ConvertBFloat16ToFINTEL` and `ConvertFToBFloat16INTEL` to libdevice (SPEC: https://spec.oneapi.io/level-zero/latest/core/SPIRV.html#bfloat16-conversions) and a wrapper around it (`BF16VecToFloatVec` and `FloatVecToBF16Vec`) in `ext::oneapi::detail`. These overloads are intended to optimize BFloat16 `marray`, `vec` operations, for which we currently do element-by-element `bfloat16 -> float -> bfloat16` conversions.
…4130) Replaces intel#13270 Changing the storage to std::array instead of Clang's extension fixes strict ansi-aliasing violation and simplifies device code.
[SYCL] Adding support for missing math ops: - truncf - sinpif - rsqrtf - exp10f
Add section to the contribution guide detailing the current process for integrating Unified Runtime updates into DPC++.
…ntel#14123) Current implementation of profiling info for NOP barriers is inconsistent with other events from the same queue (e.g., if the previous event started after the barrier was submitted). To make them consistent while keeping the optimization, we would need to duplicate the event on our side and make the duplicate check and potentially use profiling info of its previous event. Instead, as the first step, disable the NOP optimization during profiling since profiling is known to incur a performance hit anyway. The proper duplicate event approach can be implemented as a follow up if this causes issues for users. Partially reverts intel#12949
atomic_ref<T *> uses 64-bit atomics and it should be decorated with the corresponding aspect. fixes: intel#12743
I'm observing cache overflow when running heavy tests on OCL backend with gpu. Clear cache in case of PI_ERROR_OUT_OF_HOST_MEMORY as well as for PI_ERROR_OUT_OF_RESOURCES. Using as reference: intel#11987
Scheduled igc dev drivers uplift Co-authored-by: GitHub Actions <[email protected]>
The function is using the `operator=` before it's defined which can cause some build failures: ``` build/include/sycl/ext/oneapi/bfloat16.hpp:98:19: error: no match for ‘operator=’ (operand types are ‘sycl::_V1::ext::oneapi::bfloat16’ and ‘float’) 98 | dst[i] = src[i]; | ^ ``` Moving it after the bfloat16 class definition fixes it.
Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3. <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/micromatch/braces/commit/74b2db2938fad48a2ea54a9c8bf27a37a62c350d"><code>74b2db2</code></a> 3.0.3</li> <li><a href="https://github.com/micromatch/braces/commit/88f1429a0f47e1dd3813de35211fc97ffda27f9e"><code>88f1429</code></a> update eslint. lint, fix unit tests.</li> <li><a href="https://github.com/micromatch/braces/commit/415d660c3002d1ab7e63dbf490c9851da80596ff"><code>415d660</code></a> Snyk js braces 6838727 (<a href="https://redirect.github.com/micromatch/braces/issues/40">#40</a>)</li> <li><a href="https://github.com/micromatch/braces/commit/190510f79db1adf21d92798b0bb6fccc1f72c9d6"><code>190510f</code></a> fix tests, skip 1 test in test/braces.expand</li> <li><a href="https://github.com/micromatch/braces/commit/716eb9f12d820b145a831ad678618731927e8856"><code>716eb9f</code></a> readme bump</li> <li><a href="https://github.com/micromatch/braces/commit/a5851e57f45c3431a94d83fc565754bc10f5bbc3"><code>a5851e5</code></a> Merge pull request <a href="https://redirect.github.com/micromatch/braces/issues/37">#37</a> from coderaiser/fix/vulnerability</li> <li><a href="https://github.com/micromatch/braces/commit/2092bd1fb108d2c59bd62e243b70ad98db961538"><code>2092bd1</code></a> feature: braces: add maxSymbols (<a href="https://github.com/micromatch/braces/issues/">https://github.com/micromatch/braces/issues/</a>...</li> <li><a href="https://github.com/micromatch/braces/commit/9f5b4cf47329351bcb64287223ffb6ecc9a5e6d3"><code>9f5b4cf</code></a> fix: vulnerability (<a href="https://security.snyk.io/vuln/SNYK-JS-BRACES-6838727">https://security.snyk.io/vuln/SNYK-JS-BRACES-6838727</a>)</li> <li><a href="https://github.com/micromatch/braces/commit/98414f9f1fabe021736e26836d8306d5de747e0d"><code>98414f9</code></a> remove funding file</li> <li><a href="https://github.com/micromatch/braces/commit/665ab5d561c017a38ba7aafd92cc6655b91d8c14"><code>665ab5d</code></a> update keepEscaping doc (<a href="https://redirect.github.com/micromatch/braces/issues/27">#27</a>)</li> <li>Additional commits viewable in <a href="https://github.com/micromatch/braces/compare/3.0.2...3.0.3">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=braces&package-manager=npm_and_yarn&previous-version=3.0.2&new-version=3.0.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/intel/llvm/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…tly (intel#12872) This PR refactors the builtin fence helper macro for AMDGPU to take in and process the order semantic explicitly because that is the only semantic argument accepted by the amdgcn builtin. Additionally, makes the `None` (Monotonic) order semantic which maps to C++/SYCL's `relaxed` to be a no-op instead of falling back to the previous `acq_rel` default order. --------- Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
The rotate functions are technically c++20 and MSVC hasn't implemented them yet. Signed-off-by: Sarnie, Nick <[email protected]>
…ntel#14151) One of the models that is used for specifying the device architecture for spir64_gen is to use the -Xsycl-target-backend "-device arg" syntax on the command line. Hook up the ability to scan the target backend values to embed the proper information in the packaged binary when using the new offload model.
This PR adds math `extend_vcompare[2/4] `operators (4 in total) along with unit-tests for signed and unsigned int32 cases. Also, Unit-tests from previous `extend_v*4` intel#14078 and `extend_v*2` intel#13953 are moved to two different files. --------- Co-authored-by: Alberto Cabrera Pérez <[email protected]> Co-authored-by: Joe Todd <[email protected]> Co-authored-by: Yihan Wang <[email protected]>
020a5ff
to
0d13b58
Compare
…ntel#14150) pre-commit PR for oneapi-src/unified-runtime#1749 --------- Signed-off-by: Neil R. Spruit <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
intel#14162) Instead of using old device selector objects, use SYCL 2020 device selector callables to construct devices in `FilterSelector` e2e tests.
- Update UR tag to include L0 command-buffer copy engine optimization - Add test which mixes copy and kernel commands - Update design doc to detail copy engine optimization
Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
5d44bdf
to
01b1582
Compare
Update the design doc as command-buffer now uses the copy queue if a copy engine is available.