[SYCL][Graph] Update spec supported features #338
Conversation
the Explicit API. Empty nodes can be used instead of barriers when a user is
building a graph with the explicit API.
Currently, users of the Explicit API who want to implement a barrier that waits for all previous nodes to complete have to create an empty node that explicitly depends on all previous nodes. Does it make sense to change the definition of empty nodes so that if a user adds an empty node with no dependencies, the node automatically takes all previous nodes as dependencies (a general barrier)? Or, if we want to keep empty nodes without dependencies (though I'm not sure why that would be necessary), should we propose a wildcard (shortcut) that adds all previous nodes as dependencies?
That idea makes sense to me 👍 Keeping the existing empty-node-with-no-dependencies semantics doesn't really add any value for the user, but making the node depend on the previous leaf nodes matches barrier semantics, so it is useful and better aligns with the message that this is an alternative to the barrier extension.
I feel like automatic creation of dependencies runs a little counter to the existing behaviour of the explicit API, which doesn't do things like this in the background. That could make it a bit unintuitive.
I tend to agree with @Bensuo here. However, we might want to add a barrier shortcut if this pattern turns out to be frequently used in applications.
. Using reductions in a graph node.
. Using SYCL streams in a graph node.
. Using a kernel bundle in a graph node.
. Profiling an event returned from graph submission with
`event::get_profiling_info()`.
I should also add a bullet point here about passing the no-immediate-command-list property on queue creation, to work around the current issue with immediate command-lists.
This commit removes the typed pointer support from the LaunchFunc's lowering to the Vulkan dialect. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
Recent versions of GNU binutils, starting from 2.39, support symbol+offset lookup in addition to the usual numeric address lookup. This change adds symbol lookup to llvm-symbolizer and llvm-addr2line. llvm-symbolizer now behaves closer to GNU addr2line: if the value specified as an address on the command line or input stream is not a number, it is treated as a symbol name. For example: `llvm-symbolizer --obj=abc.so func_22` or `llvm-symbolizer --obj=abc.so "CODE func_22"`. This lookup is currently supported only for functions; specifying an offset is not supported yet. This is a recommit of 2b27948, reverted in 39fec54 because the test llvm/test/Support/interrupts.test started failing on Windows. The test was changed in 18f036d and is also updated in this commit. Differential Revision: https://reviews.llvm.org/D149759
…iku (#70434) Same as 12b87f6 and the addition to Gnu.
This patch moves `RecordDecl::ArgPassingKind` to namespace scope in DeclBase.h, so that it's complete at the time the bit-field is declared.
This commit removes the support for lowering GPU to ROCDL dialect with typed pointers. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
This patch adds missing dependencies required by the new unittest introduced by #68406.
This patch moves `OMPDeclareReductionDecl::InitKind` to DeclBase.h, so that it's complete at the point where the corresponding bit-field is declared. This patch also converts it to a scoped enum named `OMPDeclareReductionInitKind`.
This adds a writable attribute, which in conjunction with dereferenceable(N) states that a spurious store of N bytes may be introduced on function entry. This implies that these N bytes are writable without trapping or introducing data races. See https://llvm.org/docs/Atomics.html#optimization-outside-atomic for why the second point is important. This attribute can be added to sret arguments. I believe Rust will also be able to use it for by-value (moved) arguments. Rust likely won't be able to use it for &mut arguments (tree borrows does not appear to allow spurious stores). In this patch the new attribute is only used by LICM scalar promotion. However, the actual motivation is to fix a correctness issue in call slot optimization, which needs this attribute to avoid optimization regressions. Followup to the discussion on D157499. Differential Revision: https://reviews.llvm.org/D158081
…or is not declared in the base class Fixes #70464 When a ctor is not declared in the base class, initializing the base class with an initializer list does not trigger a proper assignment of the base region, because a CXXConstructExpr doing that is not available in the AST. This patch checks whether the init expr is an InitListExpr under a base initializer, and adds a binding if so.
…9945)" This reverts commit 5bfd89b. It was causing build failures on ffmpeg on i686.
Progressive support of the fastmath flag in the conversion of log-type ops. See more details: https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981
Adds additional test coverage for Issue #68466
This is intended as the replacement for ConstantExpr::getIntegerCast(), which does not require availability of the corresponding constant expressions. It just forwards to ConstantFoldCastOperand with the correct opcode.
This always works on a constant integer or integer splat, so the constant fold here should always succeed.
Use ConstantFoldIntegerCast() instead, to remove the reliance on constant expressions.
…rted to other translation unit. (#68774) Fixes: #68769 Co-authored-by: miaozhiyuan <[email protected]>
…SS/PACKUS truncateVectorWithPACK handling of sub-128-bit result types was improved some time ago, so remove the old 64-bit limit. Fixes #68466
This patch moves `ObjCMethodDecl::ImplementationControl` to DeclBase.h so that it's complete at the point where the corresponding bit-field is declared. This patch also converts it to a scoped enum `clang::ObjCImplementationControl`.
SME2 is documented as part of the main SME supplement: https://developer.arm.com/documentation/ddi0616/latest/ The one change for debug is the new ZT0 register. This register contains data to be used with the new table lookup instructions. Its size is always 512 bits (not scalable) and it can be interpreted in many different ways depending on the instructions that use it. The kernel has implemented this as a new register set containing this single register. It always returns register data (with no header, unlike ZA, which does have a header). https://docs.kernel.org/arch/arm64/sme.html ZT0 is only active when ZA is active (when SVCR.ZA is 1). In the inactive state the kernel returns 0s for its contents, so lldb doesn't need to create 0s like it does for ZA. However, we will skip restoring the value of ZT0 if we know that ZA is inactive, as writing to an inactive ZT0 sets SVCR.ZA to 1, which is not desirable because it would also activate ZA. Whether SVCR.ZA is set will be determined only by the ZA data we restore. Due to this, I've added a new save/restore kind, SME2. This is easier than accounting for the variable-length ZA in the SME data. We'll only save an SME2 data block if ZA is active. If it's not, we can get fresh 0s back from the kernel for ZT0 anyway, so there's nothing for us to restore. This new register will only show up if the system has SME2, therefore the SME set presented to the user may change, and I've had to account for that in a few places. I've referred to it internally as simply "ZT", as the kernel does in NT_ARM_ZT, but the architecture refers to the specific register as "ZT0", so that's what you'll see in lldb.
```
(lldb) register read -s 6
Scalable Matrix Extension Registers:
  svcr = 0x0000000000000000
   svg = 0x0000000000000004
    za = {0x00 <...> 0x00}
   zt0 = {0x00 <...> 0x00}
```
Use IRBuilder or ConstantFolding instead.
Looks like there's code out there that, instead of using `__attribute__((constructor(x)))` to add constructor functions, just declares a global function pointer and uses `__attribute__((section(".ctors")))` instead. The problem is that, with memtag-globals, we pad the global function pointer to be 16 bytes large. This means we end up with an 8-byte real function pointer followed by 8 bytes of zero padding, which trips up the loader when it processes this section. Fixes #69939
Small fix to strip down additional CPU information from AMD query, like: `gfx90a:sramecc+:xnack-` to just `gfx90a`
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
…intel#11793) QA asked us to do this. --------- Signed-off-by: Sarnie, Nick <[email protected]>
…d joint_matrix_mad (intel#11738) This patch adds two new properties, `joint_matrix` and `joint_matrix_mad`, to the device requirements in sycl-post-link. The SYCL RT reads these properties and throws an exception if objects of `joint_matrix` type or `joint_matrix_mad` functions are not supported by the current device. "Unsupported" means the matrix type and sizes provided by the user are not compatible with the list of all supported matrix types and sizes from the runtime query in `get_info<...matrix_combinations>`.
Combines the following L0 changes: * oneapi-src/unified-runtime#1033 * oneapi-src/unified-runtime#1028 * oneapi-src/unified-runtime#1022
Since bool vectors have a backing storage of chars, the unary minus operator was in fact producing `(char)-1`, and thus not adhering to the ABI where bools should be either 0 or 1. This could manifest itself in bugs, for instance where the "bool" elements wouldn't compare equal to other bools, such as those in arrays. This only manifested itself in device code (possibly because the array of bools is also an array of bytes under the hood), hence the test case that differs in style somewhat from the rest.
The old version is temporarily kept in SPV_INTEL_joint_matrix_legacy.asciidoc --------- Signed-off-by: Sidorov, Dmitry <[email protected]>
Since vectors of bools are backed by byte-sized storage, we must ensure that conversion results from other vector types are correctly brought into the expected range of bools, e.g., `(char)0` or `(char)1`.
…#11809) Previously, we committed intel#9642 aiming to break dependency on metadata order for check_has.cpp but the commit was broken unexpectedly. This PR re-lands it. --------- Signed-off-by: jinge90 <[email protected]>
- Enables specialization constants handling in SYCL-Graph extension. - Adds E2E tests that verify this behavior. - Removes unittests tests that checked for unsupported feature exception throwing. --------- Co-authored-by: Maxime France-Pillois <[email protected]>
Signed-off-by: Tikhomirova, Kseniya <[email protected]>
…el#11665) This patch also implements support for 8- and 16-bit data types in slm_block_load(). --------- Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
intel#11832) CUDA specifies the launch bounds as:
```
__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor, maxBlocksPerCluster)
```
making it impossible to specify `maxBlocksPerCluster` without the preceding two attributes (and similarly for `minBlocksPerMultiprocessor`). Issue warnings and ignore the attributes if this condition is not met.
…1815) Fixes generating filters like this one: `DeviceName:{{gfx90a:sramecc\+:xnack\-}},DriverVersion:{{HIP 50422.80}}`
When using -O0 to disable optimizations, also set -fsycl-disable-range-rounding to further disable optimizations and improve debuggability.
Unused variable - clean it up.
This started causing a hang on a newer GPU driver with O0; we have an internal tracker for this. Force O2 for now. Signed-off-by: Sarnie, Nick <[email protected]>
… op (intel#11847) Even though std::max is completely legal/correct to use in SYCL/ESIMD, especially in a 'constexpr' context, it may cause problems on Windows in some non-trivial configurations with an odd order of includes of system and SYCL header files. Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
intel#11850) Element-size address alignment is valid from a correctness point of view, but implicitly using 1-byte and 2-byte alignment causes performance regressions for block_load(const int8_t *, ...) and block_load(const int16_t *, ...) because the GPU backend has to generate a slower GATHER instead of the more efficient BLOCK-LOAD. Without this fix, block_load() causes up to a 44% performance slow-down on some apps that relied on the alignment assumptions used before block_load(usm, ..., compile_time_props) was implemented. The reasoning for raising the expected/assumed alignment from element-size to 4 bytes for byte- and word-vectors is this: the point of a block_load() call (as opposed to a gather() call) is an efficient block load, so the assumed alignment is the one that allows a block-load to be generated. This is a bit trickier for the user, but that is how the block_load/store API has always worked: block-load had restrictions that needed to be honored. To be on the safe side, the user can always pass the guaranteed alignment. --------- Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
The following features, defined in the specification as unsupported, have working implementations upstream.
* intel#11418
* intel#11505
* intel#11556