refactor[cartesian]: unexpanded sdfg cleanups #1843
Conversation
(rebased onto …)
Re: my cursory look suggests this is only called at top level for …. We should log a task to double-check, and we need a methodological/code way to differentiate the code paths we expect in stencil mode from those in orchestrated mode.
Looking good. Couple of questions. Do an enum for the device type; the string is making me sad.
LGTM
@havogt Care to review, or OK to push?
I would prefer to use the `DeviceType` enum from `_core.definitions` instead of introducing a new enum. Can you check if that works?
```python
class StorageDevice(Enum):
    CPU = 1
    GPU = 2
```
It would be nice to use the `DeviceType` from `_core.definitions`. I don't have time to check how `ROCM` would fit here. Maybe we can do `CurrentGPU = core_defs.DeviceType.ROCM if cp.cuda.runtime.is_hip else core_defs.DeviceType.CUDA` somewhere. cupy currently has a limitation that it is either installed for CUDA or for ROCm, so it would be fine.
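A minimal sketch of that suggestion, assuming `gt4py._core.definitions` exposes a `DeviceType` enum with `CUDA` and `ROCM` members and that cupy reports its build flavor via `cupy.cuda.runtime.is_hip`:

```python
# Sketch of the suggestion above (not merged code).
import cupy as cp

from gt4py._core import definitions as core_defs

# Pick the GPU device type matching the installed cupy flavor (HIP build -> ROCm).
CurrentGPU = (
    core_defs.DeviceType.ROCM
    if cp.cuda.runtime.is_hip
    else core_defs.DeviceType.CUDA
)
```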
Sure, let's reuse `DeviceType` from the core definitions. I honestly don't know what our support level for AMD graphics cards is. From what I read in the code, it looks like that's going to be an item on the task list at some point. So I personally wouldn't worry too much about `ROCM` for now.
ECMWF runs AMD hardware, so we need to keep this alive.
For me to understand: which backend is compatible with AMD hardware? From what I read, both the `CudaBackend` and the `DaceGPUBackend` generate CUDA code.
So DaCe can do HIP; I believe that's where they get their AMD goodies. @stubbiali, can you speak to the AMD hardware use? Am I misremembering?
Both GridTools and DaCe GPU backends support AMD graphics cards. We have been running GT4Py codes on LUMI for at least 1.5 years.
Forward debug info from gt4py to dace if we know it. If we don't, just don't specify it instead of setting `DebugInfo(0)`. Moved `get_dace_debuginfo()` one folder higher, from expansion utils into "general" dace utils, because it's not only used in expansion.
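A rough sketch of what that forwarding could look like; the node attribute names (`loc`, `line`, `column`, `filename`) are assumptions here, not the exact gt4py implementation:

```python
# Hypothetical helper illustrating the idea: build a dace DebugInfo from a
# node's source location if we know it, and return None otherwise so callers
# simply leave the debug info unset instead of writing DebugInfo(0).
import dace


def get_dace_debuginfo(node):
    loc = getattr(node, "loc", None)  # assumed location attribute
    if loc is None:
        return None  # unknown source location: don't specify anything
    return dace.dtypes.DebugInfo(
        start_line=loc.line,
        start_column=loc.column,
        end_line=loc.line,
        end_column=loc.column,
        filename=loc.filename,
    )
```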
Transients are added in OirSDFGBuilder, where no array lifetime is configured. After building that SDFG, the lifetime of all transients is manually set to `Persistent` (which is an optimization leading to less frequent memory allocation in case a kernel is called multiple times). In this commit we directly specify the transient's lifetime when building the SDFG.
For GPU targets, we have to configure the `storage_type` of transient arrays. In addition, we have to set the library node's `device` property. We can do both while building the SDFG instead of in separate passes afterwards.
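A small sketch of the transient-array part of this, using dace's public `add_array` API; the SDFG name, array name, shape, and dtype are made up for illustration:

```python
# Sketch: declare a GPU transient with persistent lifetime while building the
# SDFG, instead of fixing storage and lifetime in separate passes afterwards.
import dace

sdfg = dace.SDFG("unexpanded_example")
sdfg.add_array(
    "tmp_field",                                         # hypothetical transient
    shape=(10, 10, 80),
    dtype=dace.float64,
    transient=True,
    storage=dace.StorageType.GPU_Global,                 # storage for GPU targets
    lifetime=dace.dtypes.AllocationLifetime.Persistent,  # allocate once, reuse
)
```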
Putting this back to draft mode.
To be split from the other changes, probably in a follow-up PR.
## Description

Refactors from (recent) debugging sessions around transient arrays in the "unexpanded SDFG" (the one with the library nodes):

- Remove unused `**kwargs` from `OirSDFGBuilder`
- Forward debugging information about transient arrays to DaCe
- Use a (constant) variable for connector prefixes of data going into/out of the library nodes
- Configure the lifetime of transient arrays directly in `OirSDFGBuilder`

This is a follow-up from PR #1843. In this PR, we separate the DaCe backend cleanup from the refactor around (re-)using `DeviceType` instead of `"cpu" | "gpu"` string literals.

## Requirements

- [x] All fixes and/or new features come with corresponding tests. Should be covered by existing tests.
- [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. N/A

---------

Co-authored-by: Roman Cattaneo <>
Co-authored-by: Roman Cattaneo <[email protected]>
Co-authored-by: Florian Deconinck <[email protected]>
Description
Refactors from recent debugging sessions around transient arrays in the "unexpanded SDFG" (the one with the library nodes):
- Remove unused `**kwargs` from `OirSDFGBuilder`
- Configure the lifetime of transient arrays directly in `OirSDFGBuilder`
- Configure the storage of transient arrays directly in `OirSDFGBuilder` (for GPU targets)
- Set the `LibraryNode`'s device type directly in `OirSDFGBuilder` (for GPU targets)
- Use a `StorageDevice` enum in `LayoutInfo`
Sidenote on the allocation lifetime: In the orchestrated code path, we reset the allocation lifetime of transients to `SDFG` when we freeze the stencil with origin/domain, see gt4py/src/gt4py/cartesian/backend/dace_backend.py, lines 270 to 317 in ac253b6. This might be relevant when tracking down orchestration performance. Seems odd at least.
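For reference, the reset described above boils down to something like this sketch (not the exact code from `dace_backend.py`):

```python
# Sketch: walk all transients in a (possibly nested) SDFG and reset their
# allocation lifetime to SDFG scope, mirroring what the orchestrated path
# does when freezing the stencil with origin/domain.
import dace


def reset_transient_lifetimes(sdfg: dace.SDFG) -> None:
    for _, _, array in sdfg.arrays_recursive():
        if array.transient:
            array.lifetime = dace.dtypes.AllocationLifetime.SDFG
```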
Requirements

- Covered by existing tests
- N/A