
refactor[cartesian]: unexpanded sdfg cleanups #1843

Closed
wants to merge 7 commits

Conversation

romanc
Contributor

@romanc romanc commented Feb 1, 2025

Description

Refactors from recent debugging sessions around transient arrays in the "unexpanded SDFG" (the one with the library nodes):

  • Remove unused **kwargs from OirSDFGBuilder
  • Forward debugging information about transient arrays to DaCe
  • Use a (constant) variable for connector prefixes of data going into/out of the library nodes
  • Configure the lifetime of transient arrays directly in OirSDFGBuilder
  • Configure storage type of transient arrays directly in OirSDFGBuilder (for GPU targets)
  • Configure the LibraryNode's device type directly in OirSDFGBuilder (for GPU targets)
  • Enum type for StorageDevice in LayoutInfo
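
Taken together, the builder-side configuration described above might look roughly like the following sketch. This is illustrative only, not the actual gt4py code: `transient_settings` is a hypothetical helper, and the storage/lifetime strings stand in for the real `dace.dtypes.StorageType` and `dace.dtypes.AllocationLifetime` values.

```python
from enum import Enum


class StorageDevice(Enum):
    CPU = 1
    GPU = 2


def transient_settings(device: StorageDevice) -> dict:
    """Pick storage and lifetime for a transient array at build time,
    instead of patching descriptors in separate passes afterwards."""
    return {
        "transient": True,
        # GPU transients need global device memory; CPU transients keep the default.
        "storage": "GPU_Global" if device is StorageDevice.GPU else "Default",
        # Persistent lifetime avoids re-allocation when a kernel runs repeatedly.
        "lifetime": "Persistent",
    }


print(transient_settings(StorageDevice.GPU)["storage"])  # GPU_Global
```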

Sidenote on the allocation lifetime: in the orchestrated code path, we reset the allocation lifetime of transients to `SDFG` when we freeze the stencil with origin/domain:

```python
def freeze_origin_domain_sdfg(inner_sdfg, arg_names, field_info, *, origin, domain):
    wrapper_sdfg = dace.SDFG("frozen_" + inner_sdfg.name)
    state = wrapper_sdfg.add_state("frozen_" + inner_sdfg.name + "_state")
    inputs = set()
    outputs = set()
    for inner_state in inner_sdfg.nodes():
        for node in inner_state.nodes():
            if (
                not isinstance(node, dace.nodes.AccessNode)
                or inner_sdfg.arrays[node.data].transient
            ):
                continue
            if node.has_reads(inner_state):
                inputs.add(node.data)
            if node.has_writes(inner_state):
                outputs.add(node.data)
    nsdfg = state.add_nested_sdfg(inner_sdfg, None, inputs, outputs)
    _sdfg_add_arrays_and_edges(
        field_info, wrapper_sdfg, state, inner_sdfg, nsdfg, inputs, outputs, origins=origin
    )
    # in special case of empty domain, remove entire SDFG.
    if any(d == 0 for d in domain):
        states = wrapper_sdfg.states()
        assert len(states) == 1
        for node in states[0].nodes():
            state.remove_node(node)
    # make sure that symbols are passed through to inner sdfg
    for symbol in nsdfg.sdfg.free_symbols:
        if symbol not in wrapper_sdfg.symbols:
            wrapper_sdfg.add_symbol(symbol, nsdfg.sdfg.symbols[symbol])
    # Try to inline wrapped SDFG before symbols are specialized to avoid extra views
    inline_sdfgs(wrapper_sdfg)
    _sdfg_specialize_symbols(wrapper_sdfg, domain)
    for _, _, array in wrapper_sdfg.arrays_recursive():
        if array.transient:
            array.lifetime = dace.dtypes.AllocationLifetime.SDFG
    wrapper_sdfg.arg_names = arg_names
    return wrapper_sdfg
```

This might be relevant when tracking down orchestration performance. Seems odd at least.
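
The interplay described in the sidenote can be reduced to a toy model (plain-Python stand-ins, not dace objects): whatever lifetime the builder configures, the final loop of `freeze_origin_domain_sdfg` overwrites it with `SDFG` for every transient.

```python
from dataclasses import dataclass


@dataclass
class Array:
    transient: bool
    lifetime: str = "Persistent"  # as configured at build time


def freeze_lifetimes(arrays):
    # Mirrors the last loop of freeze_origin_domain_sdfg: every transient
    # is reset to SDFG lifetime, regardless of what was set before.
    for array in arrays:
        if array.transient:
            array.lifetime = "SDFG"
    return arrays


arrays = freeze_lifetimes([Array(transient=True), Array(transient=False)])
print([a.lifetime for a in arrays])  # ['SDFG', 'Persistent']
```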

Requirements

  • All fixes and/or new features come with corresponding tests.
    Covered by existing tests
  • Important design decisions have been documented in the appropriate ADR inside the docs/development/ADRs/ folder.
    N/A

@romanc romanc force-pushed the romanc/unexpanded-sdfg-cleanups branch 2 times, most recently from 76c47ad to c7aba86 Compare February 3, 2025 12:52
@romanc romanc marked this pull request as ready for review February 3, 2025 18:18
@romanc romanc force-pushed the romanc/unexpanded-sdfg-cleanups branch from 553196d to 38df291 Compare February 10, 2025 08:09
@romanc
Contributor Author

romanc commented Feb 10, 2025

(Rebased onto the main branch because there was a hiccup on the CSCS-CI.)

@FlorianDeconinck
Contributor

Re: freeze_origin_domain_sdfg

A cursory look suggests this is only called at the top level for DaceLazyStencil et al., i.e. it's not called when we are doing orchestration but only for the top-level arguments (and then it's fine).

We should log a task to double-check, and we need a methodological/code way to differentiate which code paths we expect to run in stencil mode and which in orchestrated mode.

@FlorianDeconinck FlorianDeconinck left a comment

Looking good. Couple of questions.

Do the enum for the device type; the string is making me sad.

@romanc romanc force-pushed the romanc/unexpanded-sdfg-cleanups branch from 38df291 to 6ea0fe4 Compare February 11, 2025 09:46
@FlorianDeconinck FlorianDeconinck left a comment

LGTM

@FlorianDeconinck
Contributor

@havogt Care to review or ok to push?

@havogt havogt left a comment

I would prefer to use the DeviceType enum from _core.definitions instead of introducing a new enum. Can you check if that works?

Comment on lines 33 to 35:

```python
class StorageDevice(Enum):
    CPU = 1
    GPU = 2
```
Contributor

It would be nice to use the DeviceType from _core.definitions. I don't have time to check how ROCM would fit here.
Maybe we can do

```python
CurrentGPU = core_defs.DeviceType.ROCM if cp.cuda.runtime.is_hip else core_defs.DeviceType.CUDA
```

somewhere. cupy currently has a limitation that it is either installed for cuda or for rocm, so it would be fine.
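
The suggestion could be sketched as follows. This is illustrative only: the `DeviceType` enum below mirrors the one in `_core.definitions` (member values assumed), and the `is_hip` flag is passed in explicitly where real code would read `cp.cuda.runtime.is_hip` from cupy.

```python
from enum import Enum


class DeviceType(Enum):
    CPU = 0
    CUDA = 1
    ROCM = 2


def current_gpu(is_hip: bool) -> DeviceType:
    # cupy is installed either for CUDA or for ROCm, never both, so a
    # single flag is enough to pick the concrete GPU device type.
    return DeviceType.ROCM if is_hip else DeviceType.CUDA
```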

Contributor Author

Sure, let's reuse DeviceType from the core definitions. I honestly don't know what our support level for AMD graphics cards is. From what I read in the code, it looks like that's going to be an item on the task list at some point, so I personally wouldn't worry too much about ROCM for now.

Contributor

ECMWF runs AMD hardware, so we need to keep this alive.

Contributor Author

For me to understand: which backend is compatible with AMD hardware? From what I read, the CudaBackend as well as the DaceGPUBackend generate cuda code.

Contributor

So DaCe can do HIP; I believe that's where they get their AMD goodies. @stubbiali, can you talk to the AMD hardware use? Am I misremembering?

Contributor

Both GridTools and DaCe GPU backends support AMD graphics cards. We have been running GT4Py codes on LUMI for at least 1.5 years.

Roman Cattaneo and others added 6 commits February 12, 2025 15:45
Forward debug info from gt4py to dace if we know it. If we don't know, just don't specify it instead of setting `DebugInfo(0)`.

Moved `get_dace_debuginfo()` one folder higher from expansion utils into "general" dace utils because it's not only used in expansion.

Transients are added in `OirSDFGBuilder`, where no array lifetime is configured. After building that SDFG, the lifetime of all transients is manually set to `Persistent` (an optimization leading to less frequent memory allocation in case a kernel is called multiple times). In this commit, we directly specify the transient's lifetime when building the SDFG.

For GPU targets, we have to configure the `storage_type` for transient arrays. In addition, we have to set the library node's `device` property. We can do both while building the SDFG instead of in separate passes afterwards.
@romanc romanc marked this pull request as draft February 12, 2025 16:59
@romanc
Contributor Author

romanc commented Feb 12, 2025

Putting this back to draft mode.

  • Re-using DeviceType from _core.definitions is possible and I think it's worth doing.
  • It's a bit of work, and I might need to re-shuffle some things because tach complains that gt4py.cartesian should not depend on gt4py._core directly. I guess everything related to cpu/gpu device(s) should go through the storage module. In the end, there are many checks like "if this is a gpu device, then do this ...", and they should be centralized (in the gt_storage package or wherever) so that we can deal with CUDA / ROCM in a single place.
  • Mixing the original changes in this PR with changing the "cpu" / "gpu" strings to re-use DeviceType is going to look ugly in the git history (especially when squashed).
  • I'll push a WIP state here and then probably separate the two things into separate (consecutive) PRs tomorrow.

To be split from the other changes, probably in a follow-up PR.
@romanc
Contributor Author

romanc commented Feb 14, 2025

Closing as superseded by #1860 and #1867.

@romanc romanc closed this Feb 14, 2025
@romanc romanc deleted the romanc/unexpanded-sdfg-cleanups branch February 14, 2025 19:44
romanc added a commit that referenced this pull request Feb 17, 2025
## Description

Refactors from (recent) debugging sessions around transient arrays in the "unexpanded SDFG" (the one with the library nodes):

- Remove unused `**kwargs` from `OirSDFGBuilder`
- Forward debugging information about transient arrays to DaCe
- Use a (constant) variable for connector prefixes of data going into/out of the library nodes
- Configure the lifetime of transient arrays directly in `OirSDFGBuilder`

This is a follow-up from PR
#1843. In this PR, we separate the DaCe backend cleanup from the refactor around (re-)using `DeviceType` instead of `"cpu" | "gpu"` string literals.

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
  Should be covered by existing tests.
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

---------

Co-authored-by: Roman Cattaneo <>
Co-authored-by: Roman Cattaneo <[email protected]>
Co-authored-by: Florian Deconinck <[email protected]>