Iterator contexts prototype (chapel-lang#24488)
This PR is the initial step towards implementing features primarily designed and discussed in:

- chapel-lang#16405
- Cray/chapel-private#4349
- Cray/chapel-private#5216
- and the aspirations of chapel-lang#9529 and chapel-lang#6334

It enables nightly GPU testing on the newly-added `test/users/engin/context`.

### ADDED FEATURES -- these are not user-facing at all

There are two umbrella features introduced here:

- querying things like number of tasks and task IDs from inside the body of a `forall` loop
- hoisting variable declarations from the loop body into upper contexts that typically represent `coforall`s and `foreach`es in the iterator implementation. This allows, for example, locale-private variables to be declared inside the loop body.

These features are enabled when compiling with `--iterator-contexts`, which is off by default. The flag `--report-context-adjustments` enables debugging printouts for compiler developers.

First, the iterators to be used in parallel loops are augmented with "handles." These are instances of the `Context` type defined in `test/users/engin/context/ChapelContextSupport.chpl`. For example:

    var onCtx = new Context(rank=rank, taskId=locDomIdx, numTasks=locDoms.domain.shape);

The `taskId` indicates the position of the current task within the rank-dimensional space of the shape `numTasks`. The innermost foreach loop serves as a handle automatically.

Second, to use the above in a forall loop, the loop body is augmented with something like this:

    const context = new Context();
    const vectorContext = __primitive("outer context", ctx1, context);
    const localTaskContext = __primitive("outer context", ctx1, vectorContext);
    const localeContext = __primitive("outer context", ctx2, localTaskContext);
    // where
    type ctx1 = Context(1, int(64)); // 1-d space of tasks
    type ctx2 = Context(2, (int(64), int(64))); // 2-d space of tasks

These variables will be mapped to the iterator(s)' handles, starting with the (dynamically) innermost handle.
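For intuition only, the information carried by the two outer handle levels can be approximated in plain Chapel, where the task position and task count are available explicitly. This is a minimal sketch, not the prototype: it does not use the `Context` type or the new primitives, `tasksPerLocale` is a hypothetical knob introduced for the example, and both levels are 1-d here, whereas the locale-level handle above is 2-d.

```chapel
// Plain-Chapel analogue of the two outer handle levels.
// It does not use the prototype's Context type or primitives.

// Hypothetical knob for this sketch only.
config const tasksPerLocale = 2;

// Outer level: one task per locale, like a handle whose
// taskId is the locale's position and numTasks is numLocales.
coforall loc in Locales do on loc {
  const localeTaskId = loc.id;
  const localeNumTasks = numLocales;

  // Inner level: tasks within the locale, like a handle whose
  // taskId is tid and numTasks is tasksPerLocale.
  coforall tid in 0..<tasksPerLocale {
    writeln("locale ", localeTaskId, " of ", localeNumTasks,
            ", task ", tid, " of ", tasksPerLocale);
  }
}
```

The lines are printed in a nondeterministic order because the tasks run concurrently. With the prototype, the same quantities would instead be read from the context variables inside the `forall` body.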
Mapping the variables to handles enables querying the current task's position, e.g., `localeContext.taskId`. Currently it is the user's responsibility to match the type of the variable and the type of the corresponding handle.

Finally, to hoist a variable in the loop body to the corresponding context, declare it using split-init as follows:

    var localTile;
    {
      const ref locSubDom = Dom.localSubdomain();
      localTile = Input[{locSubDom.dim(1), locSubDom.dim(0)}];
    }
    __primitive("hoist to context", localeContext, localTile);

The contents of the block will be hoisted together with the variable being declared, here `localTile`, to the context associated with the variable in the primitive, here `localeContext`. Array, c_array, and barrier variables are currently supported.

See the code in `test/users/engin/context` for examples.

### IMPLEMENTATION OVERVIEW

The core of the implementation is in a new file called `lowerLoopContexts` and fires at the end of iterator lowering.

The new primitives `PRIM_INNERMOST_CONTEXT`, `PRIM_OUTER_CONTEXT`, and `PRIM_HOIST_TO_CONTEXT`, together with a special `Context` type, are at the core of the implementation. Users can use `Context` variables to "find" outer contexts, and in turn hoist variable declarations into such contexts. Currently arrays, barriers and c_arrays can be moved around in this manner.

The current design requires iterators to be mildly modified to make use of these features. This PR uses its own iterators, which are copied almost directly from the DR and Block distributions except for the few lines added in support of this PR.

`test/users/engin/context` has those iterators, module support that will eventually turn into an internal module, and finally some tests that demonstrate how these features can be used.

The `.good` files in `test/users/engin/context` account for the tests behaving differently for locale model = flat vs. gpu.
There, our start_test framework chooses:

* `testname.comm-none.lm-gpu.good` for comm==none and lm==gpu (obviously)
* `testname.comm-none.good` for comm==none and lm!=gpu
* `testname.good` for comm!=none, whether lm==gpu or not

The tests behave differently for comm==none and lm==flat because only in this configuration does the compiler remove on-statements early, and the implementation has not been adjusted to handle this properly. This is a todo item.

While there:

* Tidy up CHPL_NIGHTLY_TEST_DIRS in GPU-related scripts in util/cron.
* Remove the non-portable sed option `-i` from test/gpu/native/noGpu/basicMem.prediff.

### NEXT STEPS

We would like to implement the user-facing syntax proposed in Cray/chapel-private#5216 to facilitate writing more code using these features and to help with the final user-facing design.

Implementation-wise, the immediate next steps are:

* Enable `test/users/engin/context/transpose.chpl` for GPUs.
* Implement detection of handles properly when comm==none and lm==flat, i.e., when `on`-statements are removed from the AST early and so multiple handles can end up in a single block.
* Determine whether `_ddata_allocate_noinit_gpu_shared()`, newly added to ChapelBase.chpl, is needed.
* Resolve the compiler crash with --verify and lm==gpu observable in `test/gpu/native/distArray/blockUseInFunction.chpl`, a few tests under release/examples, etc.
* Improve the prototype syntax; perhaps switch to block-based syntax.
* Revisit how barriers should be hoisted w.r.t. automatic adjustment of the number of tasks.

Some steps for productization:

* Implement hoisting as part of lowerForallStmtsInline().
* Add the creation of `Context` handles into our standard iterators, including DefaultRectangular, BlockDist, etc.
* ... and ensure they are removed when unused.

The branch has been developed by @e-kayrakli, @DanilaFe and @vasslitvinov. Earlier dev history: 98f9c70..eccaf0d and b09bbfc..5250416.

Reviewed by: @e-kayrakli.

Merged by: @vasslitvinov.
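For reference, the `.good` file choice listed under IMPLEMENTATION OVERVIEW can be written out as a small decision function. This is a sketch under stated assumptions: `goodFileFor` is a hypothetical helper, not part of start_test, and it models only the three cases named in this message, taking the comm and locale-model settings as plain strings.

```chapel
// Hypothetical helper for illustration; not part of start_test.
// Models only the three cases described in this commit message.
proc goodFileFor(testname: string, comm: string, lm: string): string {
  if comm == "none" && lm == "gpu" then
    return testname + ".comm-none.lm-gpu.good";
  else if comm == "none" then
    return testname + ".comm-none.good";
  else
    // comm != none: the same file is used whether lm == gpu or not.
    return testname + ".good";
}

writeln(goodFileFor("transpose", "none", "gpu"));     // transpose.comm-none.lm-gpu.good
writeln(goodFileFor("transpose", "none", "flat"));    // transpose.comm-none.good
writeln(goodFileFor("transpose", "gasnet", "gpu"));   // transpose.good
```

The most specific file name is checked first, so a more general `.good` file is only used when no more specific case applies.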