Implement array view elision (#24390)
The main optimization this PR introduces is to avoid creating ArrayViews
in assignments like:

```chpl
Arr1[...] = Arr2[...];
```

See my #24390 (comment) and Jeremiah's #24390 (comment) for some quick performance results.

Resolves #16133 (comment)

---


There are also other smaller optimizations in this PR. See below for
more details.

## 1. Array View Elision (copied from a comment in the compiler)

Array View Elision (AVE) aims to optimize assignments involving array views. Currently, this is limited to:

```
  slice = slice, and
  rank-change = rank-change
```

This restriction is mostly out of an abundance of caution, and it could be extended to assignments involving arrays, too.

The gist of the implementation is eliding array views from operations such as the ones above. Note that this implies that the following cannot be covered:

```chpl
  ref slice = A[1..5];
  slice = A[6..10];
```

Determining whether `slice` can be dropped here is more complicated than I could bite off at the moment. So, both sides of the assignment must be array-view-generating expressions for this optimization to fire.
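To make the candidate rule concrete, here is a minimal C++ sketch. It is a toy model, not the compiler's code: `ExprKind`, `generatesArrayView`, and `isAveCandidate` are hypothetical names, and I am assuming that mixing a slice with a rank-change is not covered, since only the two same-kind combinations are listed above.

```cpp
#include <cassert>

// Hypothetical, simplified classification of one side of an assignment.
// The real transformer inspects CallExpr nodes; these names are made up.
enum class ExprKind { Slice, RankChange, PlainVariable, Other };

// True if an expression of kind `k` would create an array view.
bool generatesArrayView(ExprKind k) {
  return k == ExprKind::Slice || k == ExprKind::RankChange;
}

// Candidate rule as described above: both sides must generate array
// views, and of the same kind (slice = slice, rank-change = rank-change).
bool isAveCandidate(ExprKind lhs, ExprKind rhs) {
  return generatesArrayView(lhs) && generatesArrayView(rhs) && lhs == rhs;
}

// Usage: the cases discussed in this section.
void exampleCandidates() {
  // Arr1[1..5] = Arr2[6..10];  -> slice = slice
  assert(isAveCandidate(ExprKind::Slice, ExprKind::Slice));
  // ref slice = A[1..5]; slice = A[6..10];  -> LHS is a plain variable
  assert(!isAveCandidate(ExprKind::PlainVariable, ExprKind::Slice));
  // slice = rank-change is not in the supported list (assumed)
  assert(!isAveCandidate(ExprKind::Slice, ExprKind::RankChange));
}
```

The point of the model is that the decision is purely syntactic: a plain variable on either side disqualifies the statement, even if that variable happens to refer to an array view.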

This optimization has two parts:

### a. Pre-normalize (`ArrayViewElisionTransformer` is the type doing this)

Given a statement like

```chpl
  A[x] = B[y];
```

we generate

```chpl
  param array_view_elision: bool;  // will be replaced during resolution
  if (array_view_elision) {
    var protoSlice1 = chpl__createProtoSlice(A, x);
    var protoSlice2 = chpl__createProtoSlice(B, y);

    __primitive(PRIM_PROTO_SLICE_ASSIGN, protoSlice1, protoSlice2);
  }
  else {
    A[x] = B[y];
  }
```

Here, each `protoSlice*` variable has type `chpl__protoSlice`. See `modules/internal/ChapelArrayViewElision.chpl` for the details of that type. Its main purpose is to represent the expression that would create an array view, without actually creating one.
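As a rough illustration of the idea (not of the actual `chpl__protoSlice` record), the C++ sketch below models a 1-D proto-slice as a base pointer plus an inclusive index range, and assigns one to another by copying elements directly. `ProtoSlice`, `assignProtoSlices`, and the 0-based indexing are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 1-D stand-in for chpl__protoSlice: a raw pointer to the
// base array's data plus the inclusive index range a slice would cover.
// Nothing here allocates a view object or a domain.
struct ProtoSlice {
  double* base;  // plays the role of the c_ptr to the base array
  long lo;       // first index, inclusive (like Chapel's lo..hi)
  long hi;       // last index, inclusive

  long size() const { return hi >= lo ? hi - lo + 1 : 0; }
};

// Sketch of "assign proto slices": copy the covered elements directly
// between the two base arrays instead of going through view objects.
void assignProtoSlices(const ProtoSlice& lhs, const ProtoSlice& rhs) {
  assert(lhs.size() == rhs.size());  // shapes must match
  for (long i = 0; i < lhs.size(); i++)
    lhs.base[lhs.lo + i] = rhs.base[rhs.lo + i];
}

// Usage: roughly Arr1[0..2] = Arr2[3..5], with 0-based C++ storage.
void exampleProtoSliceAssign() {
  std::vector<double> arr1(6, 0.0);
  std::vector<double> arr2 = {0, 1, 2, 3, 4, 5};
  assignProtoSlices(ProtoSlice{arr1.data(), 0, 2}, ProtoSlice{arr2.data(), 3, 5});
  assert(arr1[0] == 3 && arr1[1] == 4 && arr1[2] == 5);
  assert(arr1[3] == 0);  // elements outside the slice are untouched
}
```

The record is cheap to build (a pointer and two integers), which is why it can stand in for the view-creating expression.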

### b. During prefold (`ArrayViewElisionPrefolder` is the type doing this)

This step revolves around `PRIM_PROTO_SLICE_ASSIGN`. The `ArrayViewElisionPrefolder` is in charge of finding the other relevant AST (the `CondStmt`, the protoSlice temps, etc.) and transforming the conditional.

Statically, `chpl__ave_exprCanBeProtoSlice` is called on both protoSlices to make sure that the module code is OK with creating protoSlices out of those expressions. We also check whether the two protoSlices can be assigned to one another; this is done by `chpl__ave_protoSlicesSupportAssignment`. If `fLocal`, that's sufficient. Calls to these functions are inserted and resolved, their results are collected, and finally the calls are removed. At that point, we drop the `array_view_elision` flag completely and replace it with `true` or `false`, after which the conditional statement is constant-folded.

If not `fLocal`, we also call `chpl_bothLocal` and replace the flag with the result of that call. Note that this is a dynamic check, meaning that the conditional will not be removed.
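The flag replacement described in the last two paragraphs can be summarized as a small decision function. The C++ sketch below is a hypothetical model: `FlagReplacement`, `replaceFlag`, and `takesProtoSliceBranch` are invented names, and I am assuming that a failed static check yields `false` in the non-`fLocal` case as well.

```cpp
#include <cassert>

// Hypothetical model of how the `array_view_elision` flag gets replaced
// during prefold. Names are illustrative, not actual compiler APIs.
enum class FlagReplacement { True, False, RuntimeBothLocal };

// lhsOk / rhsOk: results of the chpl__ave_exprCanBeProtoSlice checks.
// assignOk:      result of the chpl__ave_protoSlicesSupportAssignment check.
// local:         whether the compiler's fLocal is set.
FlagReplacement replaceFlag(bool lhsOk, bool rhsOk, bool assignOk, bool local) {
  if (!(lhsOk && rhsOk && assignOk))
    return FlagReplacement::False;          // folds to the original assignment
  if (local)
    return FlagReplacement::True;           // folds to the proto-slice branch
  return FlagReplacement::RuntimeBothLocal; // conditional survives to run time
}

// Which branch runs, given the replacement and (for the dynamic case)
// the run-time result of the both-sides-are-local check.
bool takesProtoSliceBranch(FlagReplacement r, bool bothLocal) {
  switch (r) {
    case FlagReplacement::True:             return true;
    case FlagReplacement::False:            return false;
    case FlagReplacement::RuntimeBothLocal: return bothLocal;
  }
  return false;  // unreachable; keeps compilers from warning
}

// Usage
void exampleFlagReplacement() {
  assert(replaceFlag(true, true, true, true) == FlagReplacement::True);
  assert(replaceFlag(true, false, true, true) == FlagReplacement::False);
  FlagReplacement r = replaceFlag(true, true, true, false);
  assert(r == FlagReplacement::RuntimeBothLocal);
  assert(takesProtoSliceBranch(r, true) && !takesProtoSliceBranch(r, false));
}
```

Only the `RuntimeBothLocal` outcome leaves a branch in the generated code; the other two let the conditional be folded away at compile time.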

This optimization is on by default. It can be controlled with `--[no-]array-view-elision`. Additionally, there is a `--report-array-view-elision` flag that enables some output during compilation to help with understanding what is and isn't optimized.

This took some more effort, so the PR also includes the following optimizations:

## 2. Short array transfer optimization

For array transfers below some size threshold, we don't want memcpy- or forall-based solutions. For such scenarios, this PR adds a code path that falls back directly to `for`-based solutions. Currently, this code path is exercised only for `chpl__protoSlice`s, out of an abundance of caution.
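The dispatch described above can be sketched in C++ as follows. The threshold value, `TransferPath`, and `transfer` are hypothetical (the PR text does not state the real cutoff), and only `std::memcpy` stands in for the bulk path; the forall-based path is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical cutoff in elements; the PR text does not give the real one.
constexpr std::size_t kShortTransferThreshold = 64;

enum class TransferPath { ForLoop, Memcpy };

// Sketch of size-based dispatch: short transfers skip the bulk machinery
// and use a plain serial loop. Only memcpy stands in for the bulk path
// here; a parallel (forall-like) path is omitted. Returns the path taken.
TransferPath transfer(double* dst, const double* src, std::size_t n) {
  if (n < kShortTransferThreshold) {
    for (std::size_t i = 0; i < n; i++) dst[i] = src[i];
    return TransferPath::ForLoop;
  }
  std::memcpy(dst, src, n * sizeof(double));
  return TransferPath::Memcpy;
}

// Usage
void exampleTransfer() {
  double smallSrc[4] = {1, 2, 3, 4};
  double smallDst[4] = {};
  assert(transfer(smallDst, smallSrc, 4) == TransferPath::ForLoop);
  assert(smallDst[3] == 4);

  double bigSrc[100];
  double bigDst[100];
  for (int i = 0; i < 100; i++) bigSrc[i] = i;
  assert(transfer(bigDst, bigSrc, 100) == TransferPath::Memcpy);
  assert(bigDst[99] == 99);
}
```

The rationale is that, for a handful of elements, the fixed setup cost of a bulk or parallel transfer can outweigh the copy itself, so a simple loop wins.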

## 3. Avoid creating domains in slice/transfer code

`chpl__protoSlice` stores a range or a tuple of ranges, and it is crucial to avoid creating domains for bulk transfer. So, the module code for array transfer has many adjustments to work with ranges directly. In other words, we should avoid creating domains (as well as array views) for

```chpl
  Arr1[1..5, 6..10] = Arr2[1..5, 6..10];
```
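To show what "working with ranges" buys, here is a hypothetical C++ sketch of a 2-D slice transfer that loops over two ranges directly and never materializes an index set. `Range`, `Array2D`, and `transfer2D` are invented for the example, and the storage is row-major and 0-based, unlike the 1-based Chapel example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical inclusive range, like Chapel's lo..hi.
struct Range { long lo, hi; };

// Hypothetical row-major 2-D array with 0-based indices.
struct Array2D {
  long rows, cols;
  std::vector<double> data;
  Array2D(long r, long c)
      : rows(r), cols(c), data(static_cast<std::size_t>(r * c), 0.0) {}
  double& at(long i, long j) { return data[static_cast<std::size_t>(i * cols + j)]; }
};

// Sketch: dst[rs, cs] = src[rs, cs] by looping over the two ranges directly,
// so no index-set (domain-like) object is ever built.
void transfer2D(Array2D& dst, Array2D& src, Range rs, Range cs) {
  assert(rs.lo >= 0 && rs.hi < dst.rows && rs.hi < src.rows);
  assert(cs.lo >= 0 && cs.hi < dst.cols && cs.hi < src.cols);
  for (long i = rs.lo; i <= rs.hi; i++)
    for (long j = cs.lo; j <= cs.hi; j++)
      dst.at(i, j) = src.at(i, j);
}

// Usage: roughly Arr1[0..1, 1..2] = Arr2[0..1, 1..2].
void exampleTransfer2D() {
  Array2D arr1(3, 4), arr2(3, 4);
  for (long i = 0; i < 3; i++)
    for (long j = 0; j < 4; j++) arr2.at(i, j) = 10.0 * i + j;
  transfer2D(arr1, arr2, Range{0, 1}, Range{1, 2});
  assert(arr1.at(1, 2) == 12.0);
  assert(arr1.at(0, 0) == 0.0 && arr1.at(2, 3) == 0.0);  // outside the slice
}
```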

## 4. Improve `c_ptr` creation performance

`chpl__protoSlice` stores a `c_ptr` to the base array, so we need to be able to create C pointers fast. To that end, this PR:
- moves `locale` checking under `boundsChecking`, and
- makes type checking for DR arrays in `c_ptrTo` et al. completely static
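Both changes follow a familiar pattern: push checks to compile time, and make the remaining run-time check conditional on a compile-time switch. The C++ sketch below is an analogue only. `ptrTo`, the integer locale IDs, and the trivially-copyable requirement are assumptions standing in for whatever the Chapel module code actually checks.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical analogue of creating a C pointer to an array's data.
// BoundsChecking plays the role of Chapel's `boundsChecking` param;
// locales are modeled as plain integer IDs.
template <bool BoundsChecking, typename T>
T* ptrTo(std::vector<T>& arr, int arrLocale, int hereLocale) {
  // Type check resolved entirely at compile time: no run-time cost.
  static_assert(std::is_trivially_copyable<T>::value,
                "ptrTo needs a trivially copyable element type");
  // Locale check only when bounds checking is on; with BoundsChecking ==
  // false the condition is a compile-time false the compiler can fold away.
  if (BoundsChecking && arrLocale != hereLocale)
    throw std::runtime_error("ptrTo: array is not local");
  return arr.data();
}

// Usage
void examplePtrTo() {
  std::vector<double> v = {1.0, 2.0};
  assert(ptrTo<true>(v, 0, 0) == v.data());
  assert(ptrTo<false>(v, 0, 1) == v.data());  // locale check skipped
  bool threw = false;
  try { ptrTo<true>(v, 0, 1); } catch (const std::runtime_error&) { threw = true; }
  assert(threw);
}
```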


## TODO:
- [x] add a flag to control the optimization
- [x] get buy-in for `c_ptr` optimizations
- [x] update/improve the PR message
- [x] SAT (short array transfer) shouldn't fire for GPU-to-GPU transfers, probably. Maybe disable it completely?

## Test status
- [x] linux64
- [x] gasnet
- [x] gpu (nvidia)

## Future work
- Check the status of #16133 (comment)
- Also of #24343 (comment)

[Reviewed by @benharsh (lead) and @riftEmber (C pointer changes)]
e-kayrakli authored Jul 8, 2024
2 parents 9857064 + b70c751 commit 14f563d
Showing 86 changed files with 3,709 additions and 122 deletions.
2 changes: 2 additions & 0 deletions compiler/AST/primitive.cpp
@@ -1070,6 +1070,8 @@ initPrimitive() {
prim_def(PRIM_MAYBE_LOCAL_ARR_ELEM, "may be local array element", returnInfoUnknown);
prim_def(PRIM_MAYBE_AGGREGATE_ASSIGN, "may be aggregated assignment", returnInfoUnknown);

prim_def(PRIM_PROTO_SLICE_ASSIGN, "assign proto slices", returnInfoVoid);

prim_def(PRIM_ERROR, "error", returnInfoVoid, true);
prim_def(PRIM_WARNING, "warning", returnInfoVoid, true);
prim_def(PRIM_WHEN, "when case expressions", returnInfoVoid);
88 changes: 88 additions & 0 deletions compiler/include/arrayViewElision.h
@@ -0,0 +1,88 @@
/*
* Copyright 2020-2024 Hewlett Packard Enterprise Development LP
* Copyright 2004-2019 Cray Inc.
* Other additional copyright holders may be indicated within.
*
* The entirety of this work is licensed under the Apache License,
* Version 2.0 (the "License"); you may not use this file except
* in compliance with the License.
*
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include "CallExpr.h"
#include "stmt.h"
#include "symbol.h"

// interface for normalize
void arrayViewElision();

class ArrayViewElisionTransformer {
public:
  ArrayViewElisionTransformer() = delete;
  ArrayViewElisionTransformer(CallExpr* origCall);

  inline bool candidate() const { return candidate_; }

  void transform();

private:
  CallExpr* origCall_;
  CallExpr* origLhs_;
  CallExpr* origRhs_;

  bool candidate_;

  bool exprSuitableForProtoSlice(CallExpr* e, bool isLhs);
  CallExpr* genCreateProtoSlice(CallExpr* call);
};

// interface for prefold
class ArrayViewElisionPrefolder {
public:
  ArrayViewElisionPrefolder() = delete;
  ArrayViewElisionPrefolder(CallExpr* call);
  ~ArrayViewElisionPrefolder();

  inline CondStmt* condStmt() const { return condStmt_; }
  inline bool supported() const { return supported_; }
  inline BlockStmt* staticCheckBlock() const { return staticCheckBlock_; }

  CallExpr* getReplacement();
  void report();
  void updateAndFoldConditional();

private:
  CallExpr* call_;
  Symbol* lhsBase_;
  Symbol* rhsBase_;
  CallExpr* newProtoSliceLhs_;
  CallExpr* newProtoSliceRhs_;
  CondStmt* condStmt_;
  Symbol* tmpCondFlag_;
  bool supported_;
  BlockStmt* staticCheckBlock_;

  // support for report-array-view-elision
  std::string lhsBaseType_;
  std::string rhsBaseType_;

  std::vector<std::string> lhsIndexExprTypes_;
  std::vector<std::string> rhsIndexExprTypes_;

  void findCondStmt();
  void findProtoSlices();
  bool handleOneProtoSlice(bool isLhs);
  bool canAssign() const;
  CallExpr* findOneProtoSliceCall(Expr* e);
  Symbol* getFlagReplacement();
};

3 changes: 3 additions & 0 deletions compiler/include/driver.h
@@ -68,6 +68,9 @@ extern bool fReportAutoLocalAccess;
extern bool fAutoAggregation;
extern bool fReportAutoAggregation;

extern bool fArrayViewElision;
extern bool fReportArrayViewElision;

extern bool fNoRemoteValueForwarding;
extern bool fNoInferConstRefs;
extern bool fNoRemoteSerialization;
6 changes: 6 additions & 0 deletions compiler/main/driver.cpp
@@ -215,6 +215,9 @@ bool fReportAutoLocalAccess= false;
bool fAutoAggregation = false;
bool fReportAutoAggregation= false;

bool fArrayViewElision = true;
bool fReportArrayViewElision = false;

bool printPasses = false;
FILE* printPassesFile = NULL;

@@ -1274,6 +1277,8 @@ static ArgumentDescription arg_desc[] = {

{"auto-aggregation", ' ', NULL, "Enable [disable] automatically aggregating remote accesses in foralls", "N", &fAutoAggregation, "CHPL_AUTO_AGGREGATION", NULL},

{"array-view-elision", ' ', NULL, "Enable [disable] array view elision", "N", &fArrayViewElision, "CHPL_DISABLE_ARRAY_VIEW_ELISION", NULL},

{"", ' ', NULL, "Run-time Semantic Check Options", NULL, NULL, NULL, NULL},
{"checks", ' ', NULL, "Enable [disable] all following run-time checks", "n", &fNoChecks, "CHPL_CHECKS", setChecks},
{"bounds-checks", ' ', NULL, "Enable [disable] bounds checking", "n", &fNoBoundsChecks, "CHPL_BOUNDS_CHECKING", NULL},
@@ -1403,6 +1408,7 @@ static ArgumentDescription arg_desc[] = {
{"report-optimized-on", ' ', NULL, "Print information about on clauses that have been optimized for potential fast remote fork operation", "F", &fReportOptimizedOn, NULL, NULL},
{"report-auto-local-access", ' ', NULL, "Enable compiler logs for auto local access optimization", "N", &fReportAutoLocalAccess, "CHPL_REPORT_AUTO_LOCAL_ACCESS", NULL},
{"report-auto-aggregation", ' ', NULL, "Enable compiler logs for automatic aggregation", "N", &fReportAutoAggregation, "CHPL_REPORT_AUTO_AGGREGATION", NULL},
{"report-array-view-elision", ' ', NULL, "Enable compiler logs for array view elision", "N", &fReportArrayViewElision, "CHPL_REPORT_ARRAY_VIEW_ELISION", NULL},
{"report-optimized-forall-unordered-ops", ' ', NULL, "Show which statements in foralls have been converted to unordered operations", "F", &fReportOptimizeForallUnordered, NULL, NULL},
{"report-promotion", ' ', NULL, "Print information about scalar promotion", "F", &fReportPromotion, NULL, NULL},
{"report-scalar-replace", ' ', NULL, "Print scalar replacement stats", "F", &fReportScalarReplace, NULL, NULL},
1 change: 1 addition & 0 deletions compiler/optimizations/CMakeLists.txt
@@ -15,6 +15,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
set(SRCS
arrayViewElision.cpp
bulkCopyRecords.cpp
copyPropagation.cpp
deadCodeElimination.cpp
1 change: 1 addition & 0 deletions compiler/optimizations/Makefile.share
@@ -17,6 +17,7 @@
# limitations under the License.

OPTIMIZATIONS_SRCS = \
arrayViewElision.cpp \
bulkCopyRecords.cpp \
copyPropagation.cpp \
deadCodeElimination.cpp \