Update from upstream repo facebookresearch/TensorComprehensions@master #11

Open
wants to merge 190 commits into base: master

Conversation

backstroke-bot

Hello!

The upstream repository facebookresearch/TensorComprehensions@master has some new changes that aren't in this fork. So, here they are, ready to be merged! 🎉

If this pull request can be merged without conflict, you can publish your software with these new changes. Otherwise, fix any merge conflicts by clicking the Resolve Conflicts button.


If you like Backstroke, consider donating to help us pay for infrastructure here. Backstroke is a completely open source project that's free to use, but we survive on sponsorships and donations. Thanks for your support! Help out Backstroke.


Created by Backstroke (I'm a bot!)

Sven Verdoolaege and others added 30 commits July 4, 2018 18:19
tightenLaunchBounds logs a warning if the minimum of
the block or thread mapping is strictly positive,
since it would then be possible to shift the mapping by that
minimum and further tighten the bounds.
Given the current infrastructure of TC, it is fairly
unlikely that this would happen, and especially that it would
happen consistently over the entire tree.
Furthermore, by the time tightenLaunchBounds is called
it is too late to change the mapping.
Move the warning to where the mapping is constructed and
where it could in principle still be adjusted.
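A minimal numeric sketch of that observation (plain integers, not TC code; both helper functions are made up for illustration):

```cpp
#include <cassert>

// Without shifting: indices start at 0, so a mapping covering [min, max]
// still forces a launch bound of max + 1, leaving `min` blocks/threads idle.
int boundWithoutShift(int max) {
  return max + 1;
}

// After shifting the mapping down by its strictly positive minimum,
// the same iterations fit in the tighter bound max - min + 1.
int boundAfterShift(int min, int max) {
  assert(0 < min && min <= max);
  return max - min + 1;
}
```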
This check is similar to one performed by tightenLaunchBounds,
which will be removed when tightenLaunchBounds gets implemented
in terms of the actual mapping instead of the filter derived
from the mapping.
This will be needed for extracting the mapping schedule
in the next commit.
The mapping filters are derived from the mapping schedule.
It is more natural, simpler and less error-prone
to derive the tightened launch bounds directly from
the mapping schedule.
The Caffe2 API removed the copy constructor, so `GetNamedTensor`
needs to be updated: we now use the Caffe2 API to return a new
tensor that shares the data with the tensor from the workspace.

Tested internally with the latest API.
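A hedged sketch of the described change; the exact TC signature differs, and `Workspace::GetBlob`, `Blob::Get`, and `Tensor::Alias` are assumed here to be the relevant Caffe2 calls for sharing storage:

```cpp
#include <string>

#include <caffe2/core/workspace.h>

caffe2::Tensor GetNamedTensor(caffe2::Workspace& ws, const std::string& name) {
  // The copy constructor is gone, so return an alias that shares storage
  // with the tensor owned by the workspace instead of copying it.
  return ws.GetBlob(name)->Get<caffe2::Tensor>().Alias();
}
```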
Without this, the following error occurs:
`TypeError: 'float' object cannot be interpreted as an integer`
tightenLaunchBounds: use mapping schedule instead of mapping filters
Update to support upcoming Caffe2 API
Caffe2 benchmarks are now handled internally until they can be checked
on the external CI. This will most likely happen when pytorch 1.0
binaries are available.
This commit switches from Tapir, which is based on LLVM 5.0, to trunk LLVM.
The main motivation is that LLVM 5.0 NVPTX only supports ancient CUDA
architectures and that FB internally uses an LLVM that is
much closer to trunk.
Some prehistoric representation of TC schedule trees
used to needlessly keep track of an isl::ast_loop_type field.
The printing function was left in by mistake when this field was removed.
bump isl for replacing isl_space_{un}named_set_from_params
schedule_print.cc: drop dead code
A partial schedule produced by a call to partialSchedule already has its
domain intersected with the parent filter nodes.  There is no need to
separately compute the intersection of parent filters using
activeDomainPoints to include it in the schedule.  Drop the unnecessary
intersection and remove the variable that became unused.
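A hedged sketch of the simplification; `partialSchedule` and `activeDomainPoints` are the TC polyhedral helpers mentioned above, but this call site, the header path, and the signatures are illustrative:

```cpp
#include "tc/core/polyhedral/schedule_transforms.h"

using namespace tc::polyhedral;

isl::union_map scopedSchedule(
    const detail::ScheduleTree* root,
    const detail::ScheduleTree* node) {
  // Redundant form this commit removes: the domain was intersected with
  // the parent filters a second time.
  //   return partialSchedule(root, node)
  //       .intersect_domain(activeDomainPoints(root, node));
  return partialSchedule(root, node);  // already restricted to the filters
}
```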
This function promotes to shared memory below the given node (scope).
For now, it is only expected to work when called with a block-mapped
node as scope.  It will be extended to work on any node above thread-mapping
in upcoming commits.
The check whether the promotion to shared memory improves coalescing is
performed by looking at the schedule dimension that is mapped to CUDA
thread x.  The existing implementation relies on a so-called "full
schedule" that contains all schedule dimensions.  In practice, the
partial schedule until the dimension mapped to thread x is sufficient.
Compute this partial schedule inside of promotionImprovesCoalescing
instead of precomputing the "full schedule" externally.
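An illustrative, TC-independent sketch of the criterion behind that check: an access is coalesced when consecutive thread x values touch consecutive addresses, which is why only the schedule up to the thread-x dimension matters:

```cpp
#include <cstddef>

// `strideAlongThreadX` is the byte distance between the addresses touched
// by threads x and x+1; coalescing requires it to equal the element size.
bool accessIsCoalesced(std::size_t strideAlongThreadX,
                       std::size_t elementSize) {
  return strideAlongThreadX == elementSize;
}
```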
The last use of this function was removed in the previous commit.  The
function itself is dangerous because of its misleading name: it ignores
sequence and set nodes, making the caller believe statement instances
separated by the sequence/set are scheduled to the same logical date.
It was never intended to be used outside the memory promotion heuristic.
Drop it.
Currently, promotion to shared memory is only performed below the loops
mapped to blocks.  Thus, tensor reference groups implicitly account for
blocks.  The scope of mapping in the tree will be changed in upcoming
commits.  Explicitly include block mapping into the partial schedule
within which the tensor reference groups are computed.
The function no longer requires the node to be a band.
Schedule tree invariants prevent us from inserting an extension node,
necessary to copy data during promotion, as a child of a sequence or a
set node.
Shared memory is accessible from all threads.  If promotion is requested
below the thread mapping, the scoped tensor reference groups are
specific to threads, guaranteeing no reuse between different threads.
It does not make sense to use shared memory in such cases.  Furthermore,
if one attempted to use shared memory below thread mapping, e.g., to
hide global memory latency, one would have to account for the
pre-existing thread mapping when emitting shared<->global memory copies.
Arguably, it was a mistake to have separate functions in the first
place.  This led to situations in tests where the copies between global
and shared memory were not mapped to threads.  Merge
promoteToSharedGreedy and promoteGreedilyAtDepth into a single function,
promoteToSharedAtDepth.  The name is chosen for consistency with
promoteToRegistersAtDepth.
This function will be reused in an upcoming commit.
Sven Verdoolaege and others added 30 commits August 2, 2018 10:32
This will ensure that it also applies to templated isl types derived
from isl::multi_aff and isl::multi_val.
Make sure the template type T has a scale_down method to avoid
the template operator getting triggered on other types.
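A self-contained sketch of that guard (the `Val` type is made up; the real code targets the isl bindings): the templated operator is SFINAE-constrained on the presence of a scale_down method, so overload resolution skips it for other types:

```cpp
#include <utility>

// Only viable when T provides scale_down; substitution fails silently
// (and the overload is dropped) for types without it.
template <typename T>
auto operator/(T obj, long divisor)
    -> decltype(std::declval<T>().scale_down(divisor)) {
  return obj.scale_down(divisor);
}

struct Val {
  long v;
  Val scale_down(long d) const { return Val{v / d}; }
};

int main() {
  Val half = Val{10} / 2;  // resolves via Val::scale_down
  return half.v == 5 ? 0 : 1;
}
```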
There is no need to create an extra isl::aff object here.
This will ensure that it also applies to templated isl types derived
from isl::aff.
Make sure the template type T has an add_constant method to avoid
the template operator getting triggered on other types.
This will ensure that it also applies to templated isl types derived
from isl::aff.
Make sure the template type T has a scale method to avoid
the template operator getting triggered on other types.
The space of an isl_multi_union_pw_aff is the common range space,
i.e., a set space.  The current version of isl tolerates calling
domain() on a set space, but it is technically incorrect.
Call params() instead.
Moving to templated isl types will make this issue more evident.
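A hedged sketch of the fix against the isl C++ bindings; the helper function is illustrative:

```cpp
#include <isl/cpp.h>

isl::space parameterSpace(const isl::multi_union_pw_aff& mupa) {
  // was: mupa.get_space().domain();  // tolerated by isl, but a set space
  //                                  // technically has no domain
  return mupa.get_space().params();
}
```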
…s() call

In theory, the isl::union_map::empty factory method expects
a parameter space.  In practice, isl accepts any kind of space
(from which it extracts a parameter space).
For switching to templated isl types, it is better
to use a consistent type of space.
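A hedged sketch (isl C++ bindings of that era, where `union_map::empty` still took a space; the helper is illustrative): normalize to a parameter space explicitly instead of relying on isl to extract one:

```cpp
#include <isl/cpp.h>

isl::union_map emptyUnionMap(isl::space space) {
  return isl::union_map::empty(space.params());
}
```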
This will make it easier to switch to templated isl types.
…ables

This will make it easier to switch to templated isl types.
This will make it easier to switch to templated types.
This will make it easier to switch to templated types.
This is the only instance of a call to to_str(), so there is
no immediate need to add it to mainline.
Simply use the output operator, which will get added to mainline isl.
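A self-contained sketch of the replacement, usable for any type with an `operator<<` overload:

```cpp
#include <sstream>
#include <string>

template <typename T>
std::string toString(const T& obj) {
  std::stringstream ss;
  ss << obj;  // was: obj.to_str()
  return ss.str();
}
```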
The schedule was used in prehistory to compare against a schedule
computed in another way.  Since the comparison has been removed,
there is no point in constructing the schedule.
In prehistory, user pointers were being stored in isl_id objects and
then it was in some cases necessary to remove them.
Since no such user pointers are being used in the current code base
there is no point in trying to remove them.
Update documentation to explain why `pytorch=0.4.0` is now mandatory.
launchBounds: avoid use of to_str() isl methods
The C++ bindings from mainline isl do not allow
an object to be explicitly assigned a NULL value.
Fix a broken link in README
The filter_ field was being explicitly initialized to its default value,
which is confusing, especially since the body of the constructor
performs a proper initialization of this field.
ScheduleTreeMapping::ScheduleTreeMapping: drop redundant initialization
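A minimal self-contained illustration of the cleanup; `Mapping` and `buildFilter` are made-up stand-ins for the actual TC types:

```cpp
struct Mapping {
  // was: Mapping() : filter_() { ... }  -- the initializer-list entry only
  // restated the default value and obscured the real initialization below.
  Mapping() {
    filter_ = buildFilter();
  }

 private:
  static int buildFilter() { return 1; }
  int filter_;
};
```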
Add file headers for OSS requirement