Update from upstream repo facebookresearch/TensorComprehensions@master #11

Open
wants to merge 190 commits into base: master

Conversation

backstroke-bot

Hello!

The upstream repository facebookresearch/TensorComprehensions@master has some new changes that aren't in this fork. So, here they are, ready to be merged! 🎉

If this pull request can be merged without conflict, you can publish your software with these new changes. Otherwise, fix any merge conflicts by clicking the Resolve Conflicts button.


If you like Backstroke, consider donating to help us pay for infrastructure here. Backstroke is a completely open source project that's free to use, but we survive on sponsorships and donations. Thanks for your support! Help out Backstroke.


Created by Backstroke (I'm a bot!)

Sven Verdoolaege and others added 30 commits July 4, 2018 18:19
tightenLaunchBounds logs a warning if the minimum of
the block or thread mapping is strictly positive,
since it would then be possible to shift the mapping by that
minimum and further tighten the bounds.
Given the current infrastructure of TC, it is fairly
unlikely that this would happen, and especially that it would
happen consistently over the entire tree.
Furthermore, by the time tightenLaunchBounds is called
it is too late to change the mapping.
Move the warning to where the mapping is constructed and
where it could in principle still be adjusted.
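A minimal numeric sketch of that observation (plain integers, not TC code; both helper functions are made up for illustration):

```cpp
#include <cassert>

// Without shifting: indices start at 0, so a mapping covering [min, max]
// still forces a launch bound of max + 1, leaving `min` blocks/threads idle.
int boundWithoutShift(int max) {
  return max + 1;
}

// After shifting the mapping down by its strictly positive minimum,
// the same iterations fit in the tighter bound max - min + 1.
int boundAfterShift(int min, int max) {
  assert(0 < min && min <= max);
  return max - min + 1;
}
```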
This check is similar to one performed by tightenLaunchBounds,
which will be removed when tightenLaunchBounds gets implemented
in terms of the actual mapping instead of the filter derived
from the mapping.
This will be needed for extracting the mapping schedule
in the next commit.
The mapping filters are derived from the mapping schedule.
It is more natural, simpler and less error-prone
to derive the tightened launch bounds directly from
the mapping schedule.
The Caffe2 API removed the copy constructor, so `GetNamedTensor`
needs to be updated: we now use the Caffe2 API to return a new
tensor that shares the data with the tensor from the workspace.

Tested internally with the latest API.
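A hedged sketch of the described change; the exact TC signature differs, and `Workspace::GetBlob`, `Blob::Get`, and `Tensor::Alias` are assumed here to be the relevant Caffe2 calls for sharing storage:

```cpp
#include <string>

#include <caffe2/core/workspace.h>

caffe2::Tensor GetNamedTensor(caffe2::Workspace& ws, const std::string& name) {
  // The copy constructor is gone, so return an alias that shares storage
  // with the tensor owned by the workspace instead of copying it.
  return ws.GetBlob(name)->Get<caffe2::Tensor>().Alias();
}
```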
Without this, the following error occurs:
`TypeError: 'float' object cannot be interpreted as an integer`
tightenLaunchBounds: use mapping schedule instead of mapping filters
Update to support upcoming Caffe2 API
Caffe2 benchmarks are now handled internally until they can be checked
on the external CI. This will most likely happen when pytorch 1.0
binaries are available.
This commit switches from Tapir, which is based on LLVM 5.0, to trunk LLVM.
The main motivation is that LLVM 5.0 NVPTX only supports ancient CUDA
architectures and that FB internally uses an LLVM that is
much closer to trunk.
Some prehistoric representation of TC schedule trees
used to needlessly keep track of an isl::ast_loop_type field.
The printing function was left in by mistake when this field was removed.
bump isl for replacing isl_space_{un}named_set_from_params
schedule_print.cc: drop dead code
A partial schedule produced by a call to partialSchedule already has its
domain intersected with the parent filter nodes.  There is no need to
separately compute the intersection of parent filters using
activeDomainPoints to include it in the schedule.  Drop the unnecessary
intersection and remove the variable that became unused.
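A hedged sketch of the simplification; `partialSchedule` and `activeDomainPoints` are the TC polyhedral helpers mentioned above, but this call site, the header path, and the signatures are illustrative:

```cpp
#include "tc/core/polyhedral/schedule_transforms.h"

using namespace tc::polyhedral;

isl::union_map scopedSchedule(
    const detail::ScheduleTree* root,
    const detail::ScheduleTree* node) {
  // Redundant form this commit removes: the domain was intersected with
  // the parent filters a second time.
  //   return partialSchedule(root, node)
  //       .intersect_domain(activeDomainPoints(root, node));
  return partialSchedule(root, node);  // already restricted to the filters
}
```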
This function promotes to shared memory below the given node (scope).
For now, it is only expected to work when called with a block-mapped
node as scope.  It will be extended to work on any node above thread-mapping
in upcoming commits.
The check whether the promotion to shared memory improves coalescing is
performed by looking at the schedule dimension that is mapped to CUDA
thread x.  The existing implementation relies on a so-called "full
schedule" that contains all schedule dimensions.  In practice, the
partial schedule until the dimension mapped to thread x is sufficient.
Compute this partial schedule inside of promotionImprovesCoalescing
instead of precomputing the "full schedule" externally.
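An illustrative, TC-independent sketch of the criterion behind that check: an access is coalesced when consecutive thread x values touch consecutive addresses, which is why only the schedule up to the thread-x dimension matters:

```cpp
#include <cstddef>

// `strideAlongThreadX` is the byte distance between the addresses touched
// by threads x and x+1; coalescing requires it to equal the element size.
bool accessIsCoalesced(std::size_t strideAlongThreadX,
                       std::size_t elementSize) {
  return strideAlongThreadX == elementSize;
}
```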
The last use of this function was removed in the previous commit.  The
function itself is dangerous because of its misleading name: it ignores
sequence and set nodes, making the caller believe statement instances
separated by the sequence/set are scheduled to the same logical date.
It was never intended to be used outside the memory promotion heuristic.
Drop it.
Currently, promotion to shared memory is only performed below the loops
mapped to blocks.  Thus, tensor reference groups implicitly account for
blocks.  The scope of mapping in the tree will be changed in upcoming
commits.  Explicitly include block mapping into the partial schedule
within which the tensor reference groups are computed.
The function no longer requires the node to be a band.
Schedule tree invariants prevent us from inserting an extension node,
necessary to copy data during promotion, as a child of a sequence or a
set node.
Shared memory is accessible from all threads.  If promotion is requested
below the thread mapping, the scoped tensor reference groups are
specific to threads, guaranteeing no reuse between different threads.
It does not make sense to use shared memory in such cases.  Furthermore,
if one attempted to use shared memory below thread mapping, e.g., to
hide global memory latency, one would have to account for the
pre-existing thread mapping when emitting shared<->global memory copies.
Arguably, it was a mistake to have separate functions in the first
place.  This led to situations in tests where the copies between global
and shared memory were not mapped to threads.  Merge
promoteToSharedGreedy and promoteGreedilyAtDepth into a single function,
promoteToSharedAtDepth.  The name is chosen for consistency with
promoteToRegistersAtDepth.
This function will be reused in an upcoming commit.
Sven Verdoolaege and others added 30 commits August 2, 2018 10:32
This will ensure that it also applies to templated isl types derived
from isl::multi_aff and isl::multi_val.
Make sure the template type T has a scale_down method to avoid
the template operator getting triggered on other types.
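A self-contained sketch of that guard (the `Val` type is made up; the real code targets the isl bindings): the templated operator is SFINAE-constrained on the presence of a scale_down method, so overload resolution skips it for other types:

```cpp
#include <utility>

// Only viable when T provides scale_down; substitution fails silently
// (and the overload is dropped) for types without it.
template <typename T>
auto operator/(T obj, long divisor)
    -> decltype(std::declval<T>().scale_down(divisor)) {
  return obj.scale_down(divisor);
}

struct Val {
  long v;
  Val scale_down(long d) const { return Val{v / d}; }
};

int main() {
  Val half = Val{10} / 2;  // resolves via Val::scale_down
  return half.v == 5 ? 0 : 1;
}
```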
There is no need to create an extra isl::aff object here.
This will ensure that it also applies to templated isl types derived
from isl::aff.
Make sure the template type T has an add_constant method to avoid
the template operator getting triggered on other types.
This will ensure that it also applies to templated isl types derived
from isl::aff.
Make sure the template type T has a scale method to avoid
the template operator getting triggered on other types.
The space of an isl_multi_union_pw_aff is the common range space,
i.e., a set space.  The current version of isl tolerates calling
domain() on a set space, but it is technically incorrect.
Call params() instead.
Moving to templated isl types will make this issue more evident.
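A hedged sketch of the fix against the isl C++ bindings; the helper function is illustrative:

```cpp
#include <isl/cpp.h>

isl::space parameterSpace(const isl::multi_union_pw_aff& mupa) {
  // was: mupa.get_space().domain();  // tolerated by isl, but a set space
  //                                  // technically has no domain
  return mupa.get_space().params();
}
```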
…s() call

In theory, the isl::union_map::empty factory method expects
a parameter space.  In practice, isl accepts any kind of space
(from which it extracts a parameter space).
For switching to templated isl types, it is better
to use a consistent type of space.
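A hedged sketch (isl C++ bindings of that era, where `union_map::empty` still took a space; the helper is illustrative): normalize to a parameter space explicitly instead of relying on isl to extract one:

```cpp
#include <isl/cpp.h>

isl::union_map emptyUnionMap(isl::space space) {
  return isl::union_map::empty(space.params());
}
```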
This will make it easier to switch to templated isl types.
…ables

This will make it easier to switch to templated isl types.
This will make it easier to switch to templated types.
This will make it easier to switch to templated types.
This is the only instance of a call to to_str(), so there is
no immediate need to add it to mainline.
Simply use the output operator, which will get added to mainline isl.
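A self-contained sketch of the replacement, usable for any type with an `operator<<` overload:

```cpp
#include <sstream>
#include <string>

template <typename T>
std::string toString(const T& obj) {
  std::stringstream ss;
  ss << obj;  // was: obj.to_str()
  return ss.str();
}
```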
The schedule was used in prehistory to compare against a schedule
computed in another way.  Since the comparison has been removed,
there is no point in constructing the schedule.
In prehistory, user pointers were being stored in isl_id objects and
then it was in some cases necessary to remove them.
Since no such user pointers are being used in the current code base
there is no point in trying to remove them.
Update documentation to explain why `pytorch=0.4.0` is now mandatory.
launchBounds: avoid use of to_str() isl methods
The C++ bindings from mainline isl do not allow
an object to be explicitly assigned a NULL value.
Fix a broken link in README
The filter_ field was being explicitly initialized to its default value,
which is confusing, especially since the body of the constructor
performs a proper initialization of this field.
ScheduleTreeMapping::ScheduleTreeMapping: drop redundant initialization
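A minimal self-contained illustration of the cleanup; `Mapping` and `buildFilter` are made-up stand-ins for the actual TC types:

```cpp
struct Mapping {
  // was: Mapping() : filter_() { ... }  -- the initializer-list entry only
  // restated the default value and obscured the real initialization below.
  Mapping() {
    filter_ = buildFilter();
  }

 private:
  static int buildFilter() { return 1; }
  int filter_;
};
```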
Add file headers for OSS requirement