Update from upstream repo facebookresearch/TensorComprehensions@master #11
Open
backstroke-bot wants to merge 190 commits into UofT-EcoSystem:master from facebookresearch:master
Conversation
tightenLaunchBounds logs a warning if the minimum of the block or thread mapping is strictly positive, since it would then be possible to shift the mapping over that minimum and further tighten the bounds. Given the current infrastructure of TC, it is fairly unlikely that this would happen, let alone happen consistently over the entire tree. Furthermore, by the time tightenLaunchBounds is called it is too late to change the mapping. Move the warning to where the mapping is constructed and where it could in principle still be adjusted.
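To make the shifting argument concrete, here is a toy, self-contained illustration; the numbers are made up and no TC or isl code is involved, only the arithmetic behind the warning:

```cpp
#include <iostream>

int main() {
  // Hypothetical range of values mapped to a thread identifier.
  long min = 2, max = 9;
  // Without shifting, the launch bound has to cover ids 0..max.
  std::cout << "bound without shift: " << (max + 1) << "\n";       // 10
  // Shifting the mapping down by min would allow a tighter bound,
  // which is no longer possible once tightenLaunchBounds runs.
  std::cout << "bound after shift:   " << (max - min + 1) << "\n"; // 8
  return 0;
}
```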
This check is similar to one performed by tightenLaunchBounds, which will be removed when tightenLaunchBounds gets implemented in terms of the actual mapping instead of the filter derived from the mapping.
This will be needed for extracting the mapping schedule in the next commit.
The mapping filters are derived from the mapping schedule. It is more natural, simpler and less error-prone to derive the tightened launch bounds directly from the mapping schedule.
The Caffe2 API removed the copy constructor, so `GetNamedTensor` needs to be updated. We now use the Caffe2 API to return a new tensor that shares its data with the tensor from the workspace. Tested internally with the latest API.
Without this, the following error occurs: `TypeError: 'float' object cannot be interpreted as an integer`
tightenLaunchBounds: use mapping schedule instead of mapping filters
Tested internally
Tested internally
Update to support upcoming Caffe2 API
Caffe2 benchmarks are now handled internally until they can be checked on the external CI. This will most likely happen when pytorch 1.0 binaries are available.
This commit switches from Tapir, which is based on LLVM 5.0, to trunk LLVM. The main motivation is that LLVM 5.0 NVPTX only supports ancient CUDA architectures and that FB internally uses an LLVM that is much closer to trunk.
Move to trunk LLVM
Drop caffe2 benchmark from OSS
Some prehistoric representation of TC schedule trees used to needlessly keep track of an isl::ast_loop_type field. The printing function was left in by mistake when this field was removed.
bump isl for replacing isl_space_{un}named_set_from_params
schedule_print.cc: drop dead code
A partial schedule produced by a call to partialSchedule already has its domain intersected with the parent filter nodes. There is no need to separately compute the intersection of parent filters using activeDomainPoints to include it in the schedule. Drop the unnecessary intersection and remove the variable that became unused.
This function promotes to shared memory below the given node (scope). For now, it is only expected to work when called with a block-mapped node as scope. It will be extended to work on any node above the thread mapping in upcoming commits.
The check of whether the promotion to shared memory improves coalescing is performed by looking at the schedule dimension that is mapped to CUDA thread x. The existing implementation relies on a so-called "full schedule" that contains all schedule dimensions. In practice, the partial schedule up to the dimension mapped to thread x is sufficient. Compute this partial schedule inside promotionImprovesCoalescing instead of precomputing the "full schedule" externally.
The last use of this function was removed in the previous commit. The function itself is dangerous because of its misleading name: it ignores sequence and set nodes, making the caller believe that statement instances separated by the sequence/set are scheduled to the same logical date. It was never intended to be used outside the memory promotion heuristic. Drop it.
Currently, promotion to shared memory is only performed below the loops mapped to blocks. Thus, tensor reference groups implicitly account for blocks. The scope of mapping in the tree will be changed in upcoming commits. Explicitly include the block mapping into the partial schedule within which the tensor reference groups are computed.
The function no longer requires the node to be a band.
Schedule tree invariants prevent us from inserting an extension node, necessary to copy data during promotion, as a child of a sequence or a set node.
Shared memory is accessible from all threads. If promotion is requested below the thread mapping, the scoped tensor reference groups are specific to threads, guaranteeing no reuse between different threads. It does not make sense to use shared memory in such cases. Furthermore, if one attempted to use shared memory below thread mapping, e.g., to hide global memory latency, one would have to account for the pre-existing thread mapping when emitting shared<->global memory copies.
Arguably, it was a mistake to have separate functions in the first place. This led to situations in tests where the copies between global and shared memory were not mapped to threads. Merge promoteToSharedGreedy and promoteGreedilyAtDepth into a single function, promoteToSharedAtDepth. The name is chosen for consistency with promoteToRegistersAtDepth.
This function will be reused in an upcoming commit.
This will ensure that it also applies to templated isl types derived from isl::multi_aff and isl::multi_val. Make sure the template type T has a scale_down method to avoid the template operator getting triggered on other types.
There is no need to create an extra isl::aff object here.
This will ensure that it also applies to templated isl types derived from isl::aff. Make sure the template type T has an add_constant method to avoid the template operator getting triggered on other types.
This will ensure that it also applies to templated isl types derived from isl::aff. Make sure the template type T has a scale method to avoid the template operator getting triggered on other types.
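The three overload entries above (scale_down, add_constant, scale) rely on the same constraint mechanism. The following self-contained toy, with made-up stand-in types rather than the actual isl classes or operator signatures, shows how naming the required method in the trailing return type keeps the templated operator from being triggered on other types:

```cpp
#include <iostream>

// Stand-in for an isl-like type that supports scaling down.
struct Scalable {
  int v;
  Scalable scale_down(int d) const { return {v / d}; }
};
struct Unrelated {
  int v;
};

// The trailing return type mentions t.scale_down(d), so this overload is
// discarded by overload resolution (SFINAE) for any T without such a
// method and can never be picked up for unrelated types.
template <typename T>
auto operator/(T t, int d) -> decltype(t.scale_down(d)) {
  return t.scale_down(d);
}

int main() {
  Scalable a{10};
  std::cout << (a / 2).v << "\n";   // prints 5, dispatched through scale_down
  // Unrelated b{10}; auto c = b / 2;  // would not compile: no scale_down
  return 0;
}
```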
The space of an isl_multi_union_pw_aff is the common range space, i.e., a set space. The current version of isl tolerates calling domain() on a set space, but it is technically incorrect. Call params() instead. Moving to templated isl types will make this issue more evident.
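A minimal sketch of the distinction, assuming the get_space() and params() methods exposed by the isl C++ bindings of that era; the helper function name is made up:

```cpp
#include <isl/cpp.h>

// The space of a multi_union_pw_aff is its common range space, i.e. a set
// space, so reduce it to a parameter space with params() instead of
// treating it as a map space and calling domain() on it.
isl::space rangeParams(isl::multi_union_pw_aff mupa) {
  return mupa.get_space().params();
}
```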
…s() call
In theory, the isl::union_map::empty factory method expects a parameter space. In practice, isl accepts any kind of space (from which it extracts a parameter space). For switching to templated isl types, it is better to use a consistent type of space.
This will make it easier to switch to templated isl types.
…ables
This will make it easier to switch to templated isl types.
This will make it easier to switch to templated types.
This will make it easier to switch to templated types.
This is the only instance of a call to to_str(), so there is no immediate need to add it to mainline. Simply use the output operator, which will get added to mainline isl.
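A small sketch of the replacement, assuming (as the message says) an operator<< overload for the isl C++ types; the helper itself is hypothetical:

```cpp
#include <sstream>
#include <string>

// Build the textual form through the output operator instead of to_str().
template <typename IslObj>
std::string toString(const IslObj& obj) {
  std::ostringstream ss;
  ss << obj;  // relies on the operator<< overload provided for isl types
  return ss.str();
}
```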
The schedule was used in prehistory to compare against a schedule computed in another way. Since the comparison has been removed, there is no point in constructing the schedule.
In prehistory, user pointers were being stored in isl_id objects and it was then in some cases necessary to remove them. Since no such user pointers are being used in the current code base, there is no point in trying to remove them.
Update documentation to explain why `pytorch=0.4.0` is now mandatory.
drop dead code
launchBounds: avoid use of to_str() isl methods
prepare for templated isl types
The C++ bindings from mainline isl do not allow an object to be explicitly assigned a NULL value.
Update installation.rst
bump isl for merge of C++ bindings
Fix a broken link in README
The filter_ field was being explicitly initialized to its default value, which is confusing, especially since the body of the constructor performs a proper initialization of this field.
ScheduleTreeMapping::ScheduleTreeMapping: drop redundant initialization
Add file headers for OSS requirement
Hello!
The upstream repository facebookresearch/TensorComprehensions@master has some new changes that aren't in this fork. So, here they are, ready to be merged! 🎉

If this pull request can be merged without conflict, you can publish your software with these new changes. Otherwise, fix any merge conflicts by clicking the Resolve Conflicts button.

If you like Backstroke, consider donating to help us pay for infrastructure here. Backstroke is a completely open source project that's free to use, but we survive on sponsorships and donations. Thanks for your support! Help out Backstroke.
Created by Backstroke (I'm a bot!)