[compiler] Minor Requiredness Performance Enhancements #13991
Conversation
On profiling the benchmark `matrix_multi_write_nothing`, I noticed a significant amount of time was spent iterating through zipped arrays and removing nodes from the requiredness queue. In fact, half the time spent in requiredness was removing IR nodes from the hash set used as the queue! Now requiredness runs like a stabbed rat.
Interesting. I assume the issue is that `Set` (and presumably all of Scala's hash table implementations) doesn't handle workloads with intermixed inserts and deletes well? Which you get around by putting everything in the `seen` hash table and never removing.
I think it's pretty common to not bother with the set, and just allow an item to be enqueued multiple times (for instance, this is what MLIR does). When a second occurrence is popped, it will (probably) result in no change to the analysis state, so no new items will be queued up. As long as visiting a node is fast, this might be faster than maintaining a set. I'd be curious to see how that performs here.
If keeping the set is faster, another optimization idea is to add all the nodes to the `seen` map during initialization, then call `repack`, since we won't add anything more.
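To illustrate the set-free alternative suggested above, here is a minimal, hypothetical sketch of a worklist analysis that permits duplicate enqueues rather than maintaining a de-duplication set. The toy "analysis" (propagating the max value along edges) and all names here are invented for illustration; they are not Hail's requiredness code. A node is only re-enqueued when its state actually changes, so revisiting a stale duplicate is a cheap no-op and the loop still terminates.

```scala
import scala.collection.mutable

object DupWorklist {
  // Toy monotone analysis: propagate the max value from each node to its
  // successors until a fixed point is reached.
  def run(succ: Map[Int, Seq[Int]], init: Map[Int, Int]): Map[Int, Int] = {
    val state = mutable.Map.empty[Int, Int] ++= init
    val q = mutable.Queue.empty[Int] ++= init.keys
    while (q.nonEmpty) {
      val n = q.dequeue() // may have been enqueued more than once
      for (s <- succ.getOrElse(n, Nil)) {
        val updated = math.max(state(s), state(n))
        if (updated != state(s)) { // only re-enqueue on a real change
          state(s) = updated
          q.enqueue(s) // no membership check: duplicates are allowed
        }
      }
    }
    state.toMap
  }
}
```

The trade-off is queue length versus per-visit bookkeeping: duplicates inflate the queue, but every dequeue skips a hash lookup and a removal.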
private[this] val q =
  mutable.Queue[RefEquality[BaseIR]]()
private[this] val seen =
  mutable.AnyRefMap[RefEquality[BaseIR], Int]()
Probably no performance difference, but it would be clearer for this to map to Boolean instead of Int. And maybe rename this to `inQueue`? `seen` sounds like it includes things that are no longer in the queue.
Yeah, good suggestion.
// foreach on zipped seqs is very slow as the implementation
// doesn't know that the seqs are the same length.
I feel like this comment is out of place here. We use both of these patterns all over the codebase, and we shouldn't comment on their relative performance every time. Perhaps we should start a doc of these kinds of scala performance gotchas that we can refer to, and reevaluate with future scala version changes.
Thanks for your feedback.
A `zip` and `foreach` is more like the code I want to write, and indeed what I will write when not in a hotspot. A comment will prevent my future self from getting upset at whoever wrote this low-level ugly crap and changing it back.
I don't think a doc would be that practical. Knowing me, I'd likely forget about it rather than consult it whenever I need to write simple code like a for-loop.
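For readers unfamiliar with the gotcha being discussed: iterating two sequences with `zip`/`foreach` allocates a tuple per element and cannot assume equal lengths, while a manual index loop does neither. This sketch contrasts the two styles; the function names and the dot-product workload are hypothetical examples, not code from this PR.

```scala
// Readable style: allocates an intermediate pair per element.
def dotZipped(a: IndexedSeq[Int], b: IndexedSeq[Int]): Int =
  a.zip(b).map { case (x, y) => x * y }.sum

// Hotspot style: no tuple allocation, no length negotiation.
// Assumes a.length == b.length, as the requiredness code does.
def dotLoop(a: IndexedSeq[Int], b: IndexedSeq[Int]): Int = {
  var acc = 0
  var i = 0
  while (i < a.length) {
    acc += a(i) * b(i)
    i += 1
  }
  acc
}
```

Both compute the same result; the second exists purely to avoid per-element overhead in hot code, which is why a comment explaining the choice is worth keeping.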
def noSharing(ctx: ExecuteContext): this.type =
  if (HasIRSharing(ctx)(this)) this.deepCopy() else this

var mark: Int = 0
Do I understand the use of this correctly?
For use as a boolean flag by IR passes. Each pass uses a different sentinel value to encode "true" (and anything else is false). As long as we maintain the global invariant that no two passes use the same sentinel value, this allows us to reuse this field across passes without ever having to initialize it at the start of a pass.
If this is accurate, could you add a comment here? The invariant is important: if anybody were to use this field inconsistently with the above, it would break all other passes that rely on it.
Exactly. Thanks for explaining it in a way I couldn't haha! I'll add the comment :)
Main change: add `var mark: Int` to `BaseIR`.

On profiling the benchmark `matrix_multi_write_nothing`, I noticed a significant amount of time was spent removing IR nodes from `HashSet`s. In fact, half the time spent in requiredness was removing IR nodes from the `HashSet` used as the queue! With this change, requiredness runs like a stabbed rat!

Explanation of `mark`: This field acts as a flag that analyses can set. For example:
- `HasSharing` can use the field to see if it has visited a node before.
- `Requiredness` uses this field to tell if a node is currently enqueued.

The `nextFlag` method in `IrMetadata` allows analyses to get a fresh value they can set the `mark` field to. This removes the need to traverse the IR after analyses to re-zero every `mark` field.
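The sentinel scheme described above can be sketched as follows. This is a simplified, hypothetical model (the `Node`, `IrMetadata`, and `Pass` classes here are stand-ins, not Hail's actual definitions): each pass draws a fresh sentinel from a shared counter, a node counts as "flagged" only if its `mark` equals that pass's sentinel, and because sentinels are never reused, stale marks left by earlier passes read as "false" without any reset traversal.

```scala
// Stand-in for a BaseIR node carrying the reusable mark field.
final class Node(var mark: Int = 0)

// Hands out fresh, never-reused sentinel values to passes.
final class IrMetadata {
  private var counter: Int = 0
  def nextFlag: Int = { counter += 1; counter }
}

// A pass flags nodes with its own sentinel; marks from other passes
// (or from before this pass started) simply compare unequal.
final class Pass(md: IrMetadata) {
  private val flag = md.nextFlag
  def markSeen(n: Node): Unit = n.mark = flag
  def seen(n: Node): Boolean = n.mark == flag
}
```

The invariant the reviewer highlights is visible here: correctness depends on every pass obtaining its sentinel through `nextFlag` rather than picking one by hand, since two passes sharing a value would see each other's marks.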