Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler] Minor Requiredness Performance Enchancements #13991

Merged
merged 7 commits into from
Nov 16, 2023

Conversation

ehigham
Copy link
Member

@ehigham ehigham commented Nov 8, 2023

Main change: add var mark: Int to BaseIR.
On profiling the benchmark matrix_multi_write_nothing, I noticed a significant amount of time was spent

  • iterating through zipped arrays in requiredness
  • Adding and removing elements from HashSets.
    In fact, half the time spent in requiredness was removing ir nodes from the HashSet set used as the queue! With this change, requiredness runs like a stabbed rat!

Explanation of mark:
This field acts as a flag that analyses can set. For example:

  • HasSharing can use the field to see if it has visited a node before.
  • Requiredness uses this field to tell if a node is currently enqueued.

The nextFlag method in IrMetadata allows for analyses to get a fresh value they can set the mark field.
This removes the need to traverse the IR after analyses to re-zero every mark field.

On profiling the benchmark `matrix_multi_write_nothing`, I noticed a
significant amount of time was spent iterating through zipped arrays and
removing nodes from the requiredness queue. In fact, half the time spent
in requiredness was removing ir nodes from the hash set used as the
queue! Now requiredness runs like a stabbed rat.
Copy link
Collaborator

@patrick-schultz patrick-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I assume the issue is that Set (and presumably all of scala's hash table implementations) doesn't handle well workloads with intermixed inserts and deletes? Which you get around by putting everything in the seen hash table and never remove.

I think it's pretty common to not bother with the set, and just allow an item to be enqueued multiple times (for instance, this is what MLIR does). When a second occurrance is popped, it will (probably) result in no change to the analysis state, so no new items will be queued up. As long as visiting a node is fast, this might be faster than maintaining a set. I'd be curious to see how that performs here.

If keeping the set is faster, another optimization idea is to add all the nodes to the seen map during initialization, then call repack, since we won't add anything more.

private[this] val q =
mutable.Queue[RefEquality[BaseIR]]()
private[this] val seen =
mutable.AnyRefMap[RefEquality[BaseIR], Int]()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably no performance difference, but it would be clearer for this to map to Boolean instead of Int. And maybe rename this to inQueue? seen sounds like it includes things that are no longer in the queue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good suggestion.

Comment on lines +102 to +103
// foreach on zipped seqs is very slow as the implementation
// doesn't know that the seqs are the same length.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this comment is out of place here. We use both of these patterns all over the codebase, and we shouldn't comment on their relative performance every time. Perhaps we should start a doc of these kinds of scala performance gotchas that we can refer to, and reevaluate with future scala version changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback.

A zip and foreach is more like the code I want to write and indeed what I will write when not in a hotspot. A comment will prevent my future self getting upset at whomever wrote this low level ugly crap and changing it back.

I don't think a doc would be that practical. Knowing me, I'd likely forget about it rather than consult it whenever I need to write simple code like a for-loop.

def noSharing(ctx: ExecuteContext): this.type =
if (HasIRSharing(ctx)(this)) this.deepCopy() else this

var mark: Int = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand the use of this correctly?

For use as a boolean flag by IR passes. Each pass uses a different sentinel value to encode "true" (and anything else is false). As long as we maintain the global invariant that no two passes use the same sentinel value, this allows us to reuse this field across passes without ever having to initialize it at the start of a pass.

If this is accurate, could you add a comment here? The invariant is important: if anybody were to use this inconsistent with the above, it would break all other passes that use this field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Thanks for explaining it in a way I couldn't haha! I'll add the comment :)

@danking danking merged commit eaf4197 into hail-is:main Nov 16, 2023
@ehigham ehigham deleted the ehigham/requiredness-perf-improvements branch November 16, 2023 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants