[compiler] Minor Requiredness Performance Enhancements #13991
Conversation
On profiling the benchmark `matrix_multi_write_nothing`, I noticed a significant amount of time was spent iterating through zipped arrays and removing nodes from the requiredness queue. In fact, half the time spent in requiredness was removing IR nodes from the hash set used as the queue! Now requiredness runs like a stabbed rat.
Interesting. I assume the issue is that `Set` (and presumably all of Scala's hash table implementations) doesn't handle workloads with intermixed inserts and deletes well? Which you get around by putting everything in the `seen` hash table and never removing.
I think it's pretty common to not bother with the set, and just allow an item to be enqueued multiple times (for instance, this is what MLIR does). When a second occurrence is popped, it will (probably) result in no change to the analysis state, so no new items will be queued up. As long as visiting a node is fast, this might be faster than maintaining a set. I'd be curious to see how that performs here.
If keeping the set is faster, another optimization idea is to add all the nodes to the `seen` map during initialization, then call `repack`, since we won't add anything more.
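To illustrate the set-free alternative suggested above, here is a minimal, hypothetical sketch of a worklist analysis that permits duplicate enqueues rather than maintaining a de-duplication set. The toy "analysis" (propagating the max value along edges) and all names here are invented for illustration; they are not Hail's requiredness code. A node is only re-enqueued when its state actually changes, so revisiting a stale duplicate is a cheap no-op and the loop still terminates.

```scala
import scala.collection.mutable

object DupWorklist {
  // Toy monotone analysis: propagate the max value from each node to its
  // successors until a fixed point is reached.
  def run(succ: Map[Int, Seq[Int]], init: Map[Int, Int]): Map[Int, Int] = {
    val state = mutable.Map.empty[Int, Int] ++= init
    val q = mutable.Queue.empty[Int] ++= init.keys
    while (q.nonEmpty) {
      val n = q.dequeue() // may have been enqueued more than once
      for (s <- succ.getOrElse(n, Nil)) {
        val updated = math.max(state(s), state(n))
        if (updated != state(s)) { // only re-enqueue on a real change
          state(s) = updated
          q.enqueue(s) // no membership check: duplicates are allowed
        }
      }
    }
    state.toMap
  }
}
```

The trade-off is queue length versus per-visit bookkeeping: duplicates inflate the queue, but every dequeue skips a hash lookup and a removal.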
private[this] val q =
  mutable.Queue[RefEquality[BaseIR]]()
private[this] val seen =
  mutable.AnyRefMap[RefEquality[BaseIR], Int]()
Probably no performance difference, but it would be clearer for this to map to Boolean instead of Int. And maybe rename this to `inQueue`? `seen` sounds like it includes things that are no longer in the queue.
Yeah, good suggestion.
// foreach on zipped seqs is very slow as the implementation
// doesn't know that the seqs are the same length.
I feel like this comment is out of place here. We use both of these patterns all over the codebase, and we shouldn't comment on their relative performance every time. Perhaps we should start a doc of these kinds of scala performance gotchas that we can refer to, and reevaluate with future scala version changes.
Thanks for your feedback.
A `zip` and `foreach` is more like the code I want to write, and indeed what I will write when not in a hotspot. A comment will prevent my future self from getting upset at whoever wrote this low-level ugly crap and changing it back.
I don't think a doc would be that practical. Knowing me, I'd likely forget about it rather than consult it whenever I need to write simple code like a for-loop.
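For readers unfamiliar with the gotcha being discussed: iterating two sequences with `zip`/`foreach` allocates a tuple per element and cannot assume equal lengths, while a manual index loop does neither. This sketch contrasts the two styles; the function names and the dot-product workload are hypothetical examples, not code from this PR.

```scala
// Readable style: allocates an intermediate pair per element.
def dotZipped(a: IndexedSeq[Int], b: IndexedSeq[Int]): Int =
  a.zip(b).map { case (x, y) => x * y }.sum

// Hotspot style: no tuple allocation, no length negotiation.
// Assumes a.length == b.length, as the requiredness code does.
def dotLoop(a: IndexedSeq[Int], b: IndexedSeq[Int]): Int = {
  var acc = 0
  var i = 0
  while (i < a.length) {
    acc += a(i) * b(i)
    i += 1
  }
  acc
}
```

Both compute the same result; the second exists purely to avoid per-element overhead in hot code, which is why a comment explaining the choice is worth keeping.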
def noSharing(ctx: ExecuteContext): this.type =
  if (HasIRSharing(ctx)(this)) this.deepCopy() else this

var mark: Int = 0
Do I understand the use of this correctly?
For use as a boolean flag by IR passes. Each pass uses a different sentinel value to encode "true" (and anything else is false). As long as we maintain the global invariant that no two passes use the same sentinel value, this allows us to reuse this field across passes without ever having to initialize it at the start of a pass.
If this is accurate, could you add a comment here? The invariant is important: if anybody were to use this field inconsistently with the above, it would break all other passes that rely on it.
Exactly. Thanks for explaining it in a way I couldn't haha! I'll add the comment :)
Main change: add `var mark: Int` to `BaseIR`.

On profiling the benchmark `matrix_multi_write_nothing`, I noticed a significant amount of time was spent removing IR nodes from `HashSet`s. In fact, half the time spent in requiredness was removing IR nodes from the `HashSet` used as the queue! With this change, requiredness runs like a stabbed rat!

Explanation of `mark`: This field acts as a flag that analyses can set. For example:
- `HasSharing` can use the field to see if it has visited a node before.
- `Requiredness` uses this field to tell if a node is currently enqueued.

The `nextFlag` method in `IrMetadata` allows analyses to get a fresh value they can set the `mark` field to. This removes the need to traverse the IR after analyses to re-zero every `mark` field.
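The sentinel scheme described above can be sketched as follows. This is a simplified, hypothetical model (the `Node`, `IrMetadata`, and `Pass` classes here are stand-ins, not Hail's actual definitions): each pass draws a fresh sentinel from a shared counter, a node counts as "flagged" only if its `mark` equals that pass's sentinel, and because sentinels are never reused, stale marks left by earlier passes read as "false" without any reset traversal.

```scala
// Stand-in for a BaseIR node carrying the reusable mark field.
final class Node(var mark: Int = 0)

// Hands out fresh, never-reused sentinel values to passes.
final class IrMetadata {
  private var counter: Int = 0
  def nextFlag: Int = { counter += 1; counter }
}

// A pass flags nodes with its own sentinel; marks from other passes
// (or from before this pass started) simply compare unequal.
final class Pass(md: IrMetadata) {
  private val flag = md.nextFlag
  def markSeen(n: Node): Unit = n.mark = flag
  def seen(n: Node): Boolean = n.mark == flag
}
```

The invariant the reviewer highlights is visible here: correctness depends on every pass obtaining its sentinel through `nextFlag` rather than picking one by hand, since two passes sharing a value would see each other's marks.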