Serokell: [Milestone-2] Optimize PersistenOrderedMap.mo #664

Closed
wants to merge 20 commits into from

@GoPavel GoPavel commented Oct 14, 2024

This is an MR for the 2nd Milestone of Serokell's grant work aimed at improving Motoko's base library.

Within the milestone, we ran performance experiments on the new module PersistentOrderedMap.mo and tried to optimize it. This PR contains the optimizations that showed good results:

  • iter: move matching on the direction out: 7-15% speed up, benchmark results
  • foldLeft/foldRight: use direct recursion: ~80% speed up, benchmark results
  • mapFilter: use foldLeft instead of iter: 20-40% speed up, benchmark results
  • inline node color into the constructor: ~8% less max memory, ~8% speed up on all functions except get and folds which become ~7% slower, benchmark results
  • optimization of pattern matching order: ~3-15% speed up for basic functions, benchmark results
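
As a rough illustration of the "direct recursion" item above, here is a minimal sketch of a fold that walks the tree itself instead of driving an iterator. The `Tree` type, the argument order, and the example values are assumptions made for this sketch and are not taken from the module's source.

```motoko
// Hypothetical, simplified tree type; the real module's type may differ.
type Tree<K, V> = {
  #red : (Tree<K, V>, K, V, Tree<K, V>);
  #black : (Tree<K, V>, K, V, Tree<K, V>);
  #leaf;
};

// In-order fold by direct recursion: no iterator object and no
// direction argument are involved while walking the tree.
func foldLeft<K, V, A>(t : Tree<K, V>, base : A, combine : (K, V, A) -> A) : A {
  switch (t) {
    case (#leaf) { base };
    case (#red(l, k, v, r)) {
      let left = foldLeft<K, V, A>(l, base, combine);
      foldLeft<K, V, A>(r, combine(k, v, left), combine)
    };
    case (#black(l, k, v, r)) {
      let left = foldLeft<K, V, A>(l, base, combine);
      foldLeft<K, V, A>(r, combine(k, v, left), combine)
    };
  }
};

// Usage: sum the keys of a two-entry tree.
let example : Tree<Nat, Text> = #black(#red(#leaf, 1, "One", #leaf), 2, "Two", #leaf);
let keySum = foldLeft<Nat, Text, Nat>(
  example,
  0,
  func (k : Nat, _v : Text, acc : Nat) : Nat { acc + k }
);
```

The two colored branches are written out separately on purpose, so the sketch does not rely on or-patterns that bind variables.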

Also, it makes the Internal module public, since static calls perform better (benchmark results).

UPDATE:
After some discussions, we settled on the following additional changes:

  • Optimization: remove the tuple from the tree definition: 20-25% less memory (max and live heap), speed ±3% depending on the operation, benchmark results
  • Move all operations into MapOps for user convenience. As a side effect, operations that were not in MapOps got slower by ~50 instructions per call (~2500 per 50-batch) in the benchmarks.
  • Make the Internal module private again

See benchmark results of the whole update.
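
To make the representation change easier to picture, the sketch below contrasts a nested node shape with the flat shape quoted in the review TODO further down. The `Nested` type is an assumption about what the earlier definition looked like, and `flatten` is a hypothetical helper that exists only to relate the two shapes.

```motoko
// Assumed "before" shape: color stored as a field, key and value in an inner tuple.
type Color = { #R; #B };
type Nested<K, V> = {
  #node : (Color, Nested<K, V>, (K, V), Nested<K, V>);
  #leaf;
};

// "After" shape as quoted in the review TODO: color inlined into the
// constructor and the inner tuple flattened away.
type Flat<K, V> = {
  #red : (Flat<K, V>, K, V, Flat<K, V>);
  #black : (Flat<K, V>, K, V, Flat<K, V>);
  #leaf;
};

// Hypothetical helper: convert a nested tree into the flat shape.
func flatten<K, V>(t : Nested<K, V>) : Flat<K, V> {
  switch (t) {
    case (#leaf) { #leaf };
    case (#node(#R, l, (k, v), r)) { #red(flatten<K, V>(l), k, v, flatten<K, V>(r)) };
    case (#node(#B, l, (k, v), r)) { #black(flatten<K, V>(l), k, v, flatten<K, V>(r)) };
  }
};

let nested : Nested<Nat, Text> = #node(#B, #leaf, (1, "One"), #leaf);
let flat = flatten<Nat, Text>(nested);
```

In the flat shape each node carries one tuple fewer, which is at least consistent with the heap savings reported above.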

This MR follows up on #654.

Final performance comparison

Map comparison

Initial results

| |binary_size|generate|max mem|batch_get 50|batch_put 50|batch_remove 50|upgrade|
|--:|--:|--:|--:|--:|--:|--:|--:|
|persistentmap_100|187_090|201_602|42_600|51_044|122_234|124_817|440_282|
|persistentmap_baseline_100|191_689|226_832|45_672|49_945|139_070|134_191|512_457|
|rbtree_100|189_877|225_155|42_540|50_045|135_367|133_686|565_657|
|persistentmap_1000|187_090|2_724_937|568_248|68_227|160_005|168_031|4_153_206|
|persistentmap_baseline_1000|191_689|3_133_922|612_880|67_416|184_117|181_732|4_878_488|
|rbtree_1000|189_877|3_118_490|580_948|67_516|181_375|180_396|5_409_805|
|persistentmap_10000|187_090|45_412_473|480_360|84_528|195_152|214_853|41_210_098|
|persistentmap_baseline_10000|191_689|51_438_500|520_360|83_365|227_294|231_116|48_561_107|
|rbtree_10000|189_877|51_301_049|520_428|83_465|224_135|230_257|53_866_589|
|persistentmap_100000|187_090|531_616_890|4_800_360|98_864|233_003|258_058|542_245_665|
|persistentmap_baseline_100000|191_689|608_157_242|5_200_360|97_912|273_597|277_661|645_876_276|
|rbtree_100000|189_877|606_914_881|5_200_428|98_012|270_465|276_802|698_928_540|
|persistentmap_1000000|187_090|6_080_971_407|48_000_396|117_446|271_877|307_984|5_422_489_439|
|persistentmap_baseline_1000000|191_689|7_005_190_168|52_000_396|116_317|320_359|331_196|6_458_379_331|
|rbtree_1000000|189_877|6_993_676_959|52_000_464|116_417|317_299|330_296|6_988_883_198|

| |binary_size|generate|max mem|batch_get 50|batch_put 50|batch_remove 50|upgrade|
|--:|--:|--:|--:|--:|--:|--:|--:|
|persistentmap_100|192_080|204_126|0|51_387|124_116|128_293|388_084|
|persistentmap_baseline_100|191_689|226_832|45_672|49_945|139_070|134_191|512_457|
|rbtree_100|189_877|225_155|42_540|50_045|135_367|133_686|565_657|
|persistentmap_1000|192_080|2_775_816|0|68_754|162_897|173_291|3_634_959|
|persistentmap_baseline_1000|191_689|3_133_922|612_880|67_416|184_117|181_732|4_878_488|
|rbtree_1000|189_877|3_118_490|580_948|67_516|181_375|180_396|5_409_805|
|persistentmap_10000|192_080|42_949_375|360_372|85_219|199_435|221_201|36_038_892|
|persistentmap_baseline_10000|191_689|51_438_500|520_360|83_365|227_294|231_116|48_561_107|
|rbtree_10000|189_877|51_301_049|520_428|83_465|224_135|230_257|53_866_589|
|persistentmap_100000|192_080|509_484_647|3_600_372|99_728|238_691|265_873|458_331_215|
|persistentmap_baseline_100000|191_689|608_157_242|5_200_360|97_912|273_597|277_661|645_876_276|
|rbtree_100000|189_877|606_914_881|5_200_428|98_012|270_465|276_802|698_928_540|
|persistentmap_1000000|192_080|5_883_267_991|36_000_372|118_499|278_670|317_512|4_583_348_765|
|persistentmap_baseline_1000000|191_689|7_005_190_168|52_000_396|116_317|320_359|331_196|6_458_379_331|
|rbtree_1000000|189_877|6_993_676_959|52_000_464|116_417|317_299|330_296|6_988_883_198|

Persistent map API

Initial results

| |size|foldLeft|foldRight|mapfilter|map|
|--:|--:|--:|--:|--:|--:|
|persistentmap|100|19_787|20_719|89_105|29_538|
|persistentmap_baseline|100|92_138|93_745|169_663|29_048|
|persistentmap|1000|167_597|176_129|1_549_566|263_679|
|persistentmap_baseline|1000|888_169|899_681|3_556_717|257_766|
|persistentmap|10000|1_648_003|1_729_751|32_416_330|2_600_540|
|persistentmap_baseline|10000|19_529_035|19_640_763|43_314_564|2_544_359|
|persistentmap|100000|16_454_053|17_259_673|384_771_808|129_765_701|
|persistentmap_baseline|100000|195_212_923|196_318_318|505_815_170|132_210_500|
|persistentmap|1000000|164_559_185|172_575_493|4_435_968_960|1_297_599_225|
|persistentmap_baseline|1000000|1_952_082_879|1_963_098_142|5_763_869_513|1_322_035_748|

| |size|foldLeft|foldRight|mapfilter|map|
|--:|--:|--:|--:|--:|--:|
|persistentmap|100|19_623|19_535|89_915|26_278|
|persistentmap_baseline|100|92_138|93_745|169_663|29_048|
|persistentmap|1000|165_633|164_145|1_577_927|230_324|
|persistentmap_baseline|1000|888_169|899_681|3_556_717|257_766|
|persistentmap|10000|1_628_080|1_609_767|29_675_517|2_270_416|
|persistentmap_baseline|10000|19_529_035|19_640_763|43_314_564|2_544_359|
|persistentmap|100000|16_254_171|16_059_689|359_232_494|94_265_059|
|persistentmap_baseline|100000|195_212_923|196_318_318|505_815_170|132_210_500|
|persistentmap|1000000|162_559_221|876_570_432|4_198_876_914|942_589_235|
|persistentmap_baseline|1000000|1_952_082_879|1_963_098_142|5_763_869_513|1_322_035_748|

GoPavel commented Oct 14, 2024

We believe some results of our experiments show where the Motoko compiler has the potential to perform better optimizations:

  1. More inlining of private local functions: in one experiment we tried to avoid repeating branches for the different node colors by moving the common code into a scope-local function, but got bad performance. The compiler could probably inline such functions.
  2. We found that the last optimization (inlining colors) makes get slower, which surprised us. The only difference in the get implementation is that instead of 2 constructors we have 3 (with duplicated branch bodies). I would expect the same performance, so this case probably requires more investigation and may eventually give a clue as to how to improve compiler optimization.
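
For readers who want to see what "3 constructors with duplicated branch bodies" can look like, here is a minimal sketch of a lookup over a color-inlined tree. The `Tree` type and the standalone `get` with an explicit `compare` parameter are assumptions for illustration, not the module's actual code.

```motoko
import Nat "mo:base/Nat";
import Order "mo:base/Order";

// Hypothetical color-inlined tree type, following the shape quoted in the review TODO.
type Tree<K, V> = {
  #red : (Tree<K, V>, K, V, Tree<K, V>);
  #black : (Tree<K, V>, K, V, Tree<K, V>);
  #leaf;
};

// Lookup never inspects the color, yet with inlined colors the same
// body has to appear once per colored constructor.
func get<K, V>(t : Tree<K, V>, compare : (K, K) -> Order.Order, key : K) : ?V {
  switch (t) {
    case (#leaf) { null };
    case (#red(l, k, v, r)) {
      switch (compare(key, k)) {
        case (#less) { get<K, V>(l, compare, key) };
        case (#equal) { ?v };
        case (#greater) { get<K, V>(r, compare, key) };
      }
    };
    case (#black(l, k, v, r)) {
      switch (compare(key, k)) {
        case (#less) { get<K, V>(l, compare, key) };
        case (#equal) { ?v };
        case (#greater) { get<K, V>(r, compare, key) };
      }
    };
  }
};

let sample : Tree<Nat, Text> = #black(#red(#leaf, 1, "One", #leaf), 2, "Two", #leaf);
let found = get<Nat, Text>(sample, Nat.compare, 1);
let missing = get<Nat, Text>(sample, Nat.compare, 5);
```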

@GoPavel GoPavel changed the title Draft: Serokell: [Milestone-2] Optimize PersistenOrderedMap.mo Serokell: [Milestone-2] Optimize PersistenOrderedMap.mo Oct 14, 2024
s-and-witch and others added 10 commits October 21, 2024 17:31
No changes in logic so far, just simple refactoring
Add `MapOps` class with the following signature:

  public class MapOps<K>(compare : (K,K) -> O.Order) {

    public func put<V>(rbMap : Map<K, V>, key : K, value : V) : Map<K, V>

    public func fromIter<V>(i : I.Iter<(K,V)>) : Map<K, V>

    public func replace<V>(rbMap : Map<K, V>, key : K, value : V) : (Map<K,V>, ?V)

    public func mapFilter<V1, V2>(f : (K, V1) -> ?V2, rbMap : Map<K, V1>) : Map<K, V2>

    public func get<V>(key : K, rbMap : Map<K, V>) : ?V

    public func delete<V>(rbMap : Map<K, V>, key : K) : Map<K, V>

    public func remove<V>(rbMap : Map<K, V>, key : K) : (Map<K,V>, ?V)

  };

The other functionality is provided as standalone functions, as they
don't require a comparator:

  public type Direction = { #fwd; #bwd };

  public func iter<K, V>(rbMap : Map<K, V>, direction : Direction) : I.Iter<(K, V)>

  public func entries<K, V>(m : Map<K, V>) : I.Iter<(K, V)>

  public func keys<K, V>(m : Map<K, V>, direction : Direction) : I.Iter<K>

  public func vals<K, V>(m : Map<K, V>, direction : Direction) : I.Iter<V>

  public func map<K, V1, V2>(f : (K, V1) -> V2, rbMap : Map<K, V1>) : Map<K, V2>

  public func size<K, V>(t : Map<K, V>) : Nat

  public func foldLeft<Key, Value, Accum>(
    combine : (Key, Value, Accum) -> Accum,
    base : Accum,
    rbMap : Map<Key, Value>
  ) : Accum

  And foldRight with the same signature as foldLeft

The following functions are new for the API:
- MapOps.put, MapOps.delete
- MapOps.fromIter, entries, keys, vals
- MapOps.mapFilter, map
- foldLeft, foldRight
Problem: the argument order is now inconsistent within the new module
and with the old modules as well.

Solution: make the map argument always go first
In addition to tests, this patch removes the `direction`
argument from the `keys` and `values` functions to keep
them simple and provides a new function `Map.empty`
to create a map without knowing its internal representation.
* rename `rbMap` into `m` in signature for brevity & consistent language
* rename `rbMap` into `map` in examples for brevity & encapsulation sake
* rename `tree` into `map` in doc comments for the encapsulation sake
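
The split described in these commit messages (comparator-dependent operations in a class, comparator-free ones as standalone functions, map argument first) might look roughly like the sketch below. It is not the module's code: the `Map` type here is a hypothetical unbalanced search tree standing in for the red-black tree, and only `empty`, `put`, `get`, and `size` are shown.

```motoko
import Nat "mo:base/Nat";
import Order "mo:base/Order";

// Hypothetical stand-in for the map type: an unbalanced search tree,
// NOT a red-black tree, to keep the sketch short.
type Map<K, V> = { #node : (Map<K, V>, K, V, Map<K, V>); #leaf };

// Comparator-free operation: a standalone function.
func size<K, V>(m : Map<K, V>) : Nat {
  switch (m) {
    case (#leaf) { 0 };
    case (#node(l, _, _, r)) { size<K, V>(l) + 1 + size<K, V>(r) };
  }
};

// Comparator-dependent operations: the class captures `compare` once,
// so callers do not pass it on every call.
class MapOps<K>(compare : (K, K) -> Order.Order) {
  public func empty<V>() : Map<K, V> = #leaf;

  public func put<V>(m : Map<K, V>, key : K, value : V) : Map<K, V> {
    switch (m) {
      case (#leaf) { #node(#leaf, key, value, #leaf) };
      case (#node(l, k, v, r)) {
        switch (compare(key, k)) {
          case (#less) { #node(put<V>(l, key, value), k, v, r) };
          case (#equal) { #node(l, k, value, r) };
          case (#greater) { #node(l, k, v, put<V>(r, key, value)) };
        }
      };
    }
  };

  public func get<V>(m : Map<K, V>, key : K) : ?V {
    switch (m) {
      case (#leaf) { null };
      case (#node(l, k, v, r)) {
        switch (compare(key, k)) {
          case (#less) { get<V>(l, key) };
          case (#equal) { ?v };
          case (#greater) { get<V>(r, key) };
        }
      };
    }
  };
};

let natOps = MapOps<Nat>(Nat.compare);
let demo = natOps.put<Text>(natOps.put<Text>(natOps.empty<Text>(), 2, "Two"), 1, "One");
```

Because `empty` is a method, callers never need to write the `#leaf` constructor themselves, which is the encapsulation point raised later in the review.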
Contributor

@crusso crusso left a comment

This looks very nice. Added some comments inline.

Compared with https://ocaml.org/manual/5.2/api/Map.S.html, do you think we need to add any more operations now or should we leave them to later? There are some basic ones like OCaml.mem (Motoko has) to check a key exists, and OCaml for_all/exists (Motoko all/exists) for computing predicates (presumably short-circuiting).

Comment on lines 30 to 36
// TODO: a faster, more compact and less indirect representation would be:
// type Map<K, V> = {
// #red : (Map<K, V>, K, V, Map<K, V>);
// #black : (Map<K, V>, K, V, Map<K, V>);
// #leaf
//};
// (this inlines the colors into the variant, flattens a tuple, and removes a (now) redundant option, for considerable heap savings.)
Contributor

I guess we can delete the TODO now (all addressed right?)

Author

fixed

///
/// `MapOps` contains methods that require `compare` internally:
/// operations that may reshape a `Map` or should find something.
public class MapOps<K>(compare : (K,K) -> O.Order) {
Contributor

Suggested change
public class MapOps<K>(compare : (K,K) -> O.Order) {
public class MapOps<K>(compare : (K, K) -> O.Order) {

Author

fixed

// #leaf
//};
// (this inlines the colors into the variant, flattens a tuple, and removes a (now) redundant option, for considerable heap savings.)
// It would also make sense to maintain the size in a separate root for 0(1) access.
Contributor

@crusso crusso Oct 23, 2024

Suggested change
// It would also make sense to maintain the size in a separate root for 0(1) access.
// It would also make sense to maintain the size in a separate root for 0(1) access.

What do you think about reconsidering this now? I think @luc was quite keen on adding this and I think it would be useful for, e.g. copying a tree to an array etc. Does it complicate the operations much?

Author

Yes, we will do it. We hope it will not hurt performance much, but we will rerun the benchmarks just in case.

Author

Agreed to make it in the 3rd Milestone (#662)

/// Cost of empty map creation
/// Runtime: `O(1)`.
/// Space: `O(1)`
public func empty<V>() : Map<K, V> = #leaf;
Contributor

This made me wonder if we should rename #leaf to #empty, so users can just use #empty for the empty map, and also to distinguish this from the old RBTree #leaf constructor. Whaddya think?

Author

I think it makes sense to have it for the encapsulation. For example, when we add the size field the empty method will be necessary.

/// import Debug "mo:base/Debug";
///
/// let mapOps = Map.MapOps<Nat>(Nat.compare);
/// let rbMap = mapOps.fromIter<Text>(Iter.fromArray([(0, "Zero"), (2, "Two"), (1, "One")]));
Contributor

Did you decide not to rename rbMap to map throughout, or is that for a separate PR?

Author

Yeah, we just have not rebased this branch yet because I thought there were probably still some open discussions in #654.

Author

rebased

/// where `n` denotes the number of key-value entries stored in the map.
///
/// Note: Full map iteration creates `O(n)` temporary objects that will be collected as garbage.
public func entries<V>(m : Map<K, V>) : I.Iter<(K, V)> = iter(m, #fwd);
Contributor

RBTree.mo had methods rb.entries, rb.entriesRev and static function RBTree.iter(rb, dir). Should we just consolidate on either entries() and entriesRev() or iter(rb, dir)?

Contributor

@crusso crusso Oct 24, 2024

Related: If we keep iter(rb, dir), can we move type Direction into MapOps?

Author

Should we just consolidate on either entries() and entriesRev() or iter(rb, dir)?

We don't have a strong opinion about this. We can go either way or just add entriesRev().

We were reading design.md, which says that entries should exist, but it does not clarify precisely this case.

Author

@crusso Should we raise this question in the Slack?

(#red (l, x, y, r))
};
case _ {
Debug.trap "RBTree.red"
Contributor

@crusso crusso Oct 24, 2024

Suggested change
Debug.trap "RBTree.red"
Debug.trap "PersistentOrderedMap.red"

Are there others that need fixing?

Author

Fixed, I haven't found more "RBTree" stuff in the code.

GoPavel commented Oct 28, 2024

@crusso I've rebased this PR upon #654. Please check open discussion threads so we can report that the 2nd Milestone is finished.
Could you please approve CI here as well?

crusso added a commit that referenced this pull request Nov 13, 2024
This is an MR for the 3rd Milestone of Serokell's grant about
improving Motoko's base library.

The main goal of the PR is to introduce a new functional implementation
of the set data structure to the `base` library. Also, it brings a few
changes to the new functional map that was added in #664, #654.

# General changes:

* rename `PersistentOrderedMap` to `OrderedMap` (same for the
`OrderedSet`)
* improve docs

# Functional Map changes:

## New functionality:
+ add `any`/`all` functions
+ add `contains` function
+ add `minEntry`/`maxEntry`

## Optimizations:
+ Store `size` in the Map, [benchmark
results](serokell#35)

## Fixup: 
+ add `entriesRev()`, remove `iter()`

# NEW functional Set:

The new data structure implements an ordered set interface using
Red-Black trees as well as the new functional map from the 1-2
Milestones.

## API implemented:
* Basic operations (based on the map): `put`, `delete`, `contains`,
`fromIter`, etc
* Maps and folds: `map`, `mapFilter`, `foldLeft`, `foldRight`
* Set operations: `union` , `intersect`, `diff`, `isSubset`, `equal`
* Additional operations (as for the `OrderedMap`): `min`/`max`,
`all`/`some`

## Maintenance support:
* Unit, property tests
* Documentation

## Applied optimizations:

* Same optimizations that were useful for the functional map:
   * inline node color
   * float-out exceeded matching in iteration
   * `map`/`filterMap` through `foldLeft`
   * direct recursion in `foldLeft`
* [Benchmark results for all four optimizations
together](serokell#27)
* store size in the root of the tree, [benchmark
results](serokell#36 (comment))
* Pattern matching order optimization, [benchmark
results](serokell#36 (comment))
* Other optimizations:
   * Inline code of `OrderedMap` instead of sharing it, [benchmark
results](serokell#25)
   * `intersect` optimization: use order of output values to build the
resulting tree faster, see
serokell#39
   * `isSubset`, `equal` optimization: use early exit and use order of
subtrees to reduce intermediate tree height, see
serokell#37

## Rejected optimizations:

* Nipkow's implementation of set operation [Tobias Nipkow's "Functional
Data Structures and Algorithms", 117].
Initially, we were planning to use an implementation of set operations
(`intersect`, `union`, `diff`) from Nipkow's book. However, the
experiment shows that naive implementation with a simple size heuristic
performs better. [The benchmark
results](serokell#33) are comparing
3 versions:
* persistentset_baseline -- original implementation that uses Nipkow's
algorithms. However, the black height is calculated before each set
operation (the book assumes it's stored).
* persistentset_bh -- the same as the baseline but the black height is
stored in each node.
* persistentset -- naive implementation that looks up in a smaller set
and modifies a bigger one (it gives us `O(min(n,m) log(max(n,m)))`, which
is very close to Nipkow's version). Sizes of sets are also stored but
only in the root.
The last one outperforms others and keeps a tree slim in terms of byte
size. Thus, we have picked it.
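
A minimal sketch of the size heuristic described above, shown for `union` only: fold over the smaller set and insert its elements into the bigger one, with sizes kept in the root. Everything here is hypothetical (an unbalanced tree of `Nat` instead of a red-black tree, and the names `Set`, `put`, `foldTree`, `union`); the stated complexity bound only holds for a balanced tree.

```motoko
// Hypothetical simplified set: unbalanced search tree of Nat,
// with the size stored only in the root record.
type Tree = { #node : (Tree, Nat, Tree); #leaf };
type Set = { size : Nat; root : Tree };

let empty : Set = { size = 0; root = #leaf };

func contains(t : Tree, x : Nat) : Bool {
  switch (t) {
    case (#leaf) { false };
    case (#node(l, y, r)) {
      if (x < y) { contains(l, x) } else if (x > y) { contains(r, x) } else { true }
    };
  }
};

func insert(t : Tree, x : Nat) : Tree {
  switch (t) {
    case (#leaf) { #node(#leaf, x, #leaf) };
    case (#node(l, y, r)) {
      if (x < y) { #node(insert(l, x), y, r) }
      else if (x > y) { #node(l, y, insert(r, x)) }
      else { t }
    };
  }
};

// Insert while keeping the root-level size in sync.
func put(s : Set, x : Nat) : Set {
  if (contains(s.root, x)) { s } else {
    let grown : Set = { size = s.size + 1; root = insert(s.root, x) };
    grown
  }
};

// In-order fold over the elements of a tree.
func foldTree<A>(t : Tree, base : A, f : (A, Nat) -> A) : A {
  switch (t) {
    case (#leaf) { base };
    case (#node(l, x, r)) { foldTree<A>(r, f(foldTree<A>(l, base, f), x), f) };
  }
};

// Size heuristic: walk the smaller set, modify the bigger one.
func union(a : Set, b : Set) : Set {
  let (small, big) = if (a.size < b.size) { (a, b) } else { (b, a) };
  foldTree<Set>(small.root, big, put)
};

let s1 = put(put(put(empty, 1), 2), 3);
let s2 = put(put(empty, 3), 4);
let both = union(s1, s2);
```

Keeping the size only in the root lets `union` pick the smaller operand in constant time without enlarging every node.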

## Final benchmark results:

### Collection benchmarks

| |binary_size|generate|max mem|batch_get 50|batch_put 50|batch_remove 50|upgrade|
|--:|--:|--:|--:|--:|--:|--:|--:|
|orderedset+100|218_168|186_441|37_916|53_044|121_237|127_460|346_108|
|trieset+100|211_245|574_022|47_652|131_218|288_429|268_499|729_696|
|orderedset+1000|218_168|2_561_296|520_364|69_883|158_349|170_418|3_186_579|
|trieset+1000|211_245|7_374_045|633_440|162_806|383_594|375_264|9_178_466|
|orderedset+10000|218_168|40_015_301|320_532|84_660|192_931|215_592|31_522_120|
|trieset+10000|211_245|105_695_670|682_792|192_931|457_923|462_594|129_453_045|
|orderedset+100000|218_168|476_278_087|3_200_532|98_553|230_123|258_372|409_032_232|
|trieset+100000|211_245|1_234_038_235|6_826_516|222_247|560_440|549_813|1_525_692_388|
|orderedset+1000000|218_168|5_514_198_432|32_000_532|115_836|268_236|306_896|4_090_302_778|
|trieset+1000000|211_245|13_990_048_548|68_228_312|252_211|650_405|642_099|17_455_845_492|

### set API

| |size|intersect|union|diff|equals|isSubset|
|--:|--:|--:|--:|--:|--:|--:|
|orderedset+100|100|146_264|157_544|215_871|28_117|27_726|
|trieset+100|100|352_496|411_306|350_935|201_896|201_456|
|orderedset+1000|1000|162_428|194_198|286_747|242_329|241_938|
|trieset+1000|1000|731_650|1_079_906|912_629|2_589_090|4_023_673|
|orderedset+10000|10000|177_080|231_070|345_529|2_383_587|2_383_591|
|trieset+10000|10000|3_986_854|21_412_306|5_984_106|46_174_710|31_885_381|
|orderedset+100000|100000|190_727|267_008|402_081|91_300_348|91_300_393|
|trieset+100000|100000|178_863_894|209_889_623|199_028_396|521_399_350|521_399_346|
|orderedset+1000000|1000000|205_022|304_937|464_859|912_901_595|912_901_558|
|trieset+1000000|1000000|1_782_977_198|2_092_850_787|1_984_818_266|5_813_335_155|5_813_335_151|

### new set API

| |size|foldLeft|foldRight|mapfilter|map|
|--:|--:|--:|--:|--:|--:|
|orderedset|100|16_487|16_463|88_028|224_597|
|orderedset|1000|133_685|131_953|1_526_510|4_035_782|
|orderedset|10000|1_305_120|1_287_495|28_455_361|51_527_733|
|orderedset|100000|13_041_665|12_849_418|344_132_505|630_692_463|
|orderedset|1000000|130_428_573|803_454_777|4_019_592_041|7_453_944_902|

---------

Co-authored-by: Andrei Borzenkov <[email protected]>
Co-authored-by: Andrei Borzenkov <[email protected]>
Co-authored-by: Sergey Gulin <[email protected]>
Co-authored-by: Claudio Russo <[email protected]>
crusso commented Nov 13, 2024

Superseded by commit 1961fab

@crusso crusso closed this Nov 13, 2024