[ performance ] Change the order of derivation, derive one strictly after another #85

buzden · 2023-09-12T17:51:48Z

Since this should reduce the monadic depth during derivation, this should increase the performance of the derivation

buzden · 2024-02-09T09:24:58Z

Interestingly, this change gives non-trivial results in performance.

Agenda of metic results below: <user-time in sec> <wall-time> <memory>.

E.g., on the sorted-trees example on the same machine on the compiler c74f54c1222123906616dcf51a9aba5ef0b9b4e9

on the vanilla GC
- on the current master
```
? 21:01 3896056 kb
? 21:05 3896204 kb
```
- on this PR branch
```
? 19:50 2004444 kb
? 19:48 2004276 kb
```
- results: this PR gains ~6% in time and 49% in memory when using vanilla GC
on the patched GC
- on the current master
```
624 10:41 5386508 kb
624 10:41 5386580 kb
625 10:41 5386492 kb
626 10:42 5386532 kb
624 10:40 5386732 kb
```
  i.e. 50% gain in time and 38% loss in memory comparing to vanilla GC
- on this PR branch
```
788 13:31 2662632 kb
787 13:29 2662644 kb
787 13:30 2662564 kb
789 13:32 2662628 kb
789 13:31 2662828 kb
```
  i.e. 32% gain in time and 32% loss in memory comparing to vanilla GC
- results: this PR loses 26% in time and gains 51% in memory when using patched GC

So, this PR in this particular example

always uses 1/2 memory comparing to the current master;
executes either slightly faster or 1/4 slower (comparing to the current master) depending on which GC is used;
patching GC always gives significant time gain and using 1/3 more memory (both on master and on this PR).

At the same time, we can see gain in time using this PR even on the patched GC on another, bigger example, but we haven't got precise numbers yet.

buzden · 2024-02-09T13:44:10Z

To unpedictability of the evaluator's performance, if you replace

-  lg startMark
-  r <- action
-  lg endMark
-  pure r
+  lg startMark *> action <* lg endMark

in the logging mechanism, you still get degradation of the performance:

With <user-time> <wall-time> <memory-in-kb> on master we have

625 10:42 5386608
623 10:39 5386384
622 10:39 5386604

with applied diff above we have

638 10:55 5576780
635 10:52 5577004
637 10:53 5577012

on the sorted-trees example, i.e. slightly more time (~2%) and slightly more memory (~4%). Curious.

buzden · 2024-02-12T21:50:19Z

For the record, on the big-big model on the patched GC results are the following:

master:
```
14378 4:15:13 10926616 kb
```
this PR:
```
13753 4:04:44 5895596 kb
```

i.e. this PR 4% faster and uses 85% lesser memory.

buzden · 2024-02-14T17:28:10Z

It is believed, that breath-first traverse of the data tree is preferrable to the depth-first traverse in the current evaluator because depth is associated with monadic binds in the Elab monad, which, in its turn, is associated with lots of closures, which are badly managed with the current compiler. Yaffle-based evaluator should not need this type of change, and as soon as it is implemented in the main Idris compiler, this change could be reverted.

buzden added code: enhancement New feature or improvement part: derivation Related to automated derivation of generators issue: performance When work takes too much resources derive: core Something in between single type generator and its constructors labels Sep 12, 2023

buzden force-pushed the queued-closuring branch 2 times, most recently from 9c3561a to 5d3a4ba Compare September 14, 2023 12:52

buzden force-pushed the queued-closuring branch 2 times, most recently from cfb8fc7 to 6edf3fc Compare October 31, 2023 19:50

buzden mentioned this pull request Feb 8, 2024

[ refactor ] Make hasNamesInsideDeep pure, precompute needed info #108

Merged

buzden force-pushed the queued-closuring branch 2 times, most recently from a4cfbb8 to be229cb Compare February 8, 2024 17:25

buzden added 4 commits February 8, 2024 20:29

[ refactor ] Prepare to queuing

ffa2386

[ refactor ] Use queue instead of call stack to order the derivation

0079cf3

[ fix ] Do not start loop too much

58dcb11

[ test ] Fix print golden values

d703c80

buzden force-pushed the queued-closuring branch from be229cb to d703c80 Compare February 8, 2024 17:34

buzden added the status: upstream Upstream issue label Feb 14, 2024

buzden merged commit cef75b1 into master Feb 14, 2024
26 checks passed

buzden deleted the queued-closuring branch February 14, 2024 17:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ performance ] Change the order of derivation, derive one strictly after another #85

[ performance ] Change the order of derivation, derive one strictly after another #85

buzden commented Sep 12, 2023 •

edited

Loading

buzden commented Feb 9, 2024 •

edited

Loading

buzden commented Feb 9, 2024

buzden commented Feb 12, 2024

buzden commented Feb 14, 2024

[ performance ] Change the order of derivation, derive one strictly after another #85

[ performance ] Change the order of derivation, derive one strictly after another #85

Conversation

buzden commented Sep 12, 2023 • edited Loading

buzden commented Feb 9, 2024 • edited Loading

buzden commented Feb 9, 2024

buzden commented Feb 12, 2024

buzden commented Feb 14, 2024

buzden commented Sep 12, 2023 •

edited

Loading

buzden commented Feb 9, 2024 •

edited

Loading