Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ performance ] Change the order of derivation, derive one strictly after another #85

Merged
merged 4 commits into from
Feb 14, 2024

Conversation

buzden
Copy link
Owner

@buzden buzden commented Sep 12, 2023

Since this should reduce the monadic depth during derivation, this should increase the performance of the derivation

@buzden buzden added code: enhancement New feature or improvement part: derivation Related to automated derivation of generators issue: performance When work takes too much resources derive: core Something in between single type generator and its constructors labels Sep 12, 2023
@buzden buzden force-pushed the queued-closuring branch 2 times, most recently from 9c3561a to 5d3a4ba Compare September 14, 2023 12:52
@buzden buzden force-pushed the queued-closuring branch 2 times, most recently from cfb8fc7 to 6edf3fc Compare October 31, 2023 19:50
@buzden buzden force-pushed the queued-closuring branch 2 times, most recently from a4cfbb8 to be229cb Compare February 8, 2024 17:25
@buzden
Copy link
Owner Author

buzden commented Feb 9, 2024

Interestingly, this change gives non-trivial results in performance.

Agenda of metic results below: <user-time in sec> <wall-time> <memory>.

E.g., on the sorted-trees example on the same machine on the compiler c74f54c1222123906616dcf51a9aba5ef0b9b4e9

  • on the vanilla GC

    • on the current master

      ? 21:01 3896056 kb
      ? 21:05 3896204 kb
      
    • on this PR branch

      ? 19:50 2004444 kb
      ? 19:48 2004276 kb
      
    • results: this PR gains ~6% in time and 49% in memory when using vanilla GC

  • on the patched GC

    • on the current master

      624 10:41 5386508 kb
      624 10:41 5386580 kb
      625 10:41 5386492 kb
      626 10:42 5386532 kb
      624 10:40 5386732 kb
      

      i.e. 50% gain in time and 38% loss in memory comparing to vanilla GC

    • on this PR branch

      788 13:31 2662632 kb
      787 13:29 2662644 kb
      787 13:30 2662564 kb
      789 13:32 2662628 kb
      789 13:31 2662828 kb
      

      i.e. 32% gain in time and 32% loss in memory comparing to vanilla GC

    • results: this PR loses 26% in time and gains 51% in memory when using patched GC

So, this PR in this particular example

  • always uses 1/2 memory comparing to the current master;
  • executes either slightly faster or 1/4 slower (comparing to the current master) depending on which GC is used;
  • patching GC always gives significant time gain and using 1/3 more memory (both on master and on this PR).

At the same time, we can see gain in time using this PR even on the patched GC on another, bigger example, but we haven't got precise numbers yet.

@buzden
Copy link
Owner Author

buzden commented Feb 9, 2024

To unpedictability of the evaluator's performance, if you replace

-  lg startMark
-  r <- action
-  lg endMark
-  pure r
+  lg startMark *> action <* lg endMark

in the logging mechanism, you still get degradation of the performance:

With <user-time> <wall-time> <memory-in-kb> on master we have

625 10:42 5386608
623 10:39 5386384
622 10:39 5386604

with applied diff above we have

638 10:55 5576780
635 10:52 5577004
637 10:53 5577012

on the sorted-trees example, i.e. slightly more time (~2%) and slightly more memory (~4%). Curious.

@buzden
Copy link
Owner Author

buzden commented Feb 12, 2024

For the record, on the big-big model on the patched GC results are the following:

  • master:
    14378 4:15:13 10926616 kb
    
  • this PR:
    13753 4:04:44 5895596 kb
    

i.e. this PR 4% faster and uses 85% lesser memory.

@buzden buzden added the status: upstream Upstream issue label Feb 14, 2024
@buzden
Copy link
Owner Author

buzden commented Feb 14, 2024

It is believed, that breath-first traverse of the data tree is preferrable to the depth-first traverse in the current evaluator because depth is associated with monadic binds in the Elab monad, which, in its turn, is associated with lots of closures, which are badly managed with the current compiler. Yaffle-based evaluator should not need this type of change, and as soon as it is implemented in the main Idris compiler, this change could be reverted.

@buzden buzden merged commit cef75b1 into master Feb 14, 2024
26 checks passed
@buzden buzden deleted the queued-closuring branch February 14, 2024 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code: enhancement New feature or improvement derive: core Something in between single type generator and its constructors issue: performance When work takes too much resources part: derivation Related to automated derivation of generators status: upstream Upstream issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant