
mycpp: add 128B pool #1958

Open · wants to merge 3 commits into base: soil-staging
Conversation

melvinw (Collaborator) commented May 7, 2024

This improves performance on memory-bound workloads and helps reduce page faults in workloads like CPython configure.

This improves performance on memory-bound workloads.
melvinw requested a review from andychu May 7, 2024 02:49
melvinw changed the base branch from master to soil-staging May 7, 2024 02:53
@@ -265,10 +265,12 @@ class MarkSweepHeap {
#ifndef NO_POOL_ALLOC
// 16,384 / 24 bytes = 682 cells (rounded), 16,368 bytes
// 16,384 / 48 bytes = 341 cells (rounded), 16,368 bytes
// 16,384 / 96 bytes = 171 cells (rounded), 16,368 bytes

Contributor:

Hm this still says 96

I kinda want to do some evaluation, make sure it's not over-tuned for Linux / glibc / 64-bit

And does this waste memory on any workloads?

Contributor:

Also I think we had the theory again that malloc() needed some slack, to be strictly less than 16,384 bytes, to allow for headers

But I guess we never measured that effect

We already run on many different libcs with different malloc() implementations, so yeah I think we can at least survey Alpine Linux

melvinw (Collaborator, author):

Oh whoops. That's a typo left over from a previous version. Will fix.

Happy to leave this open until we have a better idea of the effect with different allocators. I suspect it will be a win in most cases, though, since the write to the allocation header (whether it's below the buffer or elsewhere) is going to cause page faults regardless, and reducing those page faults was the main win here.
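
For reference, here is a minimal standalone sketch of the cells-per-block arithmetic from the comment block in the diff above. It is not the actual mycpp Pool implementation; it only assumes 16 KiB pool blocks and the 96 -> 128 fix discussed in this thread, and the numbers it prints match the corrected comments.

```cpp
#include <cstdio>

constexpr int kBlockSize = 16384;  // 16 KiB block size from the comments above

int main() {
  const int cell_sizes[] = {24, 48, 128};  // 128 assumes the 96 -> 128 typo fix
  for (int cell_size : cell_sizes) {
    int cells = kBlockSize / cell_size;  // integer division, i.e. rounded down
    printf("%3d-byte cells: %3d per block, %5d bytes used, %2d bytes left over\n",
           cell_size, cells, cells * cell_size, kBlockSize - cells * cell_size);
  }
  // Prints:
  //  24-byte cells: 682 per block, 16368 bytes used, 16 bytes left over
  //  48-byte cells: 341 per block, 16368 bytes used, 16 bytes left over
  // 128-byte cells: 128 per block, 16384 bytes used,  0 bytes left over
  return 0;
}
```

Note that 128-byte cells divide 16,384 exactly, so unlike the 24- and 48-byte pools there would be no leftover bytes in the block; whether that interacts with the "slack for malloc() headers" theory above is exactly the kind of thing that hasn't been measured yet.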

andychu (Contributor) commented May 7, 2024

OK, if leaving this open isn't blocking anything, it will be a good opportunity to test out my new performance harness

It bugged me that the CI is not good enough to really measure performance -- we need other hardware and other systems

I think a lot of projects actually have real hardware labs for this, e.g. the Go language has a bunch of machines all over. But we can probably do pretty well on a small budget :)

andychu (Contributor) commented Jun 1, 2024

BTW I still want to merge this, but I want to use it as an example for some perf evaluation on different machines with different libc allocators

andychu deleted the branch soil-staging October 1, 2024 04:33
andychu closed this Oct 1, 2024
andychu (Contributor) commented Nov 12, 2024

Oh this was accidentally closed because I deleted the soil-staging branch

andychu reopened this Nov 12, 2024
andychu (Contributor) commented Nov 12, 2024

OK it merged cleanly, so I pushed it to CI again

I was thinking about the pools again, but I don't remember why

Oh yeah it was because Aidan and I were investigating a word splitting bug. And I was thinking about how to do fewer allocs in word splitting, which is hot

And it felt like we needed more 32-byte Slices, like our 32-byte Tokens. I was wondering whether we should make it 24/32/48 or 24/48/128, etc.


i.e. there is still a "hot" part of the code that isn't "done" yet, because there are IFS bugs


I also felt like this depends a lot on the specific machine, e.g. a 128-byte pool could be more of a de-optimization on a 32-bit machine, although those are getting rarer and rarer
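
To make the 24/32/48 vs 24/48/128 question concrete, here is a small hypothetical sketch of how an allocation size could be routed to a pool cell size under each configuration. The routing policy and names are illustrative assumptions, not how mycpp's MarkSweepHeap actually dispatches; it is only meant to show where objects like the 32-byte Slices and Tokens mentioned above would land, and how much padding they would pay.

```cpp
#include <cstddef>
#include <cstdio>

// Returns the cell size an allocation of `n` bytes would land in, or 0 if it
// would fall through to the general malloc()-backed path. Hypothetical helper,
// not part of mycpp.
int PickTier(size_t n, const int* tiers, int num_tiers) {
  for (int i = 0; i < num_tiers; ++i) {
    if (n <= static_cast<size_t>(tiers[i])) {
      return tiers[i];
    }
  }
  return 0;  // too big for any pool
}

int main() {
  const int a[] = {24, 32, 48};   // more, smaller tiers (dedicated 32-byte cells)
  const int b[] = {24, 48, 128};  // this PR's configuration, with a 128-byte tier
  const size_t sizes[] = {24, 32, 40, 64, 96, 128, 200};

  for (size_t n : sizes) {
    printf("%3zu bytes -> 24/32/48 pool: %3d   24/48/128 pool: %3d\n", n,
           PickTier(n, a, 3), PickTier(n, b, 3));
  }
  // e.g. a 32-byte Slice fits a 32-byte cell exactly under 24/32/48, but
  // occupies a 48-byte cell (16 bytes of internal padding) under 24/48/128,
  // while a 96-byte object only gets pooled at all when a 128-byte tier exists.
  return 0;
}
```

This is just to make the trade-off visible: more small tiers reduce internal padding per object, while a larger top tier pools more object sizes at the cost of extra padding for small ones, and which way it goes presumably depends on the workload's actual size distribution.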

andychu (Contributor) commented Nov 12, 2024

On the cachegrind benchmarks, this does help a little

http://op.oilshell.org/uuu/github-jobs/8247/benchmarks2.wwz/_tmp/gc-cachegrind/index.html

http://op.oilshell.org/uuu/github-jobs/8249/benchmarks2.wwz/_tmp/gc-cachegrind/index.html

parsing:    41.0 -> 40.3    _bin/cxx-opt/osh    mut+alloc+free+gc
fibonacci:  24.2 -> 24.0    _bin/cxx-opt/osh    mut+alloc+free+gc

I guess one reason I've been hesitant is that we could be using more memory and not notice it on the benchmarks

It is common to trade memory for speed, and then the memory doesn't get measured


I hope we can do the IFS fixes, and then see where word eval is at. Actually I checked in some word eval benchmarks the other day:

https://oilshell.zulipchat.com/#narrow/channel/121539-oil-dev/topic/word.20split.20benchmarks

we're actually faster than bash! But slower than other shells

Word eval is known to be hot, and speeding it up even sped up CPython configure as far as I remember. So I hope we can get that in, and then compare say 24/32/48 vs 24/48/128

Although we will still have the memory issue then, and it's still true that we could be over-fitting for certain hardware

andychu (Contributor) commented Nov 12, 2024

Also, I could be missing something in terms of the benchmarks, since there are so many dimensions and workloads

I'm sure there are other ways to look at this
