Replace MEM_MD5_DIGEST with generic MEM_16B_BUF #1874
base: master
Conversation
... with generic 16 byte buffer.
```diff
@@ -138,7 +138,7 @@ storeKeyPublicByRequestMethod(HttpRequest * request, const HttpRequestMethod& me
 cache_key *
 storeKeyDup(const cache_key * key)
 {
-    cache_key *dup = (cache_key *)memAllocate(MEM_MD5_DIGEST);
+    cache_key *dup = (cache_key *)memAllocBuf(SQUID_MD5_DIGEST_LENGTH, nullptr);
```
MEM_MD5_DIGEST pool should be removed, but it should not be replaced with another pool. Instead, cache_key should not be dynamically allocated at all! The key should become a cheap-to-create/copy/compare/destroy class based on something like two uint64_t integer data members. We may even have an old TODO about this long-awaited improvement somewhere...
I am not blocking this PR on these "wrong MEM_MD5_DIGEST replacement" grounds because there may be performance value in introducing a generic 16-byte pool (as discussed elsewhere).
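To make the suggested direction concrete, here is a minimal sketch of what such a value-type key might look like, assuming two `uint64_t` halves as proposed above. The class name `CacheKey` and all member names are hypothetical illustrations, not existing Squid code:

```cpp
#include <cstdint>
#include <cstring>
#include <functional>

// Hypothetical sketch: a cheap-to-create/copy/compare/destroy cache key
// built from two 64-bit halves, replacing dynamically allocated cache_key
// buffers. Not Squid code; names are illustrative only.
class CacheKey {
public:
    CacheKey() : hi_(0), lo_(0) {}

    // Build from a 16-byte MD5 digest buffer.
    explicit CacheKey(const unsigned char *digest16) {
        std::memcpy(&hi_, digest16, 8);
        std::memcpy(&lo_, digest16 + 8, 8);
    }

    // Trivial comparisons: two integer compares, no memcmp over heap memory.
    bool operator==(const CacheKey &other) const {
        return hi_ == other.hi_ && lo_ == other.lo_;
    }
    bool operator!=(const CacheKey &other) const { return !(*this == other); }

    // Cheap hash for use in hash tables.
    std::size_t hash() const {
        return std::hash<std::uint64_t>{}(hi_) ^
               (std::hash<std::uint64_t>{}(lo_) << 1);
    }

private:
    std::uint64_t hi_;
    std::uint64_t lo_;
};
```

Copying such a key is a plain 16-byte struct copy, so `storeKeyDup()` and the associated pool allocation could disappear entirely.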
```diff
@@ -310,8 +315,6 @@ Mem::Init(void)
     // TODO: Carefully stop zeroing these objects memory and drop the doZero parameter
     memDataInit(MEM_DREAD_CTRL, "dread_ctrl", sizeof(dread_ctrl), 0, true);
     memDataInit(MEM_DWRITE_Q, "dwrite_q", sizeof(dwrite_q), 0, true);
-    memDataInit(MEM_MD5_DIGEST, "MD5 digest", SQUID_MD5_DIGEST_LENGTH, 0, true);
```
Introducing smaller generic pools may improve or harm performance. It may harm performance if we happen to have some frequently used context (e.g., in HTTP header parsing) where a short SBuf often grows from below 16 bytes to below 32 bytes -- introducing one extra memory allocation/copy. We should not make such performance-sensitive/focused changes without a pressing need and without performance testing!
I recommend keeping MEM_MD5_DIGEST pool until cache_key is upgraded (as discussed elsewhere). Instead, let's just set its doZero parameter to false so that we can make progress towards removing that pool parameter/functionality as described in the above TODO (line 310/315).
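Concretely, that recommendation amounts to keeping the `memDataInit()` call this PR removes, but flipping its final `doZero` argument. A sketch of that alternative, not a tested patch:

```diff
-    memDataInit(MEM_MD5_DIGEST, "MD5 digest", SQUID_MD5_DIGEST_LENGTH, 0, true);
+    memDataInit(MEM_MD5_DIGEST, "MD5 digest", SQUID_MD5_DIGEST_LENGTH, 0, false);
```

This keeps the dedicated pool (avoiding the generic-pool performance questions below) while still advancing the TODO about removing the `doZero` parameter.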
Initial testing indicates that almost all dynamic buffers start with >1KB allocations (as expected for I/O buffers), even those which initialize with a 0-byte/missing buffer.
The largest use of these 16B buffers is Store MD5 objects, as expected, plus a few hundred global string constants (e.g., header names and other constants) that are stored in SBuf or MemBuf and never reallocate to anything larger.
> Initial testing indicates that almost all dynamic buffers start with >1KB allocations
Data from two busy production Squids suggests the opposite conclusion (and supports concerns behind this change request): Short strings (i.e. 40 bytes or shorter) are responsible for the vast majority of relevant allocations and exceed, say, 4KB buffer allocations by two orders of magnitude (e.g., 17,267,013 vs. 142,112 allocations).
Here is a sample from one worker showing the dominance of small buffer allocations -- allocations that may be sensitive to this PR's changes, as detailed in this change request:
| Pool | Obj Size | Allocated (#) | Allocated (%) |
|---|---|---|---|
| MD5 digest | 16 | 6,618 | 0% |
| Short Strings | 40 | 17,267,013 | 90% |
| Medium Strings | 128 | 736,958 | 4% |
| Long Strings | 512 | 389,430 | 2% |
| 1KB Strings | 1024 | 2,376 | 0% |
| 2K Buffer | 2048 | 2,945 | 0% |
| 4KB Strings | 4096 | 142,112 | 1% |
| 4K Buffer | 4096 | 371 | 0% |
| 8K Buffer | 8192 | 151 | 0% |
| 16KB Strings | 16384 | 454,272 | 2% |
| 16K Buffer | 16384 | 98 | 0% |
| 32K Buffer | 32768 | 267 | 0% |
| 64K Buffer | 65536 | 87,965 | 0% |
Please do not change buffers used for short strings in this PR. We can make progress without such risky and effectively untested performance-sensitive changes (as detailed in this change request).
It is worth noting that the "Short Strings" stats you are looking at were merged into the "64B Buffer" pool (not even the 32B pool) in current Squid, where my analysis was done. Many of them also were `String` data-copies made by header parsers. Those disappeared when `SBuf` started to share a larger underlying `MemBlob` I/O buffer and/or reference the `RegisteredHeader` global list.
Would the admin of that busy cache you have access to be willing to patch in a counter of how many times `memAllocBuf()` is called with size under 17?
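Such a counter could be as small as the sketch below. The `countingAllocBuf()` wrapper and its `malloc`-based body are hypothetical stand-ins used only to model the size parameter; the real patch would increment the counter inside Squid's actual `memAllocBuf()`:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Hypothetical instrumentation sketch: count tiny (sub-17-byte) requests
// passing through a memAllocBuf-style entry point. This is NOT Squid's
// memAllocBuf(); it is an illustrative stand-in with a similar signature.
static std::atomic<unsigned long> tinyAllocCount{0};

void *countingAllocBuf(std::size_t netSize, std::size_t *grossSize) {
    if (netSize < 17)            // the "size under 17" case asked about
        ++tinyAllocCount;
    if (grossSize)
        *grossSize = netSize;    // real pools would round this up
    return std::malloc(netSize); // stand-in for the pooled allocation
}
```

Dumping `tinyAllocCount` at shutdown (or via the cache manager) would directly answer how many allocations the proposed 16B pool would actually serve.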
> It is worth noting that the "Short Strings" stats you are looking at were merged into the "64B Buffer" pool (not even 32B pool) in current Squid where my analysis was done.
I am aware that master/v7 code has a different set of pools than v6 and earlier code (due to 2023 commit 250fd42, at least), but those differences do not affect this change request analysis AFAICT.
> Many of them also were `String` data-copies by header parsers. Those disappeared when `SBuf` started to share a larger underlying `MemBlob` I/O buffer and/or reference the `RegisteredHeader` global list.
I do not have enough information to validate the implication that header parsing improvements were enough to make the potential negative side effects of this PR's changes negligible.
> Would the admin of that busy cache you have access to be willing to patch in a counter of how many times `memAllocBuf()` is called with size under 17?
Patching those Squids to investigate this PR's side effects is not an option right now. Fortunately, we can make very good progress (including merging this PR!) by focusing on changes leading to the removal of the doZero parameter (and its associated unwanted functionality).
In fact, even if we had enough anecdotal evidence suggesting that current PR changes do not hurt performance, it would still be prudent to extract/isolate those performance-sensitive changes into a dedicated PR!
FWIW, I do not know why that label was added, but I recommend resolving PR merge conflicts as the next step towards all-green CI tests.