Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace MEM_MD5_DIGEST with generic MEM_16B_BUF #1874

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/mem/forward.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ typedef void FREE(void *);
/// Types of memory pool which do not yet use MEMPROXY_CLASS() API
typedef enum {
MEM_NONE,
MEM_16B_BUF,
MEM_32B_BUF,
MEM_64B_BUF,
MEM_128B_BUF,
Expand All @@ -53,7 +54,6 @@ typedef enum {
MEM_64K_BUF,
MEM_DREAD_CTRL,
MEM_DWRITE_Q,
MEM_MD5_DIGEST,
MEM_MAX
} mem_type;

Expand Down
18 changes: 15 additions & 3 deletions src/mem/old_api.cc
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
#include <iomanip>

/* forward declarations */
static void memFree16B(void *);
static void memFree32B(void *);
static void memFree64B(void *);
static void memFree128B(void *);
Expand Down Expand Up @@ -140,7 +141,10 @@ memFindBufSizeType(size_t net_size, size_t * gross_size)
mem_type type;
size_t size;

if (net_size <= 32) {
if (net_size <= 16) {
type = MEM_16B_BUF;
yadij marked this conversation as resolved.
Show resolved Hide resolved
size = 16;
} else if (net_size <= 32) {
type = MEM_32B_BUF;
size = 32;
} else if (net_size <= 64) {
Expand Down Expand Up @@ -295,6 +299,7 @@ Mem::Init(void)
* that are never used or used only once; perhaps we should simply use
* malloc() for those? @?@
*/
memDataInit(MEM_16B_BUF, "16B Buffer", 16, 10, false);
memDataInit(MEM_32B_BUF, "32B Buffer", 32, 10, false);
memDataInit(MEM_64B_BUF, "64B Buffer", 64, 10, false);
memDataInit(MEM_128B_BUF, "128B Buffer", 128, 10, false);
Expand All @@ -310,8 +315,6 @@ Mem::Init(void)
// TODO: Carefully stop zeroing these objects memory and drop the doZero parameter
memDataInit(MEM_DREAD_CTRL, "dread_ctrl", sizeof(dread_ctrl), 0, true);
memDataInit(MEM_DWRITE_Q, "dwrite_q", sizeof(dwrite_q), 0, true);
memDataInit(MEM_MD5_DIGEST, "MD5 digest", SQUID_MD5_DIGEST_LENGTH, 0, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing smaller generic pools may improve or harm performance. It may harm performance if we happen to have some frequently used context (e.g., in HTTP header parsing) where a short SBuf often grows from below 16 bytes to below 32 bytes -- introducing one extra memory allocation/copy. We should not make such performance-sensitive/focused changes without a pressing need and without performance testing!

I recommend keeping MEM_MD5_DIGEST pool until cache_key is upgraded (as discussed elsewhere). Instead, let's just set its doZero parameter to false so that we can make progress towards removing that pool parameter/functionality as described in the above TODO (line 310/315).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial testing indicates that almost all dynamic buffers start with >1KB allocations (as expected fro I/O buffers), even those which initialize with a 0-byte/missing buffer.

Largest use of these 16B buffers is Store MD5 objects as expected. Plus a few hundred global string constants (eg header names, constants) that are stored in SBuf or MemBuf and do not reallocate to anything larger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial testing indicates that almost all dynamic buffers start with >1KB allocations

Data from two busy production Squids suggests the opposite conclusion (and supports concerns behind this change request): Short strings (i.e. 40 bytes or shorter) are responsible for the vast majority of relevant allocations and exceed, say, 4KB buffer allocations by two orders of magnitude (e.g., 17,267,013 vs. 142,112 allocations).

Here is a sample from one worker showing dominance of small buffer allocations -- allocations that may be sensitive to this PR changes as detailed in this change request:

Pool Obj Size Allocated (#) Allocated (%)
MD5 digest 16 6,618 0%
Short Strings 40 17,267,013 90%
Medium Strings 128 736,958 4%
Long Strings 512 389,430 2%
1KB Strings 1024 2,376 0%
2K Buffer 2048 2,945 0%
4KB Strings 4096 142,112 1%
4K Buffer 4096 371 0%
8K Buffer 8192 151 0%
16KB Strings 16384 454,272 2%
16K Buffer 16384 98 0%
32K Buffer 32768 267 0%
64K Buffer 65536 87,965 0%

Please do not change buffers used for short strings in this PR. We can make progress without such risky and effectively untested performance-sensitive changes (as detailed in this change request).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth noting that the "Short Strings" stats you are looking at were merged into the "64B Buffer" pool (not even 32B pool) in current Squid where my analysis was done. Many of them also were String data-copies by header parsers. Those disappeared when SBuf started share a larger underlying MemBlob I/O buffer and/or reference to the RegisteredHeader global list.

Would the admin of that busy cache you have access to be willing to patch in a counter of how many times memAllocBuf() is called with size under 17 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth noting that the "Short Strings" stats you are looking at were merged into the "64B Buffer" pool (not even 32B pool) in current Squid where my analysis was done.

I am aware that master/v7 code has a different set of pools than v6 and earlier code (due to 2023 commit 250fd42, at least), but those differences do not affect this change request analysis AFAICT.

Many of them also were String data-copies by header parsers. Those disappeared when SBuf started share a larger underlying MemBlob I/O buffer and/or reference to the RegisteredHeader global list.

I do not have enough information to validate the implication that header parsing improvements were enough to make potential negative side effects of this PR changes negligible.

Would the admin of that busy cache you have access to be willing to patch in a counter of how many times memAllocBuf() is called with size under 17 ?

Patching those Squids to investigate this PR side effects is not an option right now. Fortunately, we can make very good progress (including merging this PR!) by focusing on changes leading to doZero parameter (and associated unwanted functionality) removal.

In fact, even if we had enough anecdotal evidence suggesting that current PR changes do not hurt performance, it would still be prudent to extract/isolate those performance-sensitive changes into a dedicated PR!

GetPool(MEM_MD5_DIGEST)->setChunkSize(512 * 1024);

// Test that all entries are initialized
for (auto t = MEM_NONE; ++t < MEM_MAX;) {
Expand Down Expand Up @@ -350,6 +353,12 @@ memInUse(mem_type type)

/* ick */

void
memFree16B(void *p)
{
memFree(p, MEM_16B_BUF);
}

void
memFree32B(void *p)
{
Expand Down Expand Up @@ -433,6 +442,9 @@ memFreeBufFunc(size_t size)
{
switch (size) {

case 16:
return memFree16B;

case 32:
return memFree32B;

Expand Down
4 changes: 2 additions & 2 deletions src/store_key_md5.cc
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@ storeKeyPublicByRequestMethod(HttpRequest * request, const HttpRequestMethod& me
cache_key *
storeKeyDup(const cache_key * key)
{
cache_key *dup = (cache_key *)memAllocate(MEM_MD5_DIGEST);
cache_key *dup = (cache_key *)memAllocBuf(SQUID_MD5_DIGEST_LENGTH, nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEM_MD5_DIGEST pool should be removed, but it should not be replaced with another pool. Instead, cache_key should not be dynamically allocated at all! The key should become a cheap-to-create/copy/compare/destroy class based on something like two uint64_t integer data members. We may even have an old TODO about this long-awaited improvement somewhere...

I am not blocking this PR on these "wrong MEM_MD5_DIGEST replacement" grounds because there may be performance value in introducing a generic 16-byte pool (as discussed elsewhere).

memcpy(dup, key, SQUID_MD5_DIGEST_LENGTH);
return dup;
}
Expand All @@ -153,7 +153,7 @@ storeKeyCopy(cache_key * dst, const cache_key * src)
void
storeKeyFree(const cache_key * key)
{
memFree((void *) key, MEM_MD5_DIGEST);
memFreeBuf(SQUID_MD5_DIGEST_LENGTH, const_cast<cache_key *>(key));
}

int
Expand Down
Loading