Customize kiwix-serve cache settings to limit memory consumption #147

Open
benoit74 opened this issue Nov 24, 2023 · 3 comments
Labels
question Further information is requested

Comments

@benoit74
Collaborator

As of today, kiwix-serve cache settings are not customized on library.kiwix.org (nor on dev.library.kiwix.org).

As discussed in kiwix/libkiwix#1025, kiwix-serve is using a significant amount of memory. With the current code, we could probably gain more control over this memory consumption by customizing the settings explained below.

| Environment variable | Purpose | Default value | Comment |
|---|---|---|---|
| KIWIX_ARCHIVE_CACHE_SIZE | Number of open readers (~ZIM) | 10% of getBookCount_not_protected (number of local and remote books) | ~= 421 today |
| KIWIX_SEARCHER_CACHE_SIZE | Number of open searchers (which might include readers not accounted for in KIWIX_ARCHIVE_CACHE_SIZE) | Same as KIWIX_ARCHIVE_CACHE_SIZE | ~= 421 today |
| ZIM_DIRENTCACHE | Number of dirents kept in cache per ZIM | 512 | Probably low impact on memory |
| ZIM_DIRENTLOOKUPCACHE | Same as ZIM_DIRENTCACHE | 1024 | Probably low impact on memory |
| ZIM_CLUSTERCACHE | Number of clusters kept in cache per ZIM | 16 | |
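
For reference, the ~421 figure can be reproduced with a rough sketch like the one below. It assumes the 10% default is rounded down and never drops below 1, and that the library currently holds about 4210 books. Both are assumptions for illustration and have not been checked against the libkiwix code; `default_cache_size` is a hypothetical helper name.

```python
def default_cache_size(book_count: int) -> int:
    """Approximate the default cache size as 10% of the book count.

    Assumption: the value is rounded down and is at least 1.
    """
    return max(book_count // 10, 1)


# Assumed book count, chosen only because it matches the ~421 quoted above.
print(default_cache_size(4210))
```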

My gut feeling is that 421 for KIWIX_ARCHIVE_CACHE_SIZE and KIWIX_SEARCHER_CACHE_SIZE is way too much: I wouldn't expect us to open that many ZIMs every day, but my experience is limited.

I suggest that we run a small experiment directly in production on library.kiwix.org (dev.library.kiwix.org is not really representative in terms of number of ZIMs and traffic, and has known issues):

  • instead of having 2 kiwix-serve containers in deployment library-data, reduce this to 1
  • create a new deployment library-data-expe, with 1 kiwix-serve container and custom environment variables
  • modify the library-data service to route to both k8s deployments
    • this is easy to do (easier than at varnish side)
    • should we encounter a problem, we just scale library-data to 2 containers and library-data-expe to 0 et voilà
  • for every experiment, let the system stabilize for at least 3 days (tbc based on observations) and compare library-data and library-data-expe in terms of memory, CPU and disk I/O; also note any noticeable change in live browsing performance as seen by an end user
  • start with simple experiments:
    • first, set all default values (just to confirm that our expected default values are correct and there is no bias)
    • then, for each setting above, divide its value by 2 (one setting at a time, with all other settings at their default values)
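
The experiment sequence above could be enumerated with a small sketch like this one. The default values are the ones listed in this issue (421 being today's approximate value); the `experiment_plan` helper and the `half-...` labels are hypothetical names used only for illustration.

```python
# Default values as listed in this issue; 421 is today's approximate value.
DEFAULTS = {
    "KIWIX_ARCHIVE_CACHE_SIZE": 421,
    "KIWIX_SEARCHER_CACHE_SIZE": 421,
    "ZIM_DIRENTCACHE": 512,
    "ZIM_DIRENTLOOKUPCACHE": 1024,
    "ZIM_CLUSTERCACHE": 16,
}


def experiment_plan(defaults):
    """Return (label, env) pairs: the baseline first, then one halved setting per run."""
    plan = [("baseline", dict(defaults))]
    for name, value in defaults.items():
        env = dict(defaults)            # all other settings stay at their defaults
        env[name] = max(value // 2, 1)  # halve a single setting, never below 1
        plan.append((f"half-{name}", env))
    return plan


for label, env in experiment_plan(DEFAULTS):
    print(label, env)
```

Each resulting `env` would be applied to the library-data-expe container for one stabilization period before moving on to the next.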

@rgaudin @mgautierfr @kelson42 WDYT?

@benoit74 benoit74 added the question Further information is requested label Nov 24, 2023
@rgaudin
Member

rgaudin commented Nov 24, 2023

What's the RAM impact of each of those cached entries? You suspect 421 is too large, but how much data is cached for each? Is it a static figure? Is it dynamic (based on usage)?

I know @mgautierfr has already explained this, but I don't think it's documented and it probably should be (in the libkiwix wiki?)

@benoit74
Collaborator Author

I just created a dashboard to observe all system metrics of a given set of pods (selected by a regex their names must match, plus a regex their names must not match):

https://kiwixorg.grafana.net/d/eaa1add43ccec1e85a562078cdf77779/7589a2da-9d76-59e4-9c5a-58399ebf4adf?orgId=1&refresh=30s

@mgautierfr
Member

> Is it a static figure? Is it dynamic (based on usage)?

It is dynamic.
When a user opens a page in a ZIM file, we cache:

  • ZIM_DIRENTLOOKUPCACHE dirents at zim file opening.
  • Up to ZIM_DIRENTCACHE dirents used to find the requested resources (dirents of the resources + dirents used to do binary search)
  • Up to ZIM_CLUSTERCACHE clusters (clusters of the resources)

So the more pages are read, the more we cache.
All ZIM_*CACHE settings belong to libzim, so they apply per opened ZIM file.

On top of that, libkiwix itself caches ZIM readers, so we have to multiply all these numbers by the number of cached readers (up to KIWIX_ARCHIVE_CACHE_SIZE).
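
As a rough illustration of that multiplication, the sketch below computes an upper bound on the number of cached items with the default values quoted in this issue. It only counts items: converting to bytes would need per-dirent and per-cluster sizes, which are not given in this thread. It also ignores readers held by the searcher cache. `max_cached_items` is a hypothetical helper name.

```python
def max_cached_items(archive_cache_size, dirent_lookup_cache, dirent_cache, cluster_cache):
    """Upper bound on cached libzim items across all cached readers.

    Each cached reader (one per opened ZIM) can hold up to the per-ZIM limits,
    so every limit is multiplied by the number of cached readers.
    """
    per_zim = {
        "lookup_dirents": dirent_lookup_cache,  # loaded when the ZIM is opened
        "dirents": dirent_cache,                # grows as resources are looked up
        "clusters": cluster_cache,              # grows as resources are read
    }
    return {kind: count * archive_cache_size for kind, count in per_zim.items()}


# Default values from this issue: 421 readers, 1024 / 512 dirents, 16 clusters.
worst_case = max_cached_items(421, 1024, 512, 16)
print(worst_case)
```

With these defaults the bound is 431104 lookup dirents, 215552 dirents and 6736 clusters, reached only if all 421 readers are open and fully warmed up.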
