Past and future ES index sizes #231

Open
philbudne opened this issue Feb 2, 2024 · 0 comments
Labels
documentation Improvements or additions to documentation elasticsearch

For the run of the initial ES indices from 2023-10-18 through 2024-02-02:

(venv) pbudne@ramos:~/arch$ curl -s 'http://localhost:9204/_cat/indices?bytes=b'              
green open mediacloud_search_text_older BBSu3r2EQIGs6AAl49TBOA 30 1   753246 0  11984353239   5982772227
green open mediacloud_search_text_2024  VriWuMRjRO-0jnHeDf6rPQ 30 1 10243668 0 172018298458  85971967853
green open mediacloud_search_text_2023  Lt9PjU_LSYq1lT-I63NWBg 30 1 26236326 0 400916888782 200450787220
green open mediacloud_search_text_other 5jCe2RVAR5qd1OWBzhGTNw 30 1  3921892 0  84955802854  42427619599
green open mediacloud_search_text_2021  eqMu6wbrTIamy1xl2y60yg 30 1   267522 0   3397433000   1696415243
green open mediacloud_search_text_2022  a_6uKHnCSbu86iqZCcw7cA 30 1   325098 0   4793137085   2395040749
(venv) pbudne@ramos:~/arch$ curl -s 'http://localhost:9204/_cat/indices?bytes=b' | awk '{ n+= $7; b += $10} END { print n, b, b/n}'
41747752 338924602891 8118.39

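That is: about 42e6 documents, taking about 339e9 bytes of primary storage (column 10, pri.store.size; column 9, store.size, also counts the one replica and is roughly double), for an average of 8118 bytes/document, in about 107 days.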
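As a cross-check, the same totals can be reproduced offline from the listing above, and turned into a rough daily ingest rate. This is only a sketch: the rows are copied by hand from the `_cat/indices` output, `summarize` is a hypothetical helper name, and dividing by 107 days pools all six indices into one rate, which is an assumption about how the data arrived.

```python
from datetime import date

# Rows copied from the `_cat/indices?bytes=b` listing above:
# (index name, docs.count, pri.store.size in bytes)
INDICES = [
    ("mediacloud_search_text_older", 753246, 5982772227),
    ("mediacloud_search_text_2024", 10243668, 85971967853),
    ("mediacloud_search_text_2023", 26236326, 200450787220),
    ("mediacloud_search_text_other", 3921892, 42427619599),
    ("mediacloud_search_text_2021", 267522, 1696415243),
    ("mediacloud_search_text_2022", 325098, 2395040749),
]

# Length of the run quoted in the issue (2023-10-18 through 2024-02-02).
DAYS = (date(2024, 2, 2) - date(2023, 10, 18)).days


def summarize(indices, days):
    """Total documents and primary bytes, plus per-document and per-day averages."""
    total_docs = sum(docs for _, docs, _ in indices)
    total_bytes = sum(size for _, _, size in indices)
    return {
        "docs": total_docs,
        "primary_bytes": total_bytes,
        "bytes_per_doc": total_bytes / total_docs,
        "docs_per_day": total_docs / days,
        "primary_bytes_per_day": total_bytes / days,
    }


summary = summarize(INDICES, DAYS)
for key, value in summary.items():
    print(f"{key}: {value:,.2f}")
```

On these figures the pooled rate comes out at roughly 390,000 documents and a little under 3.2e9 primary bytes per day.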

I've posted this for posterity, and for discussion of ILM max index size/age for rollover: do we want a target index size, so that we roll over to a new index (and archive the old one) more often than once a year, and reduce the shard count accordingly?

For reference, ES docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html
the summary seems to be: "Aim for shards of up to 200M documents, or with sizes between 10GB and 50GB"
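To make the "reduce the shard count accordingly" question concrete, here is a back-of-the-envelope sketch of how many primary shards an index would need for a given rollover period. It assumes the pooled ingest rate from the 107-day run above holds steady, that documents spread evenly across shards, and that "GB" means Elasticsearch's `gb` unit (2**30 bytes); `primary_shards_needed` is a hypothetical helper, not something from the project.

```python
import math

GIB = 1024 ** 3  # Elasticsearch's "gb" byte-size unit is 2**30 bytes

# Pooled primary-storage growth from the run above: total bytes / days.
PRIMARY_BYTES_PER_DAY = 338924602891 / 107


def primary_shards_needed(rollover_days, target_shard_bytes,
                          bytes_per_day=PRIMARY_BYTES_PER_DAY):
    """Smallest shard count that keeps each primary shard at or under the target."""
    index_bytes = bytes_per_day * rollover_days
    return max(1, math.ceil(index_bytes / target_shard_bytes))


# Shard counts for a 50 GiB ceiling at a few candidate rollover periods.
for days in (30, 90, 365):
    shards = primary_shards_needed(days, 50 * GIB)
    print(f"rollover every {days:3d} days -> {shards} primary shards")
```

These numbers are only as good as the assumption that the observed rate is representative; a backfill-heavy period would overstate steady-state growth.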

Our soon-to-be-current ILM settings are:
30 shards per index
ILM max shard size 50GB
ILM max age 365 days
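For discussion, the settings above can be written as an ILM rollover policy body and compared against the observed growth rate. This is a sketch under assumptions: mapping "max shard size" to the `max_primary_shard_size` rollover condition is my reading of the setting, the rate is the pooled figure from the 107-day run, and a single rolling index with evenly filled shards is assumed.

```python
import json

GIB = 1024 ** 3  # Elasticsearch's "gb" byte-size unit is 2**30 bytes

PRIMARY_SHARDS = 30
MAX_PRIMARY_SHARD_BYTES = 50 * GIB
MAX_AGE_DAYS = 365
PRIMARY_BYTES_PER_DAY = 338924602891 / 107  # pooled rate from the run above

# Body for PUT _ilm/policy/<policy-name>; the policy name is left to the reader.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_primary_shard_size": "50gb",
                        "max_age": f"{MAX_AGE_DAYS}d",
                    }
                }
            }
        }
    }
}

# Days until the size condition would fire if every shard filled evenly.
days_until_size_rollover = (
    PRIMARY_SHARDS * MAX_PRIMARY_SHARD_BYTES / PRIMARY_BYTES_PER_DAY
)

# Expected per-shard size when the age condition fires instead.
shard_gib_at_max_age = (
    PRIMARY_BYTES_PER_DAY * MAX_AGE_DAYS / PRIMARY_SHARDS / GIB
)

print(json.dumps(ilm_policy, indent=2))
print(f"size condition after ~{days_until_size_rollover:.0f} days")
print(f"shard size at {MAX_AGE_DAYS} days: ~{shard_gib_at_max_age:.1f} GiB")
```

Under these assumptions the age condition would fire first, with shards still inside the 10GB to 50GB guidance, so rolling over more often than yearly would mean lowering either the shard count or the size threshold.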

@rahulbot added the documentation and elasticsearch labels on Feb 7, 2024