For the run of the initial ES indices from 2023-10-18 through 2024-02-02:
(venv) pbudne@ramos:~/arch$ curl -s 'http://localhost:9204/_cat/indices?bytes=b'
green open mediacloud_search_text_older BBSu3r2EQIGs6AAl49TBOA 30 1 753246 0 11984353239 5982772227
green open mediacloud_search_text_2024 VriWuMRjRO-0jnHeDf6rPQ 30 1 10243668 0 172018298458 85971967853
green open mediacloud_search_text_2023 Lt9PjU_LSYq1lT-I63NWBg 30 1 26236326 0 400916888782 200450787220
green open mediacloud_search_text_other 5jCe2RVAR5qd1OWBzhGTNw 30 1 3921892 0 84955802854 42427619599
green open mediacloud_search_text_2021 eqMu6wbrTIamy1xl2y60yg 30 1 267522 0 3397433000 1696415243
green open mediacloud_search_text_2022 a_6uKHnCSbu86iqZCcw7cA 30 1 325098 0 4793137085 2395040749
(venv) pbudne@ramos:~/arch$ curl -s 'http://localhost:9204/_cat/indices?bytes=b' | awk '{ n+= $7; b += $10} END { print n, b, b/n}'
41747752 338924602891 8118.39
That is: about 42e6 documents, taking about 339e9 bytes of primary storage (column 10 is pri.store.size, so replicas are not counted), for an average of about 8118 bytes/document, accumulated over about 107 days.
I'm posting this for posterity, and to start a discussion of the ILM max index size/age for rollover: do we want a target index size so that we roll over to a new index (and archive the old one) more often than once a year, and reduce the shard count accordingly?
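To make the rollover discussion concrete, here is a rough sketch that turns the totals above into per-day rates. The totals and dates are copied from this post; the function name `ingest_rates` is just illustrative, and the result is only the average over the initial run, which may not match steady-state ingest.

```python
from datetime import date

# Totals copied from the awk output above (docs.count and pri.store.size sums).
DOCS = 41_747_752
PRIMARY_BYTES = 338_924_602_891

# First and last day of the initial run, as stated in the post.
START = date(2023, 10, 18)
END = date(2024, 2, 2)


def ingest_rates(docs, primary_bytes, start, end):
    """Return average ingest figures over the period [start, end)."""
    days = (end - start).days
    return {
        "days": days,
        "docs_per_day": docs / days,
        "bytes_per_day": primary_bytes / days,
        "bytes_per_doc": primary_bytes / docs,
    }


rates = ingest_rates(DOCS, PRIMARY_BYTES, START, END)
for name, value in rates.items():
    print(f"{name}: {value:,.2f}")
```

By this arithmetic the run averaged roughly 390,000 documents and a bit under 3.2e9 primary bytes per day.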
For reference, ES docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html
The summary seems to be: "Aim for shards of up to 200M documents, or with sizes between 10GB and 50GB"
Our soon-to-be-current ILM settings are:
30 shards per index
ILM max shard size 50GB
ILM max age 365 days
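As a back-of-the-envelope check on those settings, the sketch below estimates which rollover condition would fire first, and how many shards a shorter rollover period would need. It rests on several assumptions that are mine, not from the settings themselves: ingest continues at the average rate of the initial run, data spreads evenly across shards, "max shard size" refers to primary shard size, and 50GB means 50 * 1024^3 bytes (Elasticsearch byte units are powers of 1024). The helper names and the 90-day period are hypothetical.

```python
import math

GIB = 1024 ** 3  # Elasticsearch "gb" is 1024^3 bytes

# Average primary bytes per day over the initial run (totals from this issue).
BYTES_PER_DAY = 338_924_602_891 / 107


def days_until_size_rollover(shards, max_shard_bytes, bytes_per_day):
    """Days until each primary shard reaches max_shard_bytes (even spread assumed)."""
    return shards * max_shard_bytes / bytes_per_day


def shard_bytes_at_age(shards, age_days, bytes_per_day):
    """Expected primary shard size when the index is age_days old."""
    return age_days * bytes_per_day / shards


def shards_for_period(period_days, target_shard_bytes, bytes_per_day):
    """Smallest shard count that keeps shards at or under the target size."""
    return math.ceil(period_days * bytes_per_day / target_shard_bytes)


# Proposed settings: 30 shards, 50GB max shard size, 365 days max age.
size_days = days_until_size_rollover(30, 50 * GIB, BYTES_PER_DAY)
shard_gib = shard_bytes_at_age(30, 365, BYTES_PER_DAY) / GIB
print(f"size condition reached after about {size_days:.0f} days")
print(f"shard size at 365 days: about {shard_gib:.1f} GiB")

# Hypothetical alternative: roll over every 90 days with shards capped at 50GB.
print("shards needed for a 90-day index:", shards_for_period(90, 50 * GIB, BYTES_PER_DAY))
```

Under these assumptions the size condition would take well over a year to trigger, so the 365-day max age would fire first, with shards somewhere in the 10GB to 50GB range the ES docs suggest. A 90-day rollover would need far fewer than 30 shards.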