
Slow writer performance with the current default heap size #118

Closed
cjrh opened this issue Sep 9, 2023 · 3 comments
cjrh commented Sep 9, 2023

@adamreichold Circling back to this discussion.

While upgrading another application to use the current head of tantivy-py, I am finding that the default heap limit of 3000000 seems to cause very frequent commits while adding documents; it just doesn't seem large enough. I can improve performance by increasing the heap size, but I'm thinking the current default is going to cause surprisingly poor performance for a lot of people once they upgrade.

What are your thoughts on this? Is there a more typical "good" value to use as a default? I am not familiar with the tantivy work between 0.19.2 and 0.20.1 that led to this apparent change in behaviour.

cjrh commented Sep 9, 2023

The tantivy docs for the writer settings don't describe the consequences of setting the heap larger or smaller. I'd be happy to make improvements to those docs once I understand those consequences myself ;)

cjrh commented Sep 9, 2023

Based on reading some threads on Discord, is this the same setting as Quickwit's, which currently defaults to 2 GB? https://quickwit.io/docs/configuration/index-config#indexer-memory-usage

@adamreichold
Collaborator

Please have a look at the thread over at quickwit-oss/tantivy#2156 (comment)

The main point is that the memory accounting got more accurate, meaning the indexer used to use more memory than configured via the buffer limit. Now it stays much closer to that limit, but this also means that the same nominal limit implies less buffering and more frequent commits, which is what you are experiencing.
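As a rough back-of-the-envelope illustration (hypothetical numbers, under the simplifying assumption that the writer flushes a segment each time its memory arena fills):

```python
def approx_flushes(total_indexed_bytes: int, heap_bytes: int) -> int:
    """Crude estimate: one segment flush each time the arena fills."""
    return -(-total_indexed_bytes // heap_bytes)  # ceiling division

old_default = 3_000_000    # the tantivy-py default discussed here (3 MB)
larger = 128_000_000       # a larger arena, e.g. 128 MB

workload = 1_000_000_000   # 1 GB of buffered index data (hypothetical)
print(approx_flushes(workload, old_default))  # → 334
print(approx_flushes(workload, larger))       # → 8
```

This ignores per-document overhead and multi-threaded arena splitting, but it shows why a tiny arena that used to be "effectively larger" under the old, looser accounting now produces many more flushes.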

I think the main thing here is that the Rust bindings force one to make a choice via the mandatory memory_arena_num_bytes parameter, whereas the Python bindings supply what is basically a minimum value as the default. So indeed I think it would make sense to increase this significantly to a reasonable default like 128 MB or even 1 GB. In addition, we should document that an actually helpful value needs to be measured, as it depends on the schema and the data.

(Additionally, I think the actual memory consumption has somewhat increased due to the new columnar fast field storage. But whether this really affects a given use case also depends on the schema and data in question.)
