
Prefetch PostingList #133009


Open: wants to merge 14 commits into main


@john-wagster (Contributor) commented Aug 15, 2025

Exploring prefetching the next posting list in low-memory scenarios. In the low-memory runs below, this reduces mean query latency by roughly 1.6x to 2.4x across visit percentages, with essentially unchanged recall.
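The idea can be sketched as a one-ahead pipeline: while the current posting list is being scored, the next one is already being loaded, so I/O overlaps with compute instead of stalling it. This is an illustrative sketch under stated assumptions, not the PR's implementation: the class name, the loadPostingList function, and the doc-id-summing score method are hypothetical stand-ins for reading a posting list from disk and scoring its vectors. It uses a background thread to make the overlap explicit; a real implementation may instead issue an OS-level read-ahead hint.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntFunction;

// Hypothetical sketch: not the Elasticsearch implementation.
public final class PostingListPrefetchSketch {

    /** Scores lists in visit order, loading list i+1 in the background while list i is scored. */
    public static long scoreAll(int numLists, IntFunction<int[]> loadPostingList, ExecutorService io) {
        long total = 0;
        // Start loading the first list before entering the loop.
        CompletableFuture<int[]> next = numLists > 0
            ? CompletableFuture.supplyAsync(() -> loadPostingList.apply(0), io)
            : null;
        for (int i = 0; i < numLists; i++) {
            int[] current = next.join(); // wait for the list that was prefetched
            final int nextIndex = i + 1;
            // Prefetch: kick off the next load before scoring the current list.
            next = nextIndex < numLists
                ? CompletableFuture.supplyAsync(() -> loadPostingList.apply(nextIndex), io)
                : null;
            total += score(current);
        }
        return total;
    }

    /** Hypothetical stand-in for vector scoring: sums the doc ids in the list. */
    public static long score(int[] postings) {
        long sum = 0;
        for (int doc : postings) {
            sum += doc;
        }
        return sum;
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            // Hypothetical loader: list i holds docs i*10 .. i*10+9.
            IntFunction<int[]> loader = i -> {
                int[] docs = new int[10];
                for (int j = 0; j < 10; j++) {
                    docs[j] = i * 10 + j;
                }
                return docs;
            };
            System.out.println(scoreAll(3, loader, io)); // prints 435 (sum of 0..29)
        } finally {
            io.shutdown();
        }
    }
}
```

Prefetching only one list ahead keeps the extra memory in this sketch bounded to a single posting list, which matters when the page cache is as constrained as in these runs.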

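I had to change the testing setup quite a bit. In the end I used something like the command below, which limits the container's RAM and swap (setting --memory-swap equal to --memory leaves the container no swap). 550 MB seemed to be the lowest memory limit at which a Java command would still run, and 450 MB was about the smallest heap that worked with the 1M-document dbpedia dataset: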

docker stub command
docker run -v elasticsearch:/elasticsearch -v data:/data -v .gradle:/root/.gradle --name dev --workdir /elasticsearch --memory="550m" --memory-swap="550m" -it --rm openjdk:24-jdk-slim-bookworm ...
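To confirm inside the container that the JVM actually picked up the intended heap, a small check like the following may help. This is a sketch, not part of the PR: the class name HeapCheck is hypothetical, and it assumes the heap is set with something like -Xmx450m (the exact flags are elided in the command above). Runtime.maxMemory() can report slightly less than -Xmx depending on the garbage collector.

```java
// Hypothetical helper: prints the heap limit and CPU count the JVM sees.
// Run inside the container, e.g.: java -Xmx450m HeapCheck.java
public final class HeapCheck {

    /** Converts bytes to MiB, rounding down. */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound on heap the JVM will try to use (reflects -Xmx or the container-derived default).
        System.out.println("max heap (MiB): " + toMiB(rt.maxMemory()));
        System.out.println("available processors: " + rt.availableProcessors());
    }
}
```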

Here are a couple of runs comparing results with and without prefetching:

Results
# with prefetch: 550 MB container memory, 450 MB heap
index_name                             index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------------  ----------  --------  --------------  --------------------  ------------  
corpus-dbpedia-entity-E5-small-0.fvec         ivf   1000000          160092                364439             1

index_name                             index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall     visited  filter_selectivity
-------------------------------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  ----------  ------------------  
corpus-dbpedia-entity-E5-small-0.fvec         ivf                 1.00         3.75              0.00           0.00  266.84    0.72    20392.18                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf                 5.00        12.61              0.00           0.00   79.29    0.84   100392.32                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf                10.00        19.83              0.00           0.00   50.44    0.88   200424.90                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf                30.00        49.40              0.00           0.00   20.24    0.93   600396.79                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf                50.00        67.49              0.00           0.00   14.82    0.95  1000389.70                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf                70.00        91.00              0.00           0.00   10.99    0.96  1400380.10                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf               100.00       130.78              0.00           0.00    7.65    0.96  1999730.74                1.00

# without prefetch: 550 MB container memory, 450 MB heap
index_name                             index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------------  ----------  --------  --------------  --------------------  ------------  
corpus-dbpedia-entity-E5-small-0.fvec         ivf   1000000          176300                509111             1

index_name                             index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall     visited  filter_selectivity
-------------------------------------  ----------  -------------------  -----------  ----------------  -------------  ------  ------  ----------  ------------------  
corpus-dbpedia-entity-E5-small-0.fvec         ivf                 1.00         9.16              0.00           0.00  109.11    0.71    20389.41                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf                 5.00        20.38              0.00           0.00   49.07    0.83   100409.05                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf                10.00        33.85              0.00           0.00   29.54    0.87   200389.46                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf                30.00        95.05              0.00           0.00   10.52    0.93   600373.26                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf                50.00       142.15              0.00           0.00    7.03    0.95  1000402.42                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf                70.00       198.86              0.00           0.00    5.03    0.96  1400387.26                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf               100.00       289.33              0.00           0.00    3.46    0.96  1999792.38                1.00
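The latency gap between the two tables can be quantified as a per-row speedup (latency without prefetch divided by latency with prefetch). The sketch below only redoes that arithmetic on the mean latencies copied from the tables above; the class and method names are illustrative.

```java
// Illustrative arithmetic on the reported numbers; names are hypothetical.
public final class PrefetchSpeedup {
    // Visit percentages and mean latencies (ms) copied from the two result tables.
    public static final double[] VISIT_PCT = {1, 5, 10, 30, 50, 70, 100};
    public static final double[] LATENCY_WITH = {3.75, 12.61, 19.83, 49.40, 67.49, 91.00, 130.78};
    public static final double[] LATENCY_WITHOUT = {9.16, 20.38, 33.85, 95.05, 142.15, 198.86, 289.33};

    /** How many times faster the prefetch run is for one row. */
    public static double speedup(double withoutMs, double withMs) {
        return withoutMs / withMs;
    }

    public static void main(String[] args) {
        for (int i = 0; i < VISIT_PCT.length; i++) {
            System.out.printf("visit %5.1f%%: %.2fx%n",
                VISIT_PCT[i], speedup(LATENCY_WITHOUT[i], LATENCY_WITH[i]));
        }
    }
}
```

Running it shows the prefetch run is between about 1.62x (5% visited) and 2.44x (1% visited) faster in mean latency.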

@john-wagster john-wagster marked this pull request as ready for review August 18, 2025 05:28
@john-wagster john-wagster removed the WIP label Aug 18, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Aug 18, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@john-wagster john-wagster changed the title Prefetch POC Prefetch PostingList Aug 18, 2025
@john-wagster john-wagster requested a review from iverase August 18, 2025 18:26
Labels: >non-issue, :Search Relevance/Search, Team:Search Relevance, v9.2.0