
Add concurrent segment search follow-up blog and concurrent search nightly benchmark dashboards #3031

Merged: 7 commits merged into opensearch-project:main on Jul 30, 2024

Conversation

jed326
Contributor

@jed326 jed326 commented Jun 28, 2024

Description

Add concurrent segment search follow-up blog and concurrent search nightly benchmark dashboards

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

@jed326
Contributor Author

jed326 commented Jun 28, 2024

@kolchfa-aws could you please help review this? Thanks!

@kolchfa-aws kolchfa-aws self-assigned this Jun 28, 2024
@pajuric pajuric self-assigned this Jul 1, 2024
@pajuric

pajuric commented Jul 1, 2024

@jed326 - Hi Jay, thanks for contributing a blog. As the blog manager, I ask that you not reach out to Fanit directly to review your blog; instead, please follow the process outlined in the Slack message I sent you earlier today.

_posts/2024-06-30-concurrent-search-follow-up.md: 8 review threads resolved (outdated)
jed326 and others added 2 commits July 26, 2024 15:43
Signed-off-by: Jay Deng <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>

In concurrent segment search, each shard-level search request on a node is divided into multiple execution tasks called slices. Slices can be executed concurrently on separate threads in the index_searcher threadpool, separate from the search threadpool. Each slice searches within the segments associated with it. Once all slice executions are complete, the collected results from all slices are combined (reduced) and returned to the coordinator node. The index_searcher threadpool is used to execute the slices of each shard search request and is shared across all shard search requests on a node. By default, the index_searcher threadpool has twice as many threads as the number of available processors.
In concurrent segment search, each shard-level search request on a node is divided into multiple execution tasks called _slices_. Slices can be executed concurrently on separate threads in the index searcher thread pool, separate from the search thread pool. Each slice searches within the segments associated with it. Once all slice executions are complete, the collected results from all slices are combined (reduced) and returned to the coordinator node. The index searcher thread pool is used to execute the slices of each shard search request and is shared across all shard search requests on a node. By default, the index searcher thread pool has twice as many threads as the number of available processors.
Contributor Author

I meant to format it as index_searcher and search thread pools as those are the names of the threadpools.
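The slice execution model described in the quoted paragraph can be sketched in a few lines of Python. This is a toy illustration, not OpenSearch internals: the segment data, round-robin slice assignment, and top-3 reduce are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical segment data: each segment holds a list of matching doc scores.
segments = [[3, 1], [7], [2, 5], [9, 4], [6]]

def make_slices(segments, slice_count):
    """Round-robin segments into slices, mimicking how a shard-level
    search request is divided into independent execution tasks."""
    slices = [[] for _ in range(slice_count)]
    for i, seg in enumerate(segments):
        slices[i % slice_count].append(seg)
    return slices

def search_slice(slice_segments):
    # Each slice searches only its own segments and returns partial top hits.
    hits = [doc for seg in slice_segments for doc in seg]
    return sorted(hits, reverse=True)[:3]

# The thread pool stands in for the index_searcher thread pool, which
# executes slices concurrently and is shared across shard requests on a node.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(search_slice, make_slices(segments, 4)))

# Reduce: combine per-slice partial results sequentially into the final top hits.
top_hits = sorted([h for p in partials for h in p], reverse=True)[:3]
print(top_hits)  # [9, 7, 6]
```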

| Cluster Configuration | % perf improvement going from cs disabled to 2 slices | % additional CPU utilization | % perf improvement going from 2 slices to 4 slices | % additional CPU utilization | % perf improvement going from 4 slices to Lucene default | % additional CPU utilization |
#### Setup comparison for `range-auto-date-histo-with-metrics`

| Clsuter configuration | % Performance improvement from CS disabled to 2 slices | % Additional CPU utilization | % Performance improvement from 2 slices to 4 slices | % Additional CPU utilization | % Performance improvement from 4 slices to Lucene default | % Additional CPU utilization |
Contributor Author

Cluster configuration


Third, the specific implementation of queries can greatly impact the performance when increasing concurrency as some queries may end up performing more duplicate work as the number of slices increases. For example, significant terms aggregations perform count queries on each bucket key to determine the term background frequencies so duplicated bucket keys across segment slices will result in duplicated count queries across slices as well.
Third, the specific query implementation can greatly impact the performance when increasing concurrency because some queries may perform more duplicate work as the number of slices increases. For example, significant terms aggregations count queries for each bucket key to determine the term background frequencies. Thus, duplicated bucket keys across segment slices result in duplicated count queries across slices as well.
Contributor Author

significant terms aggregations perform count queries
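The duplicated work described in that paragraph can be illustrated with a toy sketch. The bucket keys and slice layout below are hypothetical, and real background-frequency lookups happen inside the aggregator rather than as separate client queries; the point is only that a key seen by several slices triggers several lookups.

```python
from collections import Counter

# Hypothetical bucket keys collected by a significant terms aggregation:
# each slice sees only the terms present in its own segments.
slice_keys = [
    {"error", "timeout", "retry"},   # slice 0
    {"error", "retry"},              # slice 1
    {"timeout", "error"},            # slice 2
]

# Each slice issues one background-frequency count query per key it saw,
# so keys duplicated across slices produce duplicated count queries.
queries = Counter(key for keys in slice_keys for key in keys)
total = sum(queries.values())   # count queries actually issued
unique = len(queries)           # count queries a single-slice search would need
print(total, unique)  # 7 3
```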

Signed-off-by: Fanit Kolchina <[email protected]>
Collaborator

@natebower natebower left a comment

@jed326 @kolchfa-aws @pajuric Editorial review complete. Please see my comments and changes and let me know if you have any questions. Thanks!

_community_members/jaydeng.md: review thread resolved (outdated)
_community_members/sohami.md: review thread resolved (outdated)
_posts/2024-06-30-concurrent-search-follow-up.md: 3 review threads resolved (outdated)

Second, whenever the number of active threads is higher than the number of CPU cores, each individual thread may spend more time processing because the CPU cores are multiplexing tasks. By default, the `r5.2xlarge` instance with 8 CPU cores has 16 threads in the `index_searcher` thread pool and 13 threads in the search thread pool. If all 29 threads are concurrently processing search tasks, then each individual thread will encounter a longer processing time because there are only 8 CPU cores to serve these 29 threads.
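The thread arithmetic in that paragraph can be sanity-checked with a short script. The search pool formula `(3 * processors) // 2 + 1` is an assumption here, chosen because it matches the 13-thread figure quoted for 8 cores; the index_searcher pool is simply twice the processor count, as stated earlier.

```python
def default_pool_sizes(processors: int):
    # Assumed default sizing consistent with the figures in the text:
    # index_searcher pool: twice the available processors;
    # search pool: (3 * processors) / 2 + 1, using integer division.
    index_searcher = 2 * processors
    search = (3 * processors) // 2 + 1
    return index_searcher, search

idx, srch = default_pool_sizes(8)  # r5.2xlarge: 8 CPU cores
print(idx, srch, idx + srch)  # 16 13 29
```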

Third, the specific query implementation can greatly impact the performance when increasing concurrency because some queries may perform more duplicate work as the number of slices increases. For example, significant terms aggregations run count queries for each bucket key to determine the term background frequencies. Thus, duplicated bucket keys across segment slices result in duplicated count queries across slices as well.
Collaborator

Suggested change
Third, the specific query implementation can greatly impact the performance when increasing concurrency because some queries may perform more duplicate work as the number of slices increases. For example, significant terms aggregations run count queries for each bucket key to determine the term background frequencies. Thus, duplicated bucket keys across segment slices result in duplicated count queries across slices as well.
- Third, the specific query implementation can greatly impact performance when increasing concurrency because some queries may perform more duplicate work as the number of slices increases. For example, significant terms aggregations run count queries for each bucket key to determine the term background frequencies. Thus, duplicated bucket keys across segment slices result in duplicated count queries across slices as well.



Fourth, the reduce phase is performed sequentially on all segment slices. If the reduce overhead is large, it can offset the gains from searching documents concurrently. For example, for aggregations, a new `Aggregator` instance is created for each segment slice. Each `Aggregator` creates an `InternalAggregation` object, which represents the buckets created during document collection. These `InternalAggregation` object instances are then processed sequentially during the reduce phase. As a result, a simple `term` aggregation can create up to `slice_count * shard_size` buckets per shard, which are then processed sequentially during the reduce phase.
Collaborator

Suggested change
Fourth, the reduce phase is performed sequentially on all segment slices. If the reduce overhead is large, it can offset the gains from searching documents concurrently. For example, for aggregations, a new `Aggregator` instance is created for each segment slice. Each `Aggregator` creates an `InternalAggregation` object, which represents the buckets created during document collection. These `InternalAggregation` object instances are then processed sequentially during the reduce phase. As a result, a simple `term` aggregation can create up to `slice_count * shard_size` buckets per shard, which are then processed sequentially during the reduce phase.
- Fourth, the reduce phase is performed sequentially on all segment slices. If the reduce overhead is large, it can offset the gains realized from searching documents concurrently. For example, for aggregations, a new `Aggregator` instance is created for each segment slice. Each `Aggregator` creates an `InternalAggregation` object, which represents the buckets created during document collection. These `InternalAggregation` object instances are then processed sequentially during the reduce phase. As a result, a simple `term` aggregation can create up to `slice_count * shard_size` buckets per shard, which are then processed sequentially during the reduce phase.
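The `slice_count * shard_size` worst case described in that paragraph is easy to sketch; the example values below are illustrative, not defaults taken from the text.

```python
def max_reduce_buckets(slice_count: int, shard_size: int) -> int:
    # Worst case for a simple terms aggregation: every slice produces a
    # full, disjoint set of shard_size buckets, and all of them must be
    # merged sequentially during the reduce phase.
    return slice_count * shard_size

# Example: 4 slices, each returning up to 25 buckets from its segments
print(max_reduce_buckets(4, 25))  # 100
```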


## Wrapping up

In summary, when choosing a segment slice count to use, it’s important to run your own benchmarking to determine if the additional parallelization from adding more segment slices outweighs the additional processing overhead. While concurrent segment search is ready for use in production environments, you can continue to track its further improvements on this [project board](https://github.com/orgs/opensearch-project/projects/117).
Collaborator

Suggested change
In summary, when choosing a segment slice count to use, its important to run your own benchmarking to determine if the additional parallelization from adding more segment slices outweighs the additional processing overhead. While concurrent segment search is ready for use in production environments, you can continue to track its further improvements on this [project board](https://github.com/orgs/opensearch-project/projects/117).
In summary, when choosing a segment slice count to use, it's important to run your own benchmarking to determine whether the additional parallelization produced by adding more segment slices outweighs the additional processing overhead. Concurrent segment search is ready for use in production environments, and you can continue to track its ongoing improvements on this [project board](https://github.com/orgs/opensearch-project/projects/117).



Additionally, in order to provide performance visibility over time, we will publish nightly performance runs for concurrent segment search in [OpenSearch Performance Benchmarks] (https://opensearch.org/benchmarks), covering all the test workloads mentioned in this post.
Collaborator

Suggested change
Additionally, in order to provide performance visibility over time, we will publish nightly performance runs for concurrent segment search in [OpenSearch Performance Benchmarks] (https://opensearch.org/benchmarks), covering all the test workloads mentioned in this post.
Additionally, to provide visibility into performance over time, we will publish nightly performance runs for concurrent segment search in [OpenSearch Performance Benchmarks](https://opensearch.org/benchmarks), covering all the test workloads mentioned in this post.



For guidelines when getting started with concurrent segment search, see [General guidelines](https://opensearch.org/docs/latest/search-plugins/concurrent-segment-search/#general-guidelines).
Collaborator

Suggested change
For guidelines when getting started with concurrent segment search, see [General guidelines](https://opensearch.org/docs/latest/search-plugins/concurrent-segment-search/#general-guidelines).
For guidelines on getting started with concurrent segment search, see [General guidelines](https://opensearch.org/docs/latest/search-plugins/concurrent-segment-search/#general-guidelines).

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Collaborator

@natebower natebower left a comment

LGTM

- search
- technical-post
meta_keywords:
meta_description:

Please use this updated meta:

meta_keywords: concurrent segment search, search concurrency, control search concurrency, searching segments concurrently in OpenSearch

meta_description: Learn how to benchmark, track performance, and improve latency across your large workloads by searching segments concurrently in OpenSearch.

authors:
- jaydeng
- sohami
date: 2024-06-30

If you are able to get all the updates done today, we can push it live this afternoon. Please update the date to today if you feel this is possible.

Signed-off-by: Jay Deng <[email protected]>
@pajuric

pajuric commented Jul 30, 2024

This blog is ready to push live @nateynateynate @krisfreedain.

Member

@nateynateynate nateynateynate left a comment

Let's go!

@nateynateynate nateynateynate merged commit 55795c5 into opensearch-project:main Jul 30, 2024
5 checks passed
@jed326
Contributor Author

jed326 commented Jul 30, 2024

Thanks @nateynateynate! Out of curiosity, how long does it take for changes to go live on the website after a merge?
