Filter shards for sliced search at coordinator #16771

Open

msfroh wants to merge 4 commits into main from filter_shards_by_slice

Conversation

@msfroh (Collaborator) commented Dec 3, 2024

Description

Prior to this commit, a sliced search would fan out to every shard, then apply a MatchNoDocsQuery filter on shards that don't correspond to the current slice. That approach still creates a (useless) search context on each shard for every slice, so a long-running sliced scroll can quickly exhaust the pool of available scroll contexts.

This change avoids fanning out to all shards by checking at the coordinator whether a shard is matched by the current slice. This should reduce the number of open scroll contexts to max(numShards, numSlices) instead of numShards * numSlices.
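For illustration only, here is a minimal sketch (not the code in this PR; the class and helper names are invented) of the kind of modulo-based, per-shard check the coordinator can apply so that only matching shards receive the request:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch of a coordinator-side slice filter; the names are made up
    // and only mirror the behavior described above.
    final class SliceShardFilterSketch {

        // Decide whether the shard at the given ordinal position (its index in the
        // sorted list of targeted shards) belongs to the slice {sliceId, maxSlices}.
        static boolean shardMatchesSlice(int shardIndex, int numShards, int sliceId, int maxSlices) {
            if (maxSlices >= numShards) {
                // More slices than shards: each slice maps onto exactly one shard.
                return sliceId % numShards == shardIndex;
            }
            // More shards than slices: each slice owns every maxSlices-th shard.
            return shardIndex % maxSlices == sliceId;
        }

        // Collect the shard ordinals the current slice should fan out to.
        static List<Integer> matchingShardOrdinals(int numShards, int sliceId, int maxSlices) {
            List<Integer> matching = new ArrayList<>();
            for (int i = 0; i < numShards; i++) {
                if (shardMatchesSlice(i, numShards, sliceId, maxSlices)) {
                    matching.add(i);
                }
            }
            return matching;
        }
    }

With 5 shards and 2 slices, for example, slice 0 would fan out only to ordinals 0, 2, and 4 and slice 1 to ordinals 1 and 3, so no shard ever opens a context for a slice it cannot serve.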

Related Issues

Related to #16289

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot (Contributor) commented Dec 3, 2024

❌ Gradle check result for 2d2fd05: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for 541979e: FAILURE

❌ Gradle check result for b4aaa2f: FAILURE

❌ Gradle check result for 1680b9b: FAILURE

@msfroh force-pushed the filter_shards_by_slice branch from 1680b9b to 8842514 on December 11, 2024 at 00:36
@msfroh added the backport 2.x, enhancement, and Search:Resiliency labels on Dec 11, 2024
❌ Gradle check result for 8842514: FAILURE

@msfroh force-pushed the filter_shards_by_slice branch from 8842514 to eadaabd on December 11, 2024 at 05:07
❌ Gradle check result for eadaabd: FAILURE

✅ Gradle check result for eadaabd: SUCCESS

codecov bot commented Dec 12, 2024

Codecov Report

Attention: Patch coverage is 68.88889% with 14 lines in your changes missing coverage. Please review.

Project coverage is 72.22%. Comparing base (7050ecf) to head (6a2de32).

Files with missing lines                                   Patch %   Lines
...min/cluster/shards/ClusterSearchShardsRequest.java      41.66%    3 Missing and 4 partials ⚠️
...n/admin/cluster/RestClusterSearchShardsAction.java       0.00%    5 Missing ⚠️
...pensearch/action/search/TransportSearchAction.java      75.00%    1 Missing ⚠️
...java/org/opensearch/search/slice/SliceBuilder.java      88.88%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16771      +/-   ##
============================================
+ Coverage     72.19%   72.22%   +0.02%     
- Complexity    65208    65217       +9     
============================================
  Files          5297     5297              
  Lines        303324   303351      +27     
  Branches      43913    43926      +13     
============================================
+ Hits         218999   219098      +99     
+ Misses        66367    66261     -106     
- Partials      17958    17992      +34     

☔ View full report in Codecov by Sentry.

@msfroh marked this pull request as ready for review on December 12, 2024 at 19:47
// Filter the returned shards for the given slice
CollectionUtil.timSort(indexIterators);
for (int i = 0; i < indexIterators.size(); i++) {
    if (slice.shardMatches(i, indexIterators.size())) {
@reta (Collaborator) commented Dec 12, 2024

I am wondering if we should be using the shard id instead of i? Do all shards always participate in the search, or could some of them be filtered out earlier?

if (slice.shardMatches(indexIterators.get(i).shardId().id(), indexIterators.size())) {

@msfroh (Collaborator, Author) commented Dec 13, 2024

So, the challenge here is that it's not necessarily the shard ID.

If you've specified a routing preference or something like _shards:2,4,8, then the shard iterators returned from computeTargetedShards will only contain the shards corresponding to the routing preference or the shard filter. The modulo logic for slice matching pretends those are the "full universe" of shards for slicing purposes.

Essentially, I'm mimicking the shard-level behavior that happens here:

// remap the original shard id with its index (position) in the sorted shard iterator.
for (ShardIterator it : group) {
    assert it.shardId().getIndex().equals(request.shardId().getIndex());
    if (request.shardId().equals(it.shardId())) {
        shardId = ord;
        break;
    }
    ++ord;
}
(though that code does the terrible thing of using the ordinal positions but pretending that they're shardIds).

IMO, the approach I used (where it's always ordinals, which may happen to be the same as shard IDs) is slightly less hacky.

I should add a comment to explain why it needs to be ordinals and not shard IDs, though.
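To make the ordinal-versus-shard-ID distinction concrete, here is a made-up example: with a _shards:2,4,8 preference and max = 2 slices, the targeted shards 2, 4, and 8 receive ordinals 0, 1, and 2. Matching on ordinals splits them across both slices, while matching on the raw (all-even) shard IDs would send every shard to slice 0 and leave slice 1 empty. The sketch below assumes a matches helper with the same modulo shape as discussed above; it is not the actual OpenSearch code:

    // Illustrative only: the shard IDs, slice count, and matches helper are made up.
    public class OrdinalVsShardIdExample {

        // Same modulo shape as the slice matching discussed above (assumption).
        static boolean matches(int index, int numShards, int sliceId, int maxSlices) {
            return maxSlices >= numShards ? sliceId % numShards == index : index % maxSlices == sliceId;
        }

        public static void main(String[] args) {
            int[] targetedShardIds = { 2, 4, 8 }; // hypothetical _shards:2,4,8 preference
            int maxSlices = 2;
            for (int sliceId = 0; sliceId < maxSlices; sliceId++) {
                for (int ordinal = 0; ordinal < targetedShardIds.length; ordinal++) {
                    boolean byOrdinal = matches(ordinal, targetedShardIds.length, sliceId, maxSlices);
                    boolean byShardId = matches(targetedShardIds[ordinal], targetedShardIds.length, sliceId, maxSlices);
                    System.out.printf("slice %d, shard %d -> byOrdinal=%b, byShardId=%b%n",
                        sliceId, targetedShardIds[ordinal], byOrdinal, byShardId);
                }
            }
            // byOrdinal assigns shards 2 and 8 to slice 0 and shard 4 to slice 1;
            // byShardId would assign all three (even-numbered) shards to slice 0.
        }
    }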

@reta (Collaborator) commented Dec 17, 2024

Oh I see, that makes sense, but shouldn't SliceBuilder be using something like shardIndex then? It currently uses shardId explicitly:

public boolean shardMatches(int shardId, int numShards) {

@msfroh (Collaborator, Author) replied

Yeah, that's fair. I made it shardId before I properly understood the existing SliceBuilder logic (which overwrites the real shardId with ord).

While I'm there, I'm going to rename the parameter in the method quoted above -- using a variable called shardId when you really mean shardIndex is ugly.
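A rough sketch of what the renamed signature could look like (an assumption about the shape of the change, not the actual diff; the standalone class below is only for illustration, while the real method lives in org.opensearch.search.slice.SliceBuilder):

    final class SliceBuilderRenameSketch {
        private final int id;   // this slice's id
        private final int max;  // total number of slices

        SliceBuilderRenameSketch(int id, int max) {
            this.id = id;
            this.max = max;
        }

        // shardIndex is the shard's ordinal among the targeted shards, not its shard ID.
        public boolean shardMatches(int shardIndex, int numShards) {
            if (max >= numShards) {
                return id % numShards == shardIndex;
            }
            return shardIndex % max == id;
        }
    }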

❌ Gradle check result for c7abf62: FAILURE

❌ Gradle check result for f29e111: FAILURE

Prior to this commit, a sliced search would fan out to every shard,
then apply a MatchNoDocsQuery filter on shards that don't correspond
to the current slice. This still creates a (useless) search context
on each shard for every slice, though. For a long-running sliced
scroll, this can quickly exhaust the number of available scroll
contexts.

This change avoids fanning out to all the shards by checking at the
coordinator if a shard is matched by the current slice. This should
reduce the number of open scroll contexts to max(numShards, numSlices)
instead of numShards * numSlices.

Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
@msfroh force-pushed the filter_shards_by_slice branch from f29e111 to 6a2de32 on December 19, 2024 at 22:50
❌ Gradle check result for 6a2de32: FAILURE

❌ Gradle check result for f29e111: FAILURE

✅ Gradle check result for 6a2de32: SUCCESS

Labels
backport 2.x (Backport to 2.x branch), enhancement (Enhancement or improvement to existing feature or request), Search:Resiliency
Projects
Status: In Progress

2 participants