Skip to content

datafeed: check remote_cluster_client before cluster aliases in start #129601

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

schase-es
Copy link
Contributor

TransportStartDatafeedAction previously tried to validate remote index cluster names in datafeed jobs, before checking if the local cluster had remote_cluster_client role. Because this role enables retrieval of the remote cluster names, the validation step would always fail with a no-such-cluster exception. This was confusing. This change moves the remote_cluster_client check ahead of cluster name validation, and adds a test.

Closes ES-11841

TransportStartDatafeedAction previously tried to validate remote index cluster
names in datafeed jobs, before checking if the local cluster had
remote_cluster_client role. Because this role enables retrieval of the remote
cluster names, the validation step would always fail with a no-such-cluster
exception. This was confusing. This change moves the remote_cluster_client check
ahead of cluster name validation, and adds a test.

Closes ES-11841
@schase-es schase-es added >non-issue :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. backport auto-backport Automatically create backport pull requests when merged v8.19.0 v9.1.0 labels Jun 18, 2025
@schase-es
Copy link
Contributor Author

Including David Kyle as the last person who edited these, as an FYI... this area seems to have limited distributed coordination involvement in the past.

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nits, but this LGTM. Please wait until we've got approval from someone in ML before merging though, because I'm not 100% familiar with this code.

(specific question to someone more familiar with the code would be - is this the best place to do the check? Node roles are immutable, so we could do the check when the job is created, but the master could change to one that does have the role between creation and starting, and then everything would be fine? so perhaps start is the right place?)

* remote_cluster_client role is missing in the local cluster. This prevents remote indices from being
* resolved to their cluster names.
*
* Written to address github issue 121149.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Perhaps
@see <a href="https://github.com/elastic/elasticsearch/issues/121149">GitHub issue 121149</a>
or similar? I think it'll look nicer in javadoc/mouse-overs

""");
try {
localClient.performRequest(startRequest);
assertTrue("This request should fail", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More succinct to use

ResponseException e = assertThrows(ResponseException.class, () -> localClient.performRequest(startRequest);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I was looking around for this, but couldn't find it -- I see it's a junit method now

.*Datafeed \\[test_datafeed\\] is configured with a remote index pattern\\(s\\) \
\\[remote_cluster:remote_index\\] but the current node \\[local_cluster-0\\] \
is not allowed to connect to remote clusters. Please enable node.remote_cluster_client \
for all machine learning nodes and master-eligible nodes.*""", messages[1]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if org.elasticsearch.common.regex.Regex#simpleMatch(java.lang.String, java.lang.String, boolean) might be more forgiving/readable? it just uses wildcard characters rather than regex syntax.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually assertThat(X, containsString(Y)) is good enough

assertThat(e.getMessage(), containsString("Datafeed [test_datafeed] is configured with a remote index pattern [remote_cluster:remote_index] but the current node [local_cluster-0] is not allowed to connect to remote clusters. Please enable node.remote_cluster_client for all machine learning nodes and master-eligible nodes."));

clusterService.getNodeName()
)
)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
);
);
return;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!!

.*Datafeed \\[test_datafeed\\] is configured with a remote index pattern\\(s\\) \
\\[remote_cluster:remote_index\\] but the current node \\[local_cluster-0\\] \
is not allowed to connect to remote clusters. Please enable node.remote_cluster_client \
for all machine learning nodes and master-eligible nodes.*""", messages[1]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually assertThat(X, containsString(Y)) is good enough

assertThat(e.getMessage(), containsString("Datafeed [test_datafeed] is configured with a remote index pattern [remote_cluster:remote_index] but the current node [local_cluster-0] is not allowed to connect to remote clusters. Please enable node.remote_cluster_client for all machine learning nodes and master-eligible nodes."));

Comment on lines 178 to 194
protected String getTestRestCluster() {
return localCluster.getHttpAddresses();
}

@Override
protected boolean preserveIndicesUponCompletion() {
return true;
}

@Override
protected boolean preserveClusterUponCompletion() {
return true;
}

@Override
protected boolean preserveDataStreamsUponCompletion() {
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want these settings? If another test is added to this class these settings may interfere with the test results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure actually -- the test errorred-out without them. This was adapted from the MultiClustersIT test... I'll see if I can learn more about them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see -- that missing return you flagged in TransportStartDatafeedAction was the culprit -- whatever state that put the thread pools caused an error in clean-up.

* License v3.0 only", or the "Server Side Public License, v 1".
*/

package org.elasticsearch.datastreams;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should really be in the ml qa suite https://github.com/elastic/elasticsearch/tree/main/x-pack/plugin/ml/qa

The existing multi cluster tests are yaml tests and difficult to adapt. Please create a new folder datafeed-multicluster-tests in ml/qa and move this test to that folder. You will need a new build.gradle in the same folder to run the test, I'm not certain what should be in the gradle file but I'll help you figure it out. As a starting point try this

apply plugin: 'elasticsearch.internal-java-rest-test'

dependencies {
  testImplementation project(path: ':test:test-clusters')
  javaRestTestImplementation project(path: xpackModule('core'))  
}

Assuming you use the suggested folder names run the test with this command:

./gradlew :x-pack:plugin:ml:qa:datafeed-multicluster-tests:javaRestTest 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test BTW, thanks for adding it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these notes David -- this is really helpful. I'm very new here and don't know lots about the testing environment.

schase-es and others added 4 commits June 18, 2025 17:34
- changed doc url to link github issue
- added return after continuation send -- oops!
- use junit methods for testing message/exception matching
- moved the test to the xpack ml/qa folder
- added a gradle build file that appears to work
- removed excess override settings from RestTest
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged backport :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >non-issue v8.19.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants