No Management Proxy Node: Coordinator randomly goes down #606

Open
DiscordJim opened this issue Jul 30, 2024 · 2 comments

@DiscordJim

Affected Stackable version

24.3

Affected Apache Druid version

28.0.1

Current and expected behavior

After roughly 3 to 4 days, the router displays "No Management Proxy Node." From our testing, the cause appears to be that the router cannot connect to the coordinator. However, all services show healthy logs, and the panel reports no clear errors or error codes.

This is difficult to debug because no errors are reported anywhere.

Possible solution

The only way we have found to recover from this state is to restart all services.

Additional context

  • Extensions: '["druid-kafka-indexing-service", "druid-datasketches", "prometheus-emitter", "druid-basic-security", "druid-opa-authorizer", "postgresql-metadata-storage", "druid-hdfs-storage", "druid-stats"]'
  • Deep Storage: HDFS
  • Metadata Store: Postgres

Environment

AKS

Would you like to work on fixing this bug?

None

@DiscordJim
Author

The fix is to run multiple replicas of the coordinator or, if you are running a separate overlord, multiple replicas of the overlord instead.

@lfrancke
Member

I'd like to reopen this issue, if that's okay with you, since we should either document this or have the operator validate and warn about this scenario.
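
A validation of this kind could be as simple as summing the coordinator replicas across all role groups and warning when the total is below 2. The following is a minimal Rust sketch of that idea, not the operator's actual code: `RoleGroup` is a hypothetical, simplified stand-in for the real DruidCluster CRD types, `check_min_replicas` is an illustrative name, treating an unset replica count as 1 is an assumption, and the threshold of 2 simply follows the workaround described above.

```rust
/// Hypothetical, simplified view of a role group. The real Stackable
/// DruidCluster CRD types are richer; these fields are assumptions.
#[derive(Debug, Clone)]
struct RoleGroup {
    name: String,
    /// `None` models an unset replica count; treated as 1 below (assumption).
    replicas: Option<u16>,
}

/// Sums the replicas across all role groups of one role and returns a
/// warning if the total is below `min_total` (hypothetical helper).
fn check_min_replicas(role: &str, groups: &[RoleGroup], min_total: u32) -> Option<String> {
    let total: u32 = groups
        .iter()
        .map(|g| u32::from(g.replicas.unwrap_or(1)))
        .sum();
    if total >= min_total {
        return None;
    }
    // Include the role group names so the warning points at what to change.
    let names: Vec<&str> = groups.iter().map(|g| g.name.as_str()).collect();
    Some(format!(
        "role '{role}' (role groups: [{}]) has {total} replica(s) in total; \
         at least {min_total} are recommended so the router keeps a reachable \
         management proxy target",
        names.join(", ")
    ))
}

fn main() {
    // A single-replica coordinator, as in the setup described in this issue.
    let coordinators = vec![RoleGroup {
        name: "default".to_string(),
        replicas: Some(1),
    }];
    match check_min_replicas("coordinators", &coordinators, 2) {
        Some(warning) => println!("WARN: {warning}"),
        None => println!("coordinators: replica count ok"),
    }
}
```

Summing across role groups, rather than checking each group individually, means a cluster with two single-replica coordinator role groups would not trigger the warning.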

lfrancke reopened this on Sep 17, 2024