Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exit Teku if any service fails to start #8980

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zilm13
Copy link
Contributor

@zilm13 zilm13 commented Jan 9, 2025

PR Description

Current behavior: Exit if any of these services cannot bind port
New behavior: Exit if any of this services fails to start:

  • Attestation Manager
  • P2P Manager (gossip)
  • Block Manager
  • Sync Service
  • Merge Block Monitor

So my arguments are: Teku is not really alive if any of these services fails to start. If we keep it running, we keep zombie, which confuses users and complicates automation. We'd better quit in this case

PS: it will require some test changes, will update after discussion consensus

Fixed Issue(s)

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

@lucassaldanha
Copy link
Member

startServices() is called as part of the chain:

SafeFuture<?> doStart() -> SafeFuture<?> initialize() -> void startServices()

I am just wondering if a hard exit at that point could prevent other services etc of a clean shutdown. Not sure though, need to dig a bit.

@Nashatyrev
Copy link
Contributor

We started this discussion with @zilm13 at the point where the tech.pegasys.teku.config.TekuConfigurationTest failed after this change.
It failed because services were still starting while shutdown was initiated.
The shutdown was initiated that early because doStart() future has been completed but startServices() was not yet executed/completed.
This led me to the idea that doStart() future should depend on startServices() async task. Which from my perspective makes sense cause Service.start() future completion implies that all necessary 'sub-services' are up and running at this point.

As for calling System.exit() inside a service startup procedure it rather looks like a workaround which is a candidate for addressing now.

I would suggest for startServices() to return a future, bind it to doStart() future and invoke System.exit() on some very upper level (e.g. in Teku or BeaconNode) when the Service.start() future completes with some specific (or not) exception.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants