Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for multiple HealthChecks that resolve to a single health check status #117

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

elandau
Copy link
Contributor

@elandau elandau commented Oct 10, 2014

The current Karyon and Eureka health check implementation attempts to consolidate multiple health states into a single status in a way that uses unwieldy dependencies between various conditions for health check. This PR provides a generic solution through which multiple HealthCheck components may be registered and resolved into a single strategy using a plugable strategy.

Several new concepts are introduced here,

  1. HealthCheck that when invoked returns a Status object
  2. The Status object tracks a single isReady boolean as well as an error. The combination of these two variables yields 3 states.
    (isReady=false, error=null) Component is initializing
    (isReady=false, error=nonnull) Component failed to initialize
    (isReady=true, error=null) Component has been initialized and is working properly
  3. HealthCheckRegistry through which multiple HealthCheck conditions may be registered. The default implementation takes multibound Set as well as the existing karyon HealthCheckHandler as well as a HealthCheck based on the manual status tracked by ApplicationInfoManager.
  4. HealthCheckInvoker strategy for invoking the actual health checks. A basic inline implementation is provided. A more robust implementation will include parallelizing the health checks and adding timeouts.
  5. EureakHealthCheckResolver strategy for resolving the health checks into a single Eureka status.

Note that this implementation depends on a (yet to be issued) update to eureka where ApplicationInfoManager tracks an InstanceStatus separately instead of setting that status directly on the singleton InstanceInfo. This solves various race conditions whereby the InstanceInfo status can be updated independently outside of the context of healthcheck.

An example use case for this mechanism is to track an application's healthcheck using 3 conditions.

  1. Karyon server initialization which is set after Guice returns
  2. HealthCheckHandler
  3. ApplicationInfoManager.setInstanceStatus being manually set.

All 3 conditions must be set to a healthy status for an InstanceInfo.UP status to be set on the eureka client

@cloudbees-pull-request-builder

karyon-pull-requests #101 FAILURE
Looks like there's a problem with this pull request

* 1. Not ready status = false, error = null
* 2. Ready status = true, error = null
* 3. Error state status = false, error ! =null
*
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are moving away from single boolean, why not introduce a more complete service status model with { Starting, PartiallyReady, Ready, Down, Error } states?

There is different semantic associated with each:

  • Starting - some time in the future I will be ready
  • PartiallyReady - I am providing limited service, possibly due to problem with communicating with backend services
  • Read - complete set of services provided
  • Down - state after service/library shutdown call was made
  • Error - any other error condition (with associated cause)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this solution I'm trying to minimize the different states being tracked and instead opt to have multiple conditions. The above states can be modeled using the following conditions

  1. Starting
  2. Healthcheck
  3. Terminating

Each of these conditions would have a boolean ok/bad as well as an optional error. So a failure to start would have 'starting' set to false with error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants