
Allow to start a job after n other jobs have started #435

Open · mterron opened this issue Jul 10, 2017 · 8 comments

mterron (Contributor) commented Jul 10, 2017

Sometimes we need to define job dependencies that are non-linear. Given jobs A, B & C, job C might depend on both A & B being healthy, while A doesn't depend on B, nor B on A.

At the moment, the only way I could find to express this dependency graph was to create an artificial dependency between A & B and then make C depend on B. This slows down startup.
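
For reference, the workaround looks roughly like this sketch, where the when clause on B is the artificial dependency, present purely to force ordering:

jobs: [
  {
    name: "A",
    exec: "A.sh",
  },
  {
    // artificial: B doesn't actually need A
    name: "B",
    exec: "B.sh",
    when: {
      source: "A",
      once: "healthy"
    }
  },
  {
    name: "C",
    exec: "C.sh",
    when: {
      source: "B",
      once: "healthy"
    }
  }
]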

I suggest that something like this could be implemented:

jobs: [
  {
    name: "A",
    exec: "A.sh",
  },
  {
    name: "B",
    exec: "B.sh",
  },
  {
    name: "C",
    exec: "C.sh",
    when: {
      source: ["A","B"],
      once: "healthy"
    }
  }
]
tgross (Contributor) commented Jul 11, 2017

The big-picture need for this seems sound. The details look complicated. I think we need to explore the edge cases, particularly around each vs. once and some of the non-health-related events. I also want to make sure that adding the flexibility doesn't make it much more difficult for an end user to understand what's going on. Here are three general cases that I have concerns about, but I'd love it if we can explore any further cases:


Case 1: multiple sources, once healthy

when: {
  source: ["A", "B"],
  once: "healthy"
}

This was your original example. Note that there's an implicit AND here: we're saying execute one time, after both A and B are healthy. One corner case: what might we expect to happen if A becomes healthy, then A becomes unhealthy, and then B becomes healthy? We respond to events, not state, so that implies each job will have to track not just its own state but the state of its triggering events as well.

It looks like the exitSuccess, exitFailed, and changed cases all have the same set of state behaviors.
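
To make that concrete, here's a hypothetical trace of the corner case, assuming C latches the last health-related event per source:

A healthy    -> latched {A: healthy}                  don't start C (nothing from B yet)
A unhealthy  -> latched {A: unhealthy}                don't start C
B healthy    -> latched {A: unhealthy, B: healthy}    start C if "was healthy at least once" counts; don't if both must be currently healthy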


Case 2: multiple sources, each healthy

when: {
  source: ["A", "B"],
  each: "healthy"
}

This case takes the previous case and complicates it. The language of "each" kind of implies that we're now OR'ing the health states rather than AND'ing them, but it explicitly means that we run the job on each healthy event.

Like case 1, the exitSuccess, exitFailed, and changed cases all look like they have the same set of state behaviors.
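
Under that reading, a hypothetical trace would run the job repeatedly:

A healthy  -> run C
B healthy  -> run C again
A healthy  -> run C again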


Case 3: multiple sources, once stopping

when: {
  source: ["A", "B"],
  once: "stopping"
}

We have state tracking again, as per case 1. In this case we're responding to an event, but that event signals that we've entered an implicit "stopping" state that exists until we receive the stopped event. So even if we track state as in cases 1 and 2 above, what would be the expected behavior if A fires stopping, A fires stopped, and then B fires stopping?
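
As a hypothetical trace of that question:

A stopping  -> A enters the implicit stopping state; B hasn't, so don't fire
A stopped   -> A leaves the stopping state (does its earlier stopping still count?)
B stopping  -> B is now stopping, but A no longer is: fire or not?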

jwreagor (Contributor) commented

Curious how the state tracking will take place. Isn't the event bus already holding this state, such that this type of job just needs to observe subsequent events in order to fire?

I'd have to dig, but I'm unsure whether the bus was designed that way. My hope would be that you could move the hard dependency tracking out of some sort of global state manager and into already-existing behavior.

tgross (Contributor) commented Jul 19, 2017

The bus is a dumb publisher. Each job tracks its own state (via things like restartsRemain or startEvent), which is why we did things like set the start event to NonEvent in #438.
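
To support multiple sources, each job would have to grow a per-source latch on top of that. Purely as a hypothetical sketch in Go (not the actual internals):

// Hypothetical sketch only; not ContainerPilot's actual internals.
package jobs

// multiSourceTrigger latches which sources have fired the awaited
// event, so that a job can start once all of them have.
type multiSourceTrigger struct {
	awaited string          // e.g. "healthy"
	pending map[string]bool // sources that haven't fired awaited yet
}

func newMultiSourceTrigger(awaited string, sources []string) *multiSourceTrigger {
	t := &multiSourceTrigger{awaited: awaited, pending: map[string]bool{}}
	for _, s := range sources {
		t.pending[s] = true
	}
	return t
}

// observe consumes one event from the bus and reports whether the job
// should now fire, i.e. whether every source has fired the awaited event.
func (t *multiSourceTrigger) observe(source, event string) bool {
	if event == t.awaited {
		delete(t.pending, source)
	}
	// Open question from case 1: should an opposing event (e.g.
	// "unhealthy" arriving after "healthy") re-add the source here?
	return len(t.pending) == 0
}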

jwreagor (Contributor) commented

Of course, right where it was yesterday. I consistently overthink the utility of that bus.

mterron (Contributor, Author) commented Jul 25, 2017

I see this is more complicated than I thought. Is there any other initiative to add state tracking to CP? I'm happy to keep using my "solution" if that's the way it is. I just thought it was a valid use case.

As an MVP, would it be simpler if there were only support for once: healthy or once: exitSuccess? As in: "after these n things are healthy/started, launch", and then it's up to the app to react to events and other dependencies going down.
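
In config terms, that MVP would be just:

when: {
  source: ["A", "B"],
  once: "healthy"
}

(with each: and the other event types left out of multi-source support for now).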

tgross (Contributor) commented Jul 26, 2017

> Is there any other initiative to add state tracking to CP? I'm happy to keep using my "solution" if that's the way it is. I just thought it was a valid use case.

It does seem valid, for sure. But yeah, it's just complicated. We don't have any other initiative doing state tracking beyond the state of the job itself.

> As an MVP, would it be simpler if there were only support for once: healthy or once: exitSuccess? As in: "after these n things are healthy/started, launch", and then it's up to the app to react to events and other dependencies going down.

That might be plausible. I do worry that restricting multiple each or multiple stopping event handlers might seem arbitrary to users, but we have other places where we've had to say "we just don't support that, because supporting it would be even more confusing".

tgross (Contributor) commented Aug 3, 2017

Noting for myself that there's a lot of under-the-hood implementation overlap between the issues in #435, #416, and #396.

gbmeuk commented Jan 15, 2018

Hi,

We have a case that is related to this issue and also to #416 and #518, where we hit a race condition between an on-change job and a pre-start job. Given the following ContainerPilot jobs:

    {
      name: 'pre-start',
      exec: '/usr/local/bin/app-manage preStart',
      when: {
        source: 'watch.squid-gcp-proxy',
        once: 'healthy'
      }
    },
    {
      name: 'on-change-squid-gcp-proxy',
      exec: '/usr/local/bin/app-manage reload',
      when: {
        source: 'watch.squid-gcp-proxy',
        each: 'changed'
      }
    },
    {
      name: 'apache-fwdproxy',
      exec: '/usr/local/apache/bin/apachectl -Xf /etc/apache-fwdproxy/httpd.conf -k start -D APACHE-FWDPROXY',
      restarts: 3,
      port: '33000',
      health: {
        exec: '/usr/local/bin/app-manage health',
        interval: 10,
        ttl: 30,
        timeout: 3,
      },
      tags: [
        'apache',
        'googleproxy'
      ],
      consul: {
        enableTagOverride: true,
        deregisterCriticalServiceAfter: '10m'
      },
      when: {
        source: 'pre-start',
        once: 'exitSuccess'
      }
    }

...and the script's functions are as follows:

preStart() {
    _log "Configuring application"
    touch /usr/local/apache/htdocs/health
    configureApp
}


health() {
    msg=$(curl --fail -sS http://localhost:33000/health)
    status=$?
    if [ ${status} -ne 0 ]; then
        echo "${msg}"
        exit ${status}
    else
        return ${status}
    fi
}

reload() {
    _log "Configuring application"
    configureApp
    _log "reloading application"
    /usr/local/apache/bin/apachectl \
          -f /etc/apache-fwdproxy/httpd.conf \
          -k graceful \
          -D APACHE-FWDPROXY
}

Sometimes apache is started with graceful instead of start, and it then fails to run or reconfigure in a consistent and reliable fashion.
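
As far as we can tell, the sequence is:

watch.squid-gcp-proxy emits changed and healthy together
-> pre-start fires on once: healthy
-> on-change-squid-gcp-proxy fires on each: changed, racing pre-start
-> reload runs "apachectl -k graceful" before apache-fwdproxy has been started with "-k start"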

This issue was resolved by changing reload() to:

reload() {
    health
    if [ $? -eq 0 ]; then
        _log "Configuring application"
        configureApp
        _log "reloading application"
        /usr/local/apache/bin/apachectl \
            -f /etc/apache-fwdproxy/httpd.conf \
            -k graceful \
            -D APACHE-FWDPROXY
    else
        _log "WARNING: application not running. Can't reload"
    fi
}

I totally understand the design decision to emit both a changed and a healthy event, so it would be really nice to be able to handle this via better functionality in when, or at least via clearer documentation of the flow of event messages; in particular, how changed and healthy are both emitted together.
