Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Noble Numbat stemcell no longer uses monit #320

Open
Tracked by #892
cunnie opened this issue Feb 20, 2024 · 6 comments
Open
Tracked by #892

Noble Numbat stemcell no longer uses monit #320

cunnie opened this issue Feb 20, 2024 · 6 comments
Labels

Comments

@cunnie
Copy link
Member

cunnie commented Feb 20, 2024

  • Windows stemcell has precedence for not using monit
  • There's a systemd shim that we might be able to use for releases that use bpm
  • early reports are that systemd is noticeably faster than monit
  • It'll be a breaking change
  • We may create a "convenience" bash script for operators so that "monit summary" and "monit restart ..." work (TBD)
  • We may force release authors who don't use bpm to cut new releases that use it
  • bpm already accommodates releases that aren't meant to be run in a container (still runs in a namespace, but places almost no restrictions on that namespace), e.g. kubelet (kubo-release).
  • release authors need a way to support jammy and noble at the same time.
  • monit is super old, difficult to compile, and there may be CVEs we're not aware of because it's not installed via package.
  • we can get rid of the cgroup / iptables stuff we created to harden monit
  • See here for an in-depth analysis of why we should replace monit and how to best accomplish that.
@ameowlia
Copy link
Member

ameowlia commented Mar 12, 2024

👍 Another vote here to get rid of monit... hopefully in place of something better. Below I want to describe issues the App Runtime Platform WG has had with monit.

the monit version is old

It is so old that it does not support HTTP health checks.

we have written our own healthcheckers

  • We wrote a generic healthchecker release that we vendor to run side by side with some of our components
  • We configure it for gorouter, bosh-dns-adapter, and silk-daemon.
  • We write a restart script for how to handle the output from the healthchecker. For example, for gorouter we will restart the gorouter the first 10 times, but after that we leave it up because it is bad for HA to keep restarting gorouter.

✨ It would be great we didn't have to maintain our own tools to do this healthchecking!

@maxmoehl
Copy link
Member

Was it discussed to move to a different service manager like systemd? Reading the document it sounds like the proposed way forward is with bpm only.

We have use-cases which require processes running directly in the host namespaces. In our setup with haproxy-boshrelease we have side-cars and end up mounting / sharing most of the things which are isolated by bpm to make them accessible again, defeating the purpose of isolation. There are other features which bpm currently prevents us from using, such as CPU pinning (there is only a single cgroup with all CPUs and no option to change that at the moment).

Personally, I'm struggling to see the need for isolating those processes in containers since we spawn dedicated VMs for deployments / jobs (this is not to say that there are no valid use-cases to do so).


In a bit more detail:

release authors need a way to support jammy and noble at the same time.

systemd would be available on both.

We may force release authors who don't use bpm to cut new releases that use it

We have use-cases which require processes to run directly on the host which is not possible with BPM as I've understood?

bpm already accommodates releases that aren't meant to be run in a container (still runs in a namespace, but places almost no restrictions on that namespace), e.g. kubelet (kubo-release).

I've taken a look at the docs and don't think that this covers our use-case (but haven't tested it due to time constraints, sorry).

It would be great we didn't have to maintain our own tools to do this healthchecking!

HTTP health-checks is not something offered by systemd, the "systemd-way" would be sd_notify which we could implement against if we decide to go with systemd.

To be clear: I'm not saying systemd is the only way, but I'm afraid bpm in its current form is not able to address all use-cases adequately and if we are making a breaking change anyway we might want to look for well established tools instead of creating our own.

@jpalermo
Copy link
Member

Yeah, I'm not in favor of creating a process monitor. That's a "solved" problem and considering how painful most of the solutions are, it's not an easy problem to solve.

One thing that BPM does bring is a generic yaml structure for describing how to run a bosh job. I think that would be a good starting point for a monit replacement. Since a lot of bosh jobs already have a bpm.yml file it would be easy for them to migrate.

The agent could parse that file and then create a systemd (or other) config to run the job. There could still be a flag for bpm: true to enable containerization and use bpm to isolate the job.

What we don't want to do is "monit 2.0" where we have release authors create systemd service files that we then take and give to systemd directly. We should have a translation layer so we have flexibility to change the process runner/monitor at a later date.

@bgandon
Copy link
Contributor

bgandon commented May 9, 2024

I've taken a look at the docs and don't think that this covers our use-case (but haven't tested it due to time constraints, sorry).

BPM is evolutive for any use-case that is not covered yet. We've done so in the past, and I've personally participated a couple of those evolutions: one for the JVM native code generator that required execution ability (we've added the allow_executions option to volumes), and a second for the Postgres daemon which requires an INT signal (instead of TERM) for a clean exit (that's when we've added the shutdown_signal option).

So @maxmoehl, I really encourage you to test the unsafe.privileged: true option in your haproxy-boshrelease, because it basically disables any namespacing, so that you benefit from the BPM process description in YAML, without having any insolation in the way.

In the event we would isolate an unsupported use-case, then I'm pretty sure we can get BPM to evolve and support it!

@bgandon
Copy link
Contributor

bgandon commented Oct 8, 2024

BPM accommodates much workloads, but some that use cgroups like ContainerD will need extra shims when run on stemcells that use SystemD.

I'll give a talk tomorrow at CF day EU, on a pattern I've come up with for running ContainerD with BPM.

Not all stemcells run SystemD, as Bosh Lite stemcells are an exception to this. Would we build something where SystemD is a strong requirement, we may break Bosh Lite and that would be sad.

@jpalermo
Copy link
Member

I know @rkoster recently had some success running SystemD as part of a docker-cpi stemcell. That same might be applicable to running on the warden-cpi too.

But if we have an abstract layer between the bosh release and the process manager, we could always have the bosh-agent have multiple process manager implementations and build one that still works in bosh-lite. It might lag behind, but it should be a possibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Waiting for Changes | Open for Contribution
Development

No branches or pull requests

6 participants