-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Ruby app metrics support #172
Conversation
d417ef9
to
72ecafa
Compare
64215f5
to
6abe4b9
Compare
4cd19d4
to
06133d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool to see progress on language metrics, I love that feature! I timeboxed the review to two hours and stopping now, I will do a second pass on this after the discussions I added in this review are resolved.
In general: This doesn't seem to be Ruby specific and could be of use for other languages as well - it might be worth extracting out at a later point in time. However, since this relies on the Ruby helpers in this repo it's already tied to it to some degree and will "calcify" more over time. It might therefore make sense to extract it now (or immediately after this PR is merged) to avoid it being long-term stuck in this repo.
I updated |
Updated to address the rest of the comments. I reworked a lot. Moved the bash script that schedules the daemon into a rust script. Moved to a hardcoded URL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me; though I'd like to see the question about the generic buildpack resolved.
The classic `heroku/metrics` buildpack sets up a statsd daemon that the dyno can use to send Heroku language specific metrics. For Ruby the biggest one is Puma Pool Capacity. This code will download the metrics client and put it on the path, however there's an issue affecting our ability to actually boot a daemon. ## Context Classic buildpacks (Heroku v2) have acces to a `profile.d` directory they can use to write scripts which execute at boot time. This was largely needed to set build time environment variables because the directory structure of build and runtime was different. It was eventually used by the `heroku/metrics` buildpack to download an run a metrics daemon in the background. Cloud Native Buildpacks (CNB) removed support for bash in an effort to simplify the spec and allow buildpacks to target extremely lightweight distributions that do not include `bash` (for example if you are running a pure JVM application, then there's no runtime need to have bash on the system, removing it reduces the image size and can be viewed as a reduction in security surface area). As part of this work to remove bash from the spec the `profile.d`-like interface was removed from the CNB spec. ## Problem Without a spec supported way to execute a script at startup there is no way for a buildpack to provide a background daemon. ## Possible solutions - TBD
Errors in the middle of an `Installing ... ` section looked weird because it would start rendering on the same line: ``` # Heroku Statsd Metrics Agent - Metrics agent - Installing .! ERROR: Could not install Statsd agent ! ! An error occured while downloading and installing the metrics agent ! the buildpack cannot continue ! Debug information: Could not untar: failed to iterate over archive ERROR: failed to build: exit status 1 ERROR: failed to build: executing lifecycle: failed with status code: 51 ``` I changed it so it's now like this: ``` # Heroku Statsd Metrics Agent - Metrics agent - Downloading .. ! ERROR: Could not install Statsd agent ! ! An error occured while downloading and installing the metrics agent ! the buildpack cannot continue ! ! Debug information: Could not untar: failed to iterate over archive ERROR: failed to build: exit status 1 ERROR: failed to build: executing lifecycle: failed with status code: 51 ``` I think it would look better to indent the "debug information" at least. Maybe add an extra newline.
Since exec.d will run any program given to it I want to intentionally leak a bash process that will continue to run in the background. It works locally: ``` $ cat lol.sh #!/usr/bin/env bash echo "spawning agentmon" & while true; do echo "pretend agentmon"; sleep 2; done & ⛄️ 3.1.4 🚀 /tmp $ ./lol.sh spawning agentmon pretend agentmon ⛄️ 3.1.4 🚀 /tmp $ echo "pretend agentmon it works" it works ⛄️ 3.1.4 🚀 /tmp $ pretend agentmon echo "lpretend agentmon ol" lol ⛄️ 3.1.4 🚀 /tmp $ pretend agentmon pretend agentmon epretend agentmon echo "pretend agentmon It keeps runnipretend agentmon ng even apretend agentmon s I can pretend agentmon use the systpretend agentmon em" It keeps running even as I can use the system ⛄️ 3.1.4 🚀 /tmp $ pretend agentmon ``` However when I try to replicate this with exec.d it never seems to end the `exec.d` program and never yields control to a different process: ``` $ cargo libcnb package; docker rmi my-image; pack build my-image --buildpack /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby/target/buildpack/debug/heroku_statsd-metrics --path /tmp/47b6249d5e0a353f91910f848a700061 --pull-policy never 🔍 Locating buildpacks... 📦 [1/2] Building heroku/ruby Determining automatic cross-compile settings... Building binaries (x86_64-unknown-linux-musl)... Finished dev [unoptimized] target(s) in 0.89s Writing buildpack directory... Successfully wrote buildpack directory: ../../target/buildpack/debug/heroku_ruby (12.64 MiB) 📦 [2/2] Building heroku/statsd-metrics Determining automatic cross-compile settings... Building binaries (x86_64-unknown-linux-musl)... Blocking waiting for file lock on package cache Blocking waiting for file lock on package cache Blocking waiting for file lock on package cache Compiling heroku-statsd-metrics v0.0.0 (/Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby/buildpacks/metrics-agent) warning: unused import: `DownloadAgentmon` --> buildpacks/metrics-agent/src/main.rs:3:40 | 3 | use crate::layers::download_agentmon::{DownloadAgentmon, DownloadAgentmonError}; | ^^^^^^^^^^^^^^^^ | = note: `#[warn(unused_imports)]` on by default warning: `heroku-statsd-metrics` (bin "heroku-statsd-metrics") generated 1 warning (run `cargo fix --bin "heroku-statsd-metrics"` to apply 1 suggestion) Finished dev [unoptimized] target(s) in 7.67s Writing buildpack directory... Successfully wrote buildpack directory: ../../target/buildpack/debug/heroku_statsd-metrics (6.58 MiB) ✨ Packaging successfully finished! 💡 To test your buildpack locally with pack, run: pack build my-image-name \ --buildpack /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby/target/buildpack/debug/heroku_ruby \ --buildpack /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby/target/buildpack/debug/heroku_statsd-metrics \ --path /path/to/application /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby/target/buildpack/debug/heroku_ruby /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby/target/buildpack/debug/heroku_statsd-metrics Error: No such image: my-image ===> ANALYZING Image with name "my-image" not found ===> DETECTING heroku/statsd-metrics 2.0.0 ===> RESTORING ===> BUILDING # Heroku Statsd Metrics Agent - Done ===> EXPORTING Adding layer 'heroku/statsd-metrics:layer_name' Adding layer 'buildpacksio/lifecycle:launch.sbom' Adding 1/1 app layer(s) Adding layer 'buildpacksio/lifecycle:launcher' Adding layer 'buildpacksio/lifecycle:config' Adding label 'io.buildpacks.lifecycle.metadata' Adding label 'io.buildpacks.build.metadata' Adding label 'io.buildpacks.project.metadata' no default process type Saving my-image... *** Images (516b6641ecda): my-image Successfully built image my-image ⛄️ 3.1.4 🚀 /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby (schneems/metrics-agent-download-rebase) $ docker run -it --rm --entrypoint='/cnb/lifecycle/launcher' my-image 'sleep 60' spawning agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon ^[pretend agentmon ^C% ⛄️ 3.1.4 🚀 /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby (schneems/metrics-agent-download-rebase) $ docker run -it --rm --entrypoint='/cnb/lifecycle/launcher' my-image 'bash' spawning agentmon pretend agentmon lol pretend agentmon echo lopretend agentmon l pretend agentmon ls pretend agentmon cd echpretend agentmon o "it never pretend agentmon ends" pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon ``` Here's the test directory (needs barnes in the Gemfile.lock to execute) ``` $ ls /tmp/47b6249d5e0a353f91910f848a700061 Gemfile Gemfile.lock ⛄️ 3.1.4 🚀 /tmp/47b6249d5e0a353f91910f848a700061 $ cat /tmp/47b6249d5e0a353f91910f848a700061/Gemfile source "https://rubygems.org" gem "barnes" ⛄️ 3.1.4 🚀 /tmp/47b6249d5e0a353f91910f848a700061 $ cat /tmp/47b6249d5e0a353f91910f848a700061/Gemfile.lock GEM remote: https://rubygems.org/ specs: barnes (0.0.9) multi_json (~> 1) statsd-ruby (~> 1.1) multi_json (1.15.0) statsd-ruby (1.5.0) PLATFORMS x86_64-darwin-22 DEPENDENCIES barnes BUNDLED WITH 2.4.15 ```
## Fix URL logic On the plane I assumed the URL was a redirect to the most current file. It is not. It returns a plain text body with the most recent URL. I fixed this logic. ## Cache download logic The URL is versioned. For example: ``` https://agentmon-releases.s3.amazonaws.com/agentmon-0.3.1-linux-amd64.tar.gz ``` So I used it as a cache key to not have to download the binary when the URL hasn't changed. ## Background process While my initial experiment wasn't successful David suggested to try out `start-stop-daemon` (though it is only supported on Ubuntu). The `start-stop-daemon` can successfully boot a background process and allow `execd` to return. For example this works: ```rust let background_script = layer_path.join("agentmon_script"); let execd_script = layer_path.join("agentmon_exec.d"); write_bash_script( &background_script, r#" while true; do echo 'pretend agentmon' >> /tmp/agentmon.txt sleep 2 done "#, ) .unwrap(); // Intentionally leak background process let background_script = background_script.canonicalize().unwrap(); let background_script = background_script.display(); write_bash_script( &execd_script, format!( r#"start-stop-daemon --start --background --exec "{background_script}""# ), ) .unwrap(); ``` With this output: ``` $ docker run -it --rm --entrypoint='/cnb/lifecycle/launcher' my-image 'bash' e3b22af1a126:/workspace$ e3b22af1a126:/workspace$ e3b22af1a126:/workspace$ ls /tmp agentmon.txt e3b22af1a126:/workspace$ tail -f /tmp/agentmon.txt pretend agentmon pretend agentmon pretend agentmon pretend agentmon pretend agentmon ``` As to why this is needed. From David: > [...] it appears that Go waits for all processes where PPID = launcher's PID (1), instead of the spawned PID. Still digging to understand why, since from my understanding of the code so far, that should not be the case. > launcher uses Cmd.Run() which does Cmd.Start() and then Cmd.Wait() > For Wait(), internally, Go does wait4(pid, &status, 0, &rusage): https://cs.opensource.google/go/go/+/refs/tags/go1.20.6:src/os/exec_unix.go;l=16-60 But as you can see, before that, it first calls blockUntilWaitable() which is implemented here for systems that have waitid(), such as Linux: https://cs.opensource.google/go/go/+/refs/tags/go1.20.6:src/os/wait_waitid.go The signature for that is int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options);, see https://man7.org/linux/man-pages/man2/waitpid.2.html > I am starting to wonder if this instead has to do with open FDs which the sleep() would also inherit, since launcher obviously reads from the process. I don't know why one works and the other doesn't. That needs more investigation. My immediate goal is to get agentmon working on CNBs. TODO: - Add an integration test - Move the while-loop agentmon bash script to a Rust script
``` $ cargo libcnb package && docker rmi my-image --force && pack build my-image --buildpack target/buildpack/debug/heroku_statsd-metrics --path /tmp/47b6249d5e0a353f91910f848a700061/ --pull-policy never 🔍 Locating buildpacks... 📦 [1/2] Building heroku/ruby Determining automatic cross-compile settings... Building binaries (x86_64-unknown-linux-musl)... Finished dev [unoptimized] target(s) in 0.48s Writing buildpack directory... Successfully wrote buildpack directory: ../../target/buildpack/debug/heroku_ruby (12.64 MiB) 📦 [2/2] Building heroku/statsd-metrics Determining automatic cross-compile settings... Building binaries (x86_64-unknown-linux-musl)... Blocking waiting for file lock on package cache Blocking waiting for file lock on package cache Finished dev [unoptimized] target(s) in 0.68s Finished dev [unoptimized] target(s) in 0.51s Writing buildpack directory... Successfully wrote buildpack directory: ../../target/buildpack/debug/heroku_statsd-metrics (12.15 MiB) ✨ Packaging successfully finished! 💡 To test your buildpack locally with pack, run: pack build my-image-name \ --buildpack /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby/target/buildpack/debug/heroku_ruby \ --buildpack /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby/target/buildpack/debug/heroku_statsd-metrics \ --path /path/to/application /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby/target/buildpack/debug/heroku_ruby /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby/target/buildpack/debug/heroku_statsd-metrics Untagged: my-image:latest Deleted: sha256:b73cef10f9747c93c0589f68bd30b76298f57a178c9e3efb68dbe77a9bd70cdc Deleted: sha256:9c2fb27824ec55b3e82d5245a623de6ae85ffbcfd91d9ca93cda6e220fb4ae28 Deleted: sha256:0864c2235caabe94562658b4003c64d2834a8d15915bffdb080d9d6d39d37ad1 Deleted: sha256:e0f324b7867aecebbe166b6a773859cd8a9779aed246c0e6029430b5558bfc95 Deleted: sha256:c578d8034a4c45399cf1a55207eb25d07464681fe44caf88b94b6082dee74df8 Deleted: sha256:d45865c0091f9fde9ec70ba62d1d8dd48b3cbb6bcad730be8fed56344bc36ae3 ===> ANALYZING Image with name "my-image" not found ===> DETECTING heroku/statsd-metrics 2.0.0 ===> RESTORING ===> BUILDING # Heroku Statsd Metrics Agent - Metrics agent - Downloading ..... (2.596s) - Writing scripts - Done (finished in 2.602s) ===> EXPORTING Adding layer 'heroku/statsd-metrics:statsd-metrics-agent' Adding layer 'buildpacksio/lifecycle:launch.sbom' Adding 1/1 app layer(s) Adding layer 'buildpacksio/lifecycle:launcher' Adding layer 'buildpacksio/lifecycle:config' Adding label 'io.buildpacks.lifecycle.metadata' Adding label 'io.buildpacks.build.metadata' Adding label 'io.buildpacks.project.metadata' no default process type Saving my-image... *** Images (b73cef10f974): my-image Reusing cache layer 'heroku/statsd-metrics:statsd-metrics-agent' Successfully built image my-image ⛄️ 3.1.4 🚀 /Users/rschneeman/Documents/projects/work/buildpacks/buildpacks-ruby (schneems/metrics-agent-download-rebase) $ docker run -it --rm --entrypoint='/cnb/lifecycle/launcher' -e HEROKU_METRICS_URL=https://example.com my-image 'bash' 1bb0e5cf43f5:/workspace$ ps -aux USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND heroku 1 2.0 0.0 4624 3792 pts/0 Ss 22:18 0:00 bash heroku 14 11.3 0.0 2176 4 ? S 22:18 0:00 /layers/heroku_statsd-metrics/statsd-metrics-agent/agentmon_loop --path /layers/heroku_statsd-metrics/statsd-metrics-agent/b heroku 16 0.0 0.0 7060 1584 pts/0 R+ 22:18 0:00 ps -aux ``` Manually running the script seems to work: ``` 1bb0e5cf43f5:/workspace$ PORT=3000 /layers/heroku_statsd-metrics/statsd-metrics-agent/agentmon_loop --path /layers/heroku_statsd-metrics/statsd-metrics-agent/bin/agentmon agentmon: Listening on :3000... ``` I'm not sure how we test this.
Asserts that an app build with the heroku statsd metrics buildpack that has `barnes` in the Gemfile.lock will have a process `agentmon_loop` running in the background on dyne boot with both HEROKU_METRICS_URL and DYNO env vars are present at boot.
Not everything on docker is a buildpack, we use prefixes to make it clear at a glance what is.
- Fix bad docstrings - Move sleep_for to a constant - Refactor logic into smaller, more consistent functions - Added tests for argument generation
Agentmon hasn't seen a release in 6 years. We can hardcode the url for now. When the URL doesn't change we can keep the last download around. I always want to update the scripts on disk so then any changes to the files will be picked up, so we'll always call update or create, and never keep. We could get fancy and use MetadataDigest to cache those too, but writing them is pretty much instantaneous so I don't think it buys us anything. This change allows us to remove the `cached` dependency.
802b064
to
64a9ffe
Compare
The `download_to_dir` did more than download, it also untar-d and set executable permissions. The `agentmon_download` was a very loose wrapper around `download_to_dir`. I made removed the chmod functionality from `download_to_dir` and renamed it `download_untar`. I moved the chmod functionality into `agentmond_download` and renamed it `agentmon_install`. It's a minor change but I think it's an improvement.
This presents a tip to the user to tell them how to enable metrics collection.
b0694c0
to
c61dd9b
Compare
I think there's a race condition in the tests previously it was failing with
Now it's failing with
I think it looks like this:
So depending on when we read the results of
Another tool to use is to look at the specified output log file (if we set |
I don't have any more blocking comments, though CI is failing and there's a pending review request for Manuel. Cancelling my "changes requested" so that Manuel can approve once he's reviewed.
The method we're using to launch a daemon process does not give us guarantees that `agentmon_loop --path` will be executed by the time the process boots. To address this we can loop and re-check for some period of time. This method utilizes the output log from agentmon loop as directed by the spawn daemon command.
c78629e
to
81d2fa4
Compare
Co-authored-by: Ed Morley <[email protected]>
This is the default. It is only needed when multiple build packs are specified
a521b5d
to
40a5b81
Compare
I fixed the tests (#172 (comment)) to be more deterministic, but it's still based on a timeout (which can fail if the machine is slow or docker is slow due to timing conditions). It's better than it was, though. Please take a look. Specifically, this line does not seem great https://github.com/heroku/buildpacks-ruby/pull/172/files#diff-17c88161984214fafac5afdc89a5919f2dbcfb1165f2560fd255809316095fe8R131. I applied all of your requested changes. Thanks for the 👀 . I've asked @Malax for another round of reviews. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the requested changes and sorry for the delay with the re-review! Apart from the checksum check, I see nothing blocking.
I've updated to use a checksum with the |
Ruby supports "language specific metrics (beta)" https://devcenter.heroku.com/articles/language-runtime-metrics-ruby through the barnes gem https://github.com/heroku/barnes and the
heroku/metrics
buildpack https://github.com/heroku/heroku-buildpack-metrics. This PR ports the logic ofheroku/metrics
to the Ruby CNB buildpack.Implementation
At build time, this buildpack will download a statsd client (named "agentmon") from S3 and place it on the path. There are currently no explicit interfaces for running background/daemon processes via the CNB spec, however there is an interface that guarantees it will execute arbitrary code (exec.d). This interface is intended to set dynamic environment variables at runtime. As a side effect, this interface is used to schedule a background daemon via the
start-stop-daemon
command docs.The
start-stop-daemon
calls our program, a new rust script inbin/agentmon_loop
. This rust script will boot agentmon in a loop if the necessary environment variables are set.GUS-W-13115697.
GUS-W-13815707.