diff --git a/docs/development/designs.md b/docs/development/designs.md deleted file mode 100644 index 4f12627..0000000 --- a/docs/development/designs.md +++ /dev/null @@ -1,115 +0,0 @@ -# Design Thinking - -Our "MetricSet" is mirroring the design of a JobSet, which can combine multiple different things (i.e., metrics) into a cohesive unit. -It is assumed that the metrics you put into a listing of metrics are part of the same set. E.g., -With this design, we assume that you are primarily interested in measuring an application performance, collecting storage metrics, or -"rolling your own" design with a custom metric (e.g., a networking metric that has a special setup with a launcher and other customizations to the JobSet) - -## Overview - -Given the above assumption, the logic flow of the operator works as follows: - - - The user writes a metrics.yaml file that optionally includes an application OR storage description or neither for a custom metric. Typically, you'd provide an application for performance metrics, and storage for IO/filesystem metrics, and neither for a custom metric. - - Each metric in the list is also associated with a type (internal to the operator) that is checked. This means if you define an `Application` - - The operator will create a JobSet that runs one or more metrics per MetricSet type: - - Application metrics create a JobSet with each metric as a sidecar container sharing the process namespace to monitor (they can be given volumes if needed) - - Storage metrics deploy the metrics as containers and give them access to the volume - - Standalone metrics can do any custom design needed, and do not require application or storage (but can be provided storage volumes) - -The current design allows only one JobSet per metrics.yaml, but this can be relaxed to allow up to three JobSets per metrics.yaml (one for each of the types specified above). -We will write this into more detail in the usage docs. - -## Kubernetes Abstractions - -We use a JobSet on the top level with Replica set to 1, and within that set, for each metric type we create one or more ReplcatedJob that can hold one or more containers. The containers and design depend on the metric of interest, for which we currently support application (performance), storage, and standalone metrics (discussed below). - -### Metrics - -For our initial design, we allowed metrics of different types to be combined (e.g., running an application performance metric -alongside a storage one within the same JobSet) but for our second design we decided to enforce separation of concerns. -More specifically, if you are benchmarking storage, you are unlikely to also be benchmarking an application, and vice -versa. The design of the operator was updates to reflect this preference. Thus, the three groups of metrics we believe -are most strongly assessed together are: - -- **performance**: measuring an application performance through time via a shared process namespace -- **storage**: measuring storage read/write or general IO for one or more mounted volumes -- **standalone** a more complex metric that might require custom JobSet logic, and is intended to be run in isolation. - -### Performance - -For a performance metric, the general pattern we use is to create a separate container for each metric (these are pre-built and provided alongside the operator) and then add the application container to the set. This means that the set of metrics containers and application containers serve as sidecars in the same pod. 
Within this design, there are two sub-designs that a metric can use: - -1. Interact with the application via a shared process namespace (supports greater than one metric) -2. Allow the metric to share a volume (and some modular, portable filesystem asset) with the application (recommended for one metric only) - -Here is what the case 1 looks like. Note the shared process namespace between the two containers. - -![img/application-metric-set.png](img/application-metric-set.png) - -Here is the second design. Note that we still have the shared application process namespace, but we also allow the metric to add a shared volume. We do this by way of adding an empty volume, -and then allowing the metric to customize the application entrypoint to do some custom logic (e.g., copy an entire tree to the shared volume): - -![img/application-metric-volume.png](img/application-metric-volume.png) - -For both of the above, the metrics pods have `SYS_PTRACE` added and a flag is set to share the process -namespace, so we can read and write to the application container from a metrics pod. We should -be able to see things in the opposite direction, but without permissions. I've tested this -setup with more than one metric container, and it seems to work. You can read more about some of this [early testing here](https://vsoch.github.io/2023/shared-process-namespace/) and think this is a good idea, at least to start. This means, generally for a "perf" metric design, we deploy -it alongside an application of interest, wait to see the PID of the running process, and then -monitor it at some frequency (rate) for some number of times (completions) or until the application is done running, whichever is first. Current metric output is in the pod logs, and hopefully we can improve upon this. In addition to performance, it would be nice to have a simple means to measure the timing of the application. - -### Storage - -Setting up storage, typically by way of a persistent volume claim that turns into a persistent volume, is complex. This means that we require that the user (likely you) creates the PVC on your own, and then you can provide information about it to the operator. The operator will then request a volume, measure something on it for some rate and length of time, and then clean up. -That looks like this: - -![img/storage-metric-set.png](img/storage-metric-set.png) - - -### Standalone - -A standalone metric does not require an application container or a storage specification, but rather uses a "standalone" setting that indicates it runs on its own. This is also enforced in design - since a standalone metric has finer control of the underlying JobSet, as a metric -it must be run on its own. As an example, for a networking tool that uses MPI to run across nodes, we can set the number of pods (via the indexed job) to a number greater than 1, and then we will be making an indexed job with that many pods to run the command. That might look like this: - -![img/standalone-metric-set.png](img/standalone-metric-set.png) - -We don't technically need a shared process space, a storage setup, or an application. -And actually, that headless service that provides the network is available for storage -or applications as well - we just don't use them in the previous example! The ability -to scale (via a number of pods > 1) is also a feature of storage and services if your -tool requires that. 
- -## Output Options - -### Logging Parser - -For the simplest start, I've decided to allow for metrics to have their own custom output (indeed it would be hard to standardize this between so many different tools) but have the operator -provide structure to that, meaning separators to distinguish sections, and a consistent way to output metadata. As an example, here is what the top level metadata and sections (with some custom output data between) -would look like: - -```console -METADATA START {"pods":1,"completions":1,"storageVolumePath":"/workflow","storageVolumeHostPath":"/tmp/workflow","metricName":"io-sysstat","metricDescription":"statistics for Linux tasks (processes) : I/O, CPU, memory, etc.","metricType":"storage","metricOptions":{"completions":2,"human":"false","rate":10}} -METADATA END -METRICS OPERATOR COLLECTION START -METRICS OPERATOR TIMEPOINT -...custom data output here for timepoint 1... -METRICS OPERATOR TIMEPOINT -...custom data output here for timepoint 2... -METRICS OPERATOR TIMEPOINT -...custom data output here for timepoint N... -METRICS OPERATOR COLLECTION END -``` - -In the above, we can parse the metadata for the run from the first line (a subset of flattened, important features dumped in json) and then clearly mark the start and end of collection, -along with separation between timepoints. This is the most structure we can provide, as each metric output looks different. It's up to the Python module parser from the "metricsoperator" -module to know how to parse (and possibly plot) any specific output type. - -### Database for Metric Storage - -I was considering (and still am, ) to try creating a consistent database that can be used to store metrics across runs. In the space of an operator, this means we can't clean it up when the specific metric is deleted, but rather it should be owned by the namespace. I'm not sure how to do that but will think about ideas. Worst case, we have the user deploy the database in the same namespace -separately. Best case, we can manage it for them, or (better) not require it at all. -I don't want anything complicated (I don't want to re-create prometheus or a monitoring service!) - -## Design Links - - - Original diagrams (August 2023) are available on [Excalidraw](https://excalidraw.com/#json=kvaus7c1bSLvw64tz_jHa,Lx5vjCos2QNaCO6iUFT_SQ) diff --git a/docs/development/designs/current.md index 88bdec0..3abb8a2 100644 --- a/docs/development/designs/current.md +++ b/docs/development/designs/current.md @@ -1,5 +1,9 @@ # Current Design +For this second design, we can more easily say: + +> A Metric Set is a collection of metrics to measure IO, performance, or networking that can be customized with addons. + The original design was a good first shot, but was flawed in several ways: 1. I could not combine metrics into one. E.g., if I wanted to use a launcher jobset design combined with HPCToolkit, another metric, I could not. @@ -7,92 +11,26 @@ 3. The use of Storage, Application, and Volume was messy at best (external entities to add to a metric set) For this second design, the "MetricSet" is still mirroring the design of a JobSet, but it is more generic, and of one type. There are no longer different -flavors of metric sets. Rather, we allow metrics to generate replicated jobs. The user can choose to run more than one metric, and this will generate another -replicated job for the jobset, at the decision of the user.
For the "extras" that we need to integrate - e.g., applications, volumes/storage, or -even extra containers that add logic, these are now called metric addons. - -> A metric addon is a customization to a metric set to add functionality. - -With this design, we still assume that you are primarily interested in measuring an application performance, or collecting storage metrics. -If you imagine the Metrics Operator as putting together legos, the primary difference is that unlike the previous design, we have smaller pieces to work with, namely -volumes, application (or other) containers, and any other addons that might be defined for a replicated job. - - -TODO WRITE ME. - -## Overview - -Given the above assumption, the logic flow of the operator works as follows: +flavors of metric sets. Rather, we allow metrics to generate replicated jobs. For the "extras" that we need to integrate to supplement those jobs - e.g., applications, volumes/storage, or +even extra containers that add logic, these are now called metric addons. More specifically, an addon can: - - The user writes a metrics.yaml file that optionally includes an application OR storage description or neither for a custom metric. Typically, you'd provide an application for performance metrics, and storage for IO/filesystem metrics, and neither for a custom metric. - - Each metric in the list is also associated with a type (internal to the operator) that is checked. This means if you define an `Application` - - The operator will create a JobSet that runs one or more metrics per MetricSet type: - - Application metrics create a JobSet with each metric as a sidecar container sharing the process namespace to monitor (they can be given volumes if needed) - - Storage metrics deploy the metrics as containers and give them access to the volume - - Standalone metrics can do any custom design needed, and do not require application or storage (but can be provided storage volumes) + - Add extra containers (and config maps for their entrypoints) + - Add custom logic to entrypoints for specific jobs and/or containers + - Add additional volumes that range the gamut from empty to persistent disk. -The current design allows only one JobSet per metrics.yaml, but this can be relaxed to allow up to three JobSets per metrics.yaml (one for each of the types specified above). -We will write this into more detail in the usage docs. +The current design allows only one JobSet per metrics.yaml, and this was an explicit choice after realizing that it's unlikely to want more than one. ## Kubernetes Abstractions -We use a JobSet on the top level with Replica set to 1, and within that set, for each metric type we create one or more ReplcatedJob that can hold one or more containers. The containers and design depend on the metric of interest, for which we currently support application (performance), storage, and standalone metrics (discussed below). - -### Metrics - -For our initial design, we allowed metrics of different types to be combined (e.g., running an application performance metric -alongside a storage one within the same JobSet) but for our second design we decided to enforce separation of concerns. -More specifically, if you are benchmarking storage, you are unlikely to also be benchmarking an application, and vice -versa. The design of the operator was updates to reflect this preference. 
Thus, the three groups of metrics we believe -are most strongly assessed together are: - -- **performance**: measuring an application performance through time via a shared process namespace -- **storage**: measuring storage read/write or general IO for one or more mounted volumes -- **standalone** a more complex metric that might require custom JobSet logic, and is intended to be run in isolation. - -### Performance - -For a performance metric, the general pattern we use is to create a separate container for each metric (these are pre-built and provided alongside the operator) and then add the application container to the set. This means that the set of metrics containers and application containers serve as sidecars in the same pod. Within this design, there are two sub-designs that a metric can use: - -1. Interact with the application via a shared process namespace (supports greater than one metric) -2. Allow the metric to share a volume (and some modular, portable filesystem asset) with the application (recommended for one metric only) - -Here is what the case 1 looks like. Note the shared process namespace between the two containers. +We use a JobSet on the top level with Replicas set to 1, and within that set, each metric is allowed to create one or more ReplicatedJobs. We can easily customize the style of the replicated job based +on interfaces, e.g.: -![img/application-metric-set.png](img/application-metric-set.png) +- The `LauncherWorker` is a typical design that has a launcher (where an MPI hostlist is written and the main command is run) to then interact with the workers. +- The `SingleApplication` is a basic design that expects one or more pods in an indexed job, and also shares the process namespace. +- The `StorageGeneric` is almost the same, but doesn't share a process namespace. -Here is the second design. Note that we still have the shared application process namespace, but we also allow the metric to add a shared volume. We do this by way of adding an empty volume, -and then allowing the metric to customize the application entrypoint to do some custom logic (e.g., copy an entire tree to the shared volume): - -![img/application-metric-volume.png](img/application-metric-volume.png) - -For both of the above, the metrics pods have `SYS_PTRACE` added and a flag is set to share the process -namespace, so we can read and write to the application container from a metrics pod. We should -be able to see things in the opposite direction, but without permissions. I've tested this -setup with more than one metric container, and it seems to work. You can read more about some of this [early testing here](https://vsoch.github.io/2023/shared-process-namespace/) and think this is a good idea, at least to start. This means, generally for a "perf" metric design, we deploy -it alongside an application of interest, wait to see the PID of the running process, and then -monitor it at some frequency (rate) for some number of times (completions) or until the application is done running, whichever is first. Current metric output is in the pod logs, and hopefully we can improve upon this. In addition to performance, it would be nice to have a simple means to measure the timing of the application. - -### Storage - -Setting up storage, typically by way of a persistent volume claim that turns into a persistent volume, is complex. This means that we require that the user (likely you) creates the PVC on your own, and then you can provide information about it to the operator.
The operator will then request a volume, measure something on it for some rate and length of time, and then clean up. -That looks like this: - -![img/storage-metric-set.png](img/storage-metric-set.png) - - -### Standalone - -A standalone metric does not require an application container or a storage specification, but rather uses a "standalone" setting that indicates it runs on its own. This is also enforced in design - since a standalone metric has finer control of the underlying JobSet, as a metric -it must be run on its own. As an example, for a networking tool that uses MPI to run across nodes, we can set the number of pods (via the indexed job) to a number greater than 1, and then we will be making an indexed job with that many pods to run the command. That might look like this: - -![img/standalone-metric-set.png](img/standalone-metric-set.png) - -We don't technically need a shared process space, a storage setup, or an application. -And actually, that headless service that provides the network is available for storage -or applications as well - we just don't use them in the previous example! The ability -to scale (via a number of pods > 1) is also a feature of storage and services if your -tool requires that. +I haven't found a need for another kind of design yet (most are the launcher-worker type), but more can easily be added if needed. +There is no longer any distinction between MetricSet types, as there is only one MetricSet that serves as a shell for the metric. ## Output Options @@ -117,14 +55,4 @@ METRICS OPERATOR COLLECTION END In the above, we can parse the metadata for the run from the first line (a subset of flattened, important features dumped in json) and then clearly mark the start and end of collection, along with separation between timepoints. This is the most structure we can provide, as each metric output looks different. It's up to the Python module parser from the "metricsoperator" -module to know how to parse (and possibly plot) any specific output type. - -### Database for Metric Storage - -I was considering (and still am, ) to try creating a consistent database that can be used to store metrics across runs. In the space of an operator, this means we can't clean it up when the specific metric is deleted, but rather it should be owned by the namespace. I'm not sure how to do that but will think about ideas. Worst case, we have the user deploy the database in the same namespace -separately. Best case, we can manage it for them, or (better) not require it at all. -I don't want anything complicated (I don't want to re-create prometheus or a monitoring service!) - -## Design Links - - - Original diagrams (August 2023) are available on [Excalidraw](https://excalidraw.com/#json=kvaus7c1bSLvw64tz_jHa,Lx5vjCos2QNaCO6iUFT_SQ) +module to know how to parse (and possibly plot) any specific output type. \ No newline at end of file diff --git a/docs/getting_started/addons.md index 7f25833..01cfee2 100644 --- a/docs/getting_started/addons.md +++ b/docs/getting_started/addons.md @@ -211,3 +211,51 @@ storage: All of the above are strings. The pipe allows for multiple lines, if appropriate. Note that while a "volume" is typical, you might have a storage setup that is done via a set of custom commands, in which case you don't need to define the volume too.
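To make that shape concrete, here is a minimal sketch of a metrics.yaml that pairs an IO metric with a storage addon. It is illustrative only: the addon name `volume-hostpath` and its `hostPath`/`path` options are hypothetical placeholders for whichever volume addon you are using, so check the options documented above for the exact names.

```yaml
apiVersion: flux-framework.org/v1alpha1
kind: MetricSet
metadata:
  name: metricset-sample
spec:
  metrics:
    # Run an IO benchmark against the mounted path
    - name: io-fio
      options:
        directory: /workflow

      # Hypothetical volume addon; substitute the real addon name and options
      addons:
        - name: volume-hostpath
          options:
            hostPath: /tmp/workflow
            path: /workflow
```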
+ +## Performance + +### perf-hpctoolkit + + - *[perf-hpctoolkit](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/perf-lammps-hpctoolkit)* + +This metric provides [HPCToolkit](https://gitlab.com/hpctoolkit/hpctoolkit) for your application to use. This is the first metric of its type +to use a shared volume approach. Specifically, we: + +- add a new ability for an application metric to define an empty volume, and have the metrics container copy assets to it +- also add an ability for this kind of application metric to customize the application entrypoint (e.g., copy volume contents to destinations) +- build a spack copy view into the [hpctoolkit metrics container](https://github.com/converged-computing/metrics-containers/blob/main/hpctoolkit-containerize/Dockerfile) +- move the `/opt/software` and `/opt/views/view` roots into the application container; this is a modular install of HPCToolkit. +- copy over `/opt/share/software` (provided via the shared empty volume) to `/opt/software`, where spack expects it. We also add `/opt/share/view/bin` to the path (where `hpcrun` is). + +After those steps are done, HPCToolkit is essentially installed, on the fly, in the application container. Since the `hpcrun` command uses `LD_AUDIT`, we need +all libraries to be on the same filesystem (the shared process namespace would not work). We can then run it, and generate a database. Here is an example +given `hpctoolkit-lmp-measurements` in the present working directory of the container. + + +```bash +hpcstruct hpctoolkit-lmp-measurements + +# Run "the professor!" 🤓️ +hpcprof hpctoolkit-lmp-measurements +``` + +The above generates a database, `hpctoolkit-lmp-database`, that you can copy to your machine for further interaction with hpcviewer +(or some future tool that doesn't use Java)! + +```bash +kubectl cp -c app metricset-sample-m-0-npbc9:/opt/lammps/examples/reaxff/HNS/hpctoolkit-lmp-database hpctoolkit-lmp-database +hpcviewer ./hpctoolkit-lmp-database +``` + +Here are the acceptable parameters. + +| Name | Description | Type | Default | +|-----|-------------|------------|------| +| mount | Path to mount hpctoolkit view in application container | string | /opt/share | +| events | Events for hpctoolkit | string | `-e IO` | + +Note that you can see the events available with `hpcrun -L`, using the container for this metric. +There is a brief listing on [this page](https://hpc.llnl.gov/software/development-environment-software/hpc-toolkit). +We recommend that you do not pair hpctoolkit with another metric, primarily because it is customizing the application +entrypoint. If you add a process-namespace-based metric, you likely need to account for the hpcrun command being the +wrapper to the actual executable. diff --git a/docs/getting_started/metrics.md index bfd0d24..a60aa81 100644 --- a/docs/getting_started/metrics.md +++ b/docs/getting_started/metrics.md @@ -3,12 +3,8 @@ The following metrics are under development (or being planned).
- [Examples](https://converged-computing.github.io/metrics-operator/getting_started/metrics.html#examples) - - [Storage Metrics](https://converged-computing.github.io/metrics-operator/getting_started/metrics.html#storage) - - [Application Metrics](https://converged-computing.github.io/metrics-operator/getting_started/metrics.html#application) - - [Standalone Metrics](https://converged-computing.github.io/metrics-operator/getting_started/metrics.html#standalone) -Each of the above is a metric design, which is primarily represented in the Metrics Operator code. However, within each design -there are different families of metrics (e.g., storage, network, performance, simulation) shown in the table below as the "Family" column. +Each metric can be ascribed to a high level family, shown in the table below as the "Family" column. We likely will tweak and improve upon these categories. @@ -16,63 +12,8 @@ We likely will tweak and improve upon these categories. ## Implemented Metrics -Each metric has a link to the type, along with (optionally) examples. These sections will better be organized by -family once we decide on a more final set. +### perf-sysstat -### Performance - -These metrics are intended to assess application performance, where they run alongside an application of interest. - -#### perf-hpctoolkit - - - [Application Metric Set](user-guide.md#application-metric-set) - - *[perf-hpctoolkit](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/perf-hpctoolkit)* - -This metric provides [HPCToolkit](https://gitlab.com/hpctoolkit/hpctoolkit) for your application to use. This is the first metric of its type -to use a shared volume approach. Specifically, we: - -- add a new ability for an application metric to define an empty volume, and have the metrics container copy stuff to it -- also add an ability for this kind of application metric to customize the application entrypoint (e.g., copy volume contents to destinations) -- build a spack copy view into the [hpctoolkit metrics container](https://github.com/converged-computing/metrics-containers/blob/main/hpctoolkit-containerize/Dockerfile) -- move the `/opt/software` and `/opt/views/view` roots into the application container, this is a modular install of HPCToolkit. -- copy over `/opt/share/software` (provided via the shared empty volume) to `/opt/software`` where spack expects it. We also add `/opt/share/view/bin` to the path (where hpcrun is) - -After those steps are done, HPCToolkit is essentially installed, on the fly, in the application container. Since the `hpcrun` command is using `LD_AUDIT` we need -all libraries to be in the same system (the shared process namespace would not work). We can then run it, and generate a database. Here is an example -given `hpctoolkit-lmp-measurements` in the present working directory of the container. - - -```bash -hpcstruct hpctoolkit-lmp-measurements - -# Run "the professor!" 🤓️ -hpcprof hpctoolkit-lmp-measurements -``` - -The above generates a database, `hpctoolkit-lmp-database` that you can copy to your machine for further interaction with hpcviewer -(or some future tool that doesn't use Java)! - -```bash -kubectl cp -c app metricset-sample-m-0-npbc9:/opt/lammps/examples/reaxff/HNS/hpctoolkit-lmp-database hpctoolkit-lmp-database -hpcviewer ./hpctoolkit-lmp-database -``` - -Here are the acceptable parameters. 
- -| Name | Description | Type | Default | -|-----|-------------|------------|------| -| mount | Path to mount hpctoolview view in application container | string | /opt/share | -| events | Events for hpctoolkit | string | `-e IO` | - -Note that you can see events available with `hpcrun -L`, and use the container for this metric. -There is a brief listing on [this page](https://hpc.llnl.gov/software/development-environment-software/hpc-toolkit). -We recommend that you do not pair hpctoolkit with another metric, primarily because it is customizing the application -entrypoint. If you add a process-namespace based metric, you likely need to account for the hpcrun command being the -wrapper to the actual executable. - -#### perf-sysstat - - - [Application Metric Set](user-guide.md#application-metric-set) - *[perf-hello-world](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/perf-hello-world)* - *[perf-lammps](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/perf-lammps)* @@ -114,13 +55,9 @@ after the index at 0 gets a custom command. See [pidstat](https://man7.org/linux more information on this command, and [this file](https://github.com/converged-computing/metrics-operator/blob/main/pkg/metrics/perf/sysstat.go) for how we use them. If there is an option or command that is not exposed that you would like, please [open an issue](https://github.com/converged-computing/metrics-operator/issues). -### Storage -These metrics are intended to assess storage volumes. +### io-fio -#### io-fio - - - [Storage Metric Set](user-guide.md#application-metric-set) - *[io-host-volume](https://github.com/converged-computing/metrics-operator/tree/main/examples/storage/google/io-fusion)* This is a nice tool that you can simply point at a path, and it measures IO stats by way of writing a file there! @@ -140,9 +77,8 @@ Options you can set include: For the "directory" we use this location to write a temporary file, which will be cleaned up. This allows for testing storage mounted from multiple metric pods without worrying about a name conflict. -#### io-ior +### io-ior - - [Storage Metric Set](user-guide.md#application-metric-set) - *[io-host-volume](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/io-ior)* ![img/ior.jpeg](img/ior.jpeg) @@ -160,9 +96,8 @@ basic commands are done. Note that the container does have mpirun if you want to for this across nodes, but this could be added. [Let us know](https://github.com/converged-computing/metrics-operator/issues) if this would be interesting to you. -#### io-sysstat +### io-sysstat - - [Storage Metric Set](user-guide.md#application-metric-set) - *[io-host-volume](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/io-host-volume)* This is the "iostat" executable of the sysstat library. @@ -177,14 +112,8 @@ This is the "iostat" executable of the sysstat library. This is good for mounted storage that can be seen by the operating system, but may not work for something like NFS. -### Standalone - -Standalone metrics can take on many designs, from a launcher/worker design to test networking, to running -a metric across nodes to assess the node performance. 
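For a quick reference, here is a minimal metrics.yaml sketch for the io-sysstat metric above, adapted from the older examples in this repository. Treat the option placement as an assumption: whether rate and completions live under options or at the metric level should be checked against the current schema, and any volume to measure is assumed to be provided separately (e.g., via an addon).

```yaml
apiVersion: flux-framework.org/v1alpha1
kind: MetricSet
metadata:
  name: metricset-sample
spec:
  metrics:
    # Collect iostat output for two timepoints, ten seconds apart
    - name: io-sysstat
      options:
        rate: 10
        completions: 2
        human: "false"
```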
- -#### network-netmark +### network-netmark - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[network-netmark](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/network-netmark)* (code still private) This is currently a private container/software, but we have support for it when it's ready to be made public (networking) @@ -200,9 +129,8 @@ Variables to customize include: | storeEachTrial | Flag to indicate storing each trial data | options->storeEachTrial | string (true/false) | "true" | | soleTenancy | Turn off sole tenancy (one pod/node) | options->soleTenancy | string ("false" or "no") | "true" | -#### network-osu-benchmark +### network-osu-benchmark - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[network-osu-benchmark](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/network-osu-benchmark)* Point to point benchmarks for MPI (networking). If listOptions->commands not set, will use all one-point commands. @@ -296,9 +224,8 @@ Here are some useful resources for the benchmarks: - [HPC Council](https://hpcadvisorycouncil.atlassian.net/wiki/spaces/HPCWORKS/pages/1284538459/OSU+Benchmark+Tuning+for+2nd+Gen+AMD+EPYC+using+HDR+InfiniBand+over+HPC-X+MPI) - [AWS Tutorials](https://www.hpcworkshops.com/08-efa/04-complie-run-osu.html) -#### app-lammps +### app-lammps - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[app-lammps](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-lammps)* Since we were using LAMMPS so often as a benchmark (and testing timing of a network) it made sense to add it here @@ -321,9 +248,8 @@ In the working directory `/opt/lammps/examples/reaxff/HNS#`. You should be calli You should also provide the correct number of processes (np) and problem size for LAMMPS (lmp). We left this as open and flexible anticipating that you as a user would want total control. -#### app-amg +### app-amg - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[app-amg](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-amg)* AMG means "algebraic multi-grid" and it's easy to confuse with the company [AMD](https://www.amd.com/en/solutions/supercomputing-and-hpc) "Advanced Micro Devices" ! From [the guide](https://asc.llnl.gov/sites/asc/files/2020-09/AMG_Summary_v1_7.pdf): @@ -376,9 +302,8 @@ More likely you want an actual problem size on a specific number of node and tas run a larger problem and the parser does not work as expected, please [send us the output](https://github.com/converged-computing/metrics-operator/issues) and we will provide an updated parser. See [this guide](https://asc.llnl.gov/sites/asc/files/2020-09/AMG_Summary_v1_7.pdf) for more detail. -#### app-quicksilver +### app-quicksilver - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[app-quicksilver](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-quicksilver)* Quicksilver is a proxy app for Monte Carlo simulation code. You can learn more about it on the [GitHub repository](https://github.com/LLNL/Quicksilver/). @@ -436,9 +361,8 @@ qs /opt/quicksilver/Examples/CORAL2_Benchmark/Problem1/Coral2_P1.inp You can also look more closely in the [GitHub repository](https://github.com/LLNL/Quicksilver/tree/master/Examples). 
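To tie the above together, here is a minimal metrics.yaml sketch for app-quicksilver. It assumes the metric exposes the same command-style options as the other application metrics on this page (as app-lammps and app-laghos do); the option name is an assumption, so check the example linked above for the authoritative fields.

```yaml
apiVersion: flux-framework.org/v1alpha1
kind: MetricSet
metadata:
  name: metricset-sample
spec:
  metrics:
    - name: app-quicksilver
      # Assumed option: a full command, mirroring other application metrics
      options:
        command: qs /opt/quicksilver/Examples/CORAL2_Benchmark/Problem1/Coral2_P1.inp
```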
-#### app-pennant +### app-pennant - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[app-pennant](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-pennant)* Pennant is an unstructured mesh hydrodynamics for advanced architectures. The documentation is sparse, but you @@ -538,9 +462,8 @@ There are many input files that come in the container, and here are the fullpath And likely you will need to adjust the mpirun parameters, etc. -#### app-kripke +### app-kripke - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[app-kripke](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-kripke)* [Kripke](https://github.com/LLNL/Kripke) is (from the README): @@ -584,9 +507,8 @@ ex3_colored-indexset_solution ex6_stencil-offset-layout_solution ex9_matrix-tr (meaning on the PATH in `/opt/Kripke/build/bin` in the container). For apps / metrics to be added, please see [this issue](https://github.com/converged-computing/metrics-operator/issues/30). -#### app-ldms +### app-ldms - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[app-ldms](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-ldms)* @@ -608,9 +530,8 @@ The following is the default command: ldms_ls -h localhost -x sock -p 10444 -l -v ``` -#### app-nekbone +### app-nekbone - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[app-nekbone](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-nekbone)* Nekbone comes with a set of example that primarily depend on you choosing the correct workikng directory and command to run from. @@ -634,9 +555,8 @@ And the following combinations are supported. Note that example1 did not build, You can see the archived repository [here](https://github.com/Nek5000/Nekbone). If there are interesting metrics in this project it would be worth bringing it back to life I think. -#### app-laghos +### app-laghos - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[app-laghos](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-laghos)* From the [Laghos README](https://github.com/CEED/Laghos): @@ -651,9 +571,8 @@ the path, so the default references it as `./laghos`. | command | The full mpirun and laghos command | options->command |string | (see below) | | workdir | The working directory for the command | options->workdir | string | /workdir/laghos | -#### app-bdas +### app-bdas - - [Standalone Metric Set](user-guide.md#application-metric-set) - *[app-bdas](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-bdas)* BDAS standards for "Big Data Analysis Suite" and you can read more about it [here](https://asc.llnl.gov/sites/asc/files/2020-09/BDAS_Summary_b4bcf27_0.pdf). diff --git a/docs/getting_started/user-guide.md b/docs/getting_started/user-guide.md index ad93a82..0577399 100644 --- a/docs/getting_started/user-guide.md +++ b/docs/getting_started/user-guide.md @@ -7,26 +7,29 @@ with the Metrics Operator installed and are interested to submit your own [custo ### Overview -Our "MetricSet" is mirroring the design of a [JobSet](https://github.com/kubernetes-sigs/jobset/), which can combine multiple different things (i.e., metrics) into a cohesive unit. 
+Our "MetricSet" is mirroring the design of a [JobSet](https://github.com/kubernetes-sigs/jobset/), which can simply be defined as follows: + +> A Metric Set is a collection of metrics to measure IO, performance, or networking that can be customized with addons. + When you create a MetricSet using this operator, we assume that you are primarily interested in measuring an application performance, collecting storage metrics, or using a custom metric provided by the operator that has less stringent requirements. -
+Each metric provided by the operator (ranging from networking to applications to IO) has a prebuilt container, and knows how to launch one or more replicated jobs +to measure or assess the performance of something. A MetricSet itself is just a single shell for some metric, which can be further customized with addons. +A metric addon (MetricAddon) can be any kind of "extra" that is needed to supplement a metric run - e.g., applications, volumes/storage, or +even extra containers that add logic. At a high level, an addon can: -Logic Flow of Metrics Operator + - Add extra containers (and config maps for their entrypoints) + - Add custom logic to entrypoints for specific jobs and/or containers + - Add additional volumes that run the gamut from empty to persistent disk. -Given the above assumption, the logic flow of the operator works as follows: +And specific examples might include the following (a complete sketch follows this list): -1. You write a metrics.yaml file that optionally includes an application OR storage description or neither for a custom metric. Typically, you'd provide an application for performance metrics, and storage for IO/filesystem metrics, and neither for a custom metric. In simpler terms, we have three types of MetricSet - Application Metric Sets, Storage Metric Sets, and Standalone Metric Sets. -2. You also include a list of metrics. Each metric you choose is associated with a type (internal to the operator) that can match to an Application, Storage, or Standalone Metric set. Don't worry, this is checked for you, and you can use our **TBA** metrics registry to find metrics of interest. -3. The operator will create a JobSet for your metrics set. The structure of this JobSet depends on the type (see more below under [metrics](#metrics)). Generally: - - Application metrics create a JobSet with each metric as a sidecar container sharing the process namespace to monitor (they can be given volumes if needed) - - Storage metrics deploy the metrics as containers and give them access to the volume - - Standalone metrics can do any custom design needed, and do not require application or storage (but can be provided storage volumes) + - Every kind of volume is provided as a volume addon, so that you can run a storage metric against some kind of mounted storage. + - A container (application) addon makes it easy to add your custom container to run alongside a metric that shares (and monitors) the process namespace. + - A monitoring tool with a modular install can be provided as an addon; it works by creating a container and sharing its assets via an empty volume with the metric container(s) of interest. The sharing and setup of the volume happens via customizing the main metric entrypoint(s) and also adding a custom config map volume (for the addon container entrypoint). -
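As an illustration of this pattern, here is a trimmed-down sketch based on the perf-lammps-hpctoolkit example that ships with this repository: the app-lammps metric runs LAMMPS across the pods, and the perf-hpctoolkit addon mounts a shared volume and wraps the launcher entrypoint. The full example (including the addon workdir and containerTarget options) is under examples/tests/perf-lammps-hpctoolkit.

```yaml
apiVersion: flux-framework.org/v1alpha1
kind: MetricSet
metadata:
  name: metricset-sample
spec:
  # Number of pods for LAMMPS (one launcher, the rest workers)
  pods: 4
  metrics:
    # The metric of interest: run LAMMPS across the pods
    - name: app-lammps
      options:
        command: mpirun --hostfile /root/hostlist.txt -np 2 --map-by socket lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite
        workdir: /opt/lammps/examples/reaxff/HNS

      # The addon: install HPCToolkit on the fly via a shared volume
      addons:
        - name: perf-hpctoolkit
          options:
            mount: /opt/mnt
            events: "-e IO"
```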
- -Generally, you'll be defining an application container with one or more metrics to assess performance, or a storage solution with the same, but metrics to assess IO. There are several modes of operation, depending on your choice of metrics. +Within this space, we can easily customize the patterns of metrics by way of shared interfaces. Common patterns for shared interfaces currently include a `LauncherWorker`, `SingleApplication`, and `StorageGeneric` design. ### Install @@ -93,8 +96,8 @@ TEST SUITE: None Let's first review how this works. -1. We provide metrics here to assess performance, storage, networking, and other custom cases (called standalone). -2. You can choose one or more metrics to run alongside your application or storage (volumes) and measure. +1. We provide metrics here to assess performance, storage, networking, or other custom cases (e.g., run an HPC application). +2. You can choose to supplement a metric with addons (e.g., add a volume to an IO metric). 3. The metric output is printed in pod logs with a standard packaging (e.g., sections and headers) to distinguish output sections. 4. We provide a Python module [metricsoperator](https://pypi.org/project/metricsoperator/) that can help you run an experiment, applying the metrics.yaml and then retrieving and parsing logs. @@ -119,167 +122,8 @@ For all metric types, the following applies: 1. You can create more than one pod (scale the metric) as you see fit. 2. There is always a headless service provided for metrics within the JobSet to make use of. -3. The definition of metrics in your metrics.yaml file is consistent across types. -4. Each metric type in the list can take a rate, completions, and custom options. - -For another overview of these designs, please see the [developer docs](../development/index.md). - -### Application Metric Set - -> An application metric set includes one or more metrics for measuring application performance. We take two strategies: - - - Share the process namespace, giving access of the metric container to the process space of the application - - Share a volume on the filesystem, allowing content from the metrics container to be used in the application container - -Let's walk through an example. In the image below, you want to run one or more custom metrics to measure performance for your application of choice. - -![img/application-metric-set-diagram.png](img/application-metric-set-diagram.png) - -You'll do this by writing a metrics.yaml file (left panel) that defines the application of interest, which in this case in LAMMPS. -This will be handed to the metrics operator (middle panel) that will validate your MetricSet and prepare to deploy, and -the result is a JobSet (right panel) that includes a Job with one or more containers alongside your application. -Let's look at this process in more detail. Here is what the metrics.yaml file might look like. -Note that the image above defines two metrics, but the YAML file below only shows a list of one. - -```yaml -apiVersion: flux-framework.org/v1alpha1 -kind: MetricSet -metadata: - labels: - app.kubernetes.io/name: metricset - app.kubernetes.io/instance: metricset-sample - name: metricset-sample -spec: - application: - image: ghcr.io/rse-ops/vanilla-lammps:tag-latest - command: mpirun lmp -v x 1 -v y 1 -v z 1 -in in.reaxc.hns -nocite - metrics: - - name: perf-sysstat -``` - -It was a design choice that using an application container in this context requires no changes to the container itself.
-You simply need to know what the entrypoint command is, and this will allow the metric sidecar containers to monitor it. -In our case, for our command we are issuing `mpirun`, and that's what we want to monitor. Thus, the `image` and `command` attributes are the -only two required for a basic application setup. For the next section, "metrics" we've found an application metric (so it can be used for an -Application Metric Set) that we like called `perf-sysstat`, and we add it to the list. We could easily have added more, because one -application run can be monitored by several tools, but we will keep it simple for this example. Next, let's submit this to the metrics operator. - -```bash -$ kubectl apply -f metrics.yaml -``` - -When the operator receives the custom resource, it will do some obvious validation, like did you provide application metrics for an application? -Did you provide all the required fields? Did you only provide a definition for one metric type? Any errors will result in not creating the MetricSet, -and an error in the operator logs. Given that you've provided a valid custom resource YAML and one or more application metrics -that the operator knows, it will then select your metrics from the set it has internally defined (middle panel). This panel shows that -the operator knows about several application (green), storage (yellow), and standalone (red) metrics, and it's going -to combine them into a JobSet that includes your application container to allow each metric to assess performance. - -![img/application-metric-set.png](img/application-metric-set.png) - -Focusing on the third panel, the way this works is that we create a JobSet with a single replicated job with multiple containers. -One container is for your application, and the others are sidecar containers that are running the metrics. Again, because of this design -you don't need to customize or tweak your application container! By way of the shared process namespace and knowing the command you've -executed, the sidecar containers can easily "peek in" to your application container to monitor the process and save metrics. -For this application metric set design, all containers should complete to determine success of the JobSet, and we currently -rely on pod logs to get output, however we hope to have a more solid solution for this in the future. - -### Storage Metric - -> A storage metric set includes one or more metrics for assessing one or more volumes - -If you are interested in measuring the goodness of different kinds of volumes, you might be interested in creating a storage metric set! The design is similar -to an application metrics set, however instead of an application of interest, you provide one or more storage volumes of interest. Here is a small -example that assumes a host volume: - -```yaml -apiVersion: flux-framework.org/v1alpha1 -kind: MetricSet -metadata: - labels: - app.kubernetes.io/name: metricset - app.kubernetes.io/instance: metricset-sample - name: metricset-sample -spec: - storage: - volume: - # This is the path on the host (e.g., inside kind container) - hostPath: /tmp/workflow - - # This is the path in the container - path: /workflow - - metrics: - - name: io-sysstat - rate: 10 - completions: 2 -``` - -In the above, we want to use the storage metric called "io-sysstat" to assess a host volume at `/tmp/workflow` that is mounted to `/workflow` in the container. Since a volume -could last forever (hypothetically) we ask for two completions 10 seconds apart each. 
This means we will get data for two timepoints from the metric, and after that, -the assessment will be complete. We can also look at this visually: - -![img/storage-metric-set-diagram.png](img/storage-metric-set-diagram.png) - -In the above, we are providing storage metrics (the image has two despite the yaml above showing one) that the operator knows about, along with a storage volume that we want to test. -The operator will prepare a JobSet with one replicated job and several containers, where one container is created per storage metric, and the volume bound to each. - -![img/storage-metric-set.png](img/storage-metric-set.png) - -In simple terms, a storage metric set will use the volume of interest that you request, and run the tool there. -Read/write is important here - e.g., if the metric needs to write to the volume, a read only volume won't work. Setting up storage -is complex, so it's typically up for you to create the PVC and then the operator will create the volume for it. Keep in mind that you should -honor RWX (read write many) vs just RW (read write) depending on the design you choose. Also note that by default, we only create one pod, -but if appropriate you can scale up to more. - -### Standalone Metric - -> A custom, standalone metric that doesn't abide by any rules! - -The standalone metric is the most interesting of the set, as it doesn't have a strict requirement for a storage or application definition. -We currently have a few flavors of standalone metrics that include: - - - applications that are timed (e.g., LAMMPS) - - networking tools (e.g., OSU benchmarks and netmark) - -By definition, it is "standalone" because it's going to create a custom JobSet setup for a metric of interest. Because we cannot be -certain of how to combine different jobs within this JobSet, we currently only allow one standalone metric to be defined at once. -This means that in the diagram below, you see online one standalone metric in the metrics.yaml - -![img/standalone-metric-set-diagram.png](img/standalone-metric-set-diagram.png) - -As an example, we can look at a standalone metric to run a tool called netmark. - -```yaml -apiVersion: flux-framework.org/v1alpha1 -kind: MetricSet -metadata: - labels: - app.kubernetes.io/name: metricset - app.kubernetes.io/instance: metricset-sample - name: metricset-sample -spec: - # Number of indexed jobs to run netmark on - pods: 4 - metrics: - - name: network-netmark - - # Custom options for netmark - # see pkg/metrics/network/netmark.go - options: - tasks: 4 -``` - -This is a standalone metric because it creates a JobSet with not one replicated job, but two! There is a launcher container -to issue an `mpirun` command, and one or more worker containers that interact via MPI. This is a simple example, but any design -for a JobSet could work here, and hence why the metric is standalone. However, it's neat that the interface presented to you -is consistent - it's simply a matter of asking for the metric that is known to the operator to be a standalone. -The image below also demonstrates that this standalone metric (along with storage or application) can be scaled to more -than one pod, if appropriate. - -![img/standalone-metric-set.png](img/standalone-metric-set.png) -For more detail about this design, see the [developer docs](../development/index.md). +For another overview of these designs, please see the [developer docs](../development/designs/index.md). 
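As a small illustration of those defaults, the sketch below scales a single metric across several pods (the headless service is created for you either way). Whether a given metric actually uses the extra pods depends on the metric, so treat this as a shape reference rather than a recipe; the directory value is just an example path.

```yaml
apiVersion: flux-framework.org/v1alpha1
kind: MetricSet
metadata:
  name: metricset-sample
spec:
  # Scale the metric to four pods; a headless service is provided for them
  pods: 4
  metrics:
    - name: io-fio
      options:
        # Each pod writes its temporary test file under this directory
        directory: /tmp
```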
## Containers Available diff --git a/examples/tests/perf-lammps-hpctoolkit/metrics.yaml b/examples/tests/perf-lammps-hpctoolkit/metrics.yaml new file mode 100644 index 0000000..4384b3b --- /dev/null +++ b/examples/tests/perf-lammps-hpctoolkit/metrics.yaml @@ -0,0 +1,32 @@ +apiVersion: flux-framework.org/v1alpha1 +kind: MetricSet +metadata: + labels: + app.kubernetes.io/name: metricset + app.kubernetes.io/instance: metricset-sample + name: metricset-sample +spec: + # Number of pods for lammps (one launcher, the rest workers) + pods: 4 + logging: + interactive: true + + metrics: + # Running more scaled lammps is our main goal + - name: app-lammps + options: + command: mpirun --hostfile /root/hostlist.txt -np 2 --map-by socket lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite + workdir: /opt/lammps/examples/reaxff/HNS + + # Add on hpctoolkit, will mount a volume and wrap lammps + addons: + - name: perf-hpctoolkit + options: + mount: /opt/mnt + events: "-e IO" + + # Ensure the working directory is consistent + workdir: /opt/lammps/examples/reaxff/HNS + + # Target container for entrypoint addition is the launcher, not workers + containerTarget: launcher \ No newline at end of file diff --git a/pkg/addons/containers.go b/pkg/addons/containers.go index 05746ff..72271f0 100644 --- a/pkg/addons/containers.go +++ b/pkg/addons/containers.go @@ -30,7 +30,7 @@ type ApplicationAddon struct { command string // Working Directory - workingDir string + workdir string // Entrypoint of container, if different from command entrypoint string @@ -67,7 +67,7 @@ func (a ApplicationAddon) AssembleContainers() []specs.ContainerSpec { return []specs.ContainerSpec{{ Image: a.image, Name: a.name, - WorkingDir: a.workingDir, + WorkingDir: a.workdir, Command: strings.Split(a.command, " "), // TODO these need to be mapped from m.resources Resources: &api.ContainerResources{}, @@ -102,7 +102,7 @@ func (a *ApplicationAddon) SetDefaultOptions(metric *api.MetricAddon) { } workdir, ok := metric.Options["workdir"] if ok { - a.workingDir = workdir.StrVal + a.workdir = workdir.StrVal } priv, ok := metric.Options["privileged"] if ok { @@ -143,7 +143,7 @@ func (a *ApplicationAddon) SetOptions(metric *api.MetricAddon) { func (a *ApplicationAddon) DefaultOptions() map[string]intstr.IntOrString { values := map[string]intstr.IntOrString{ "image": intstr.FromString(a.image), - "workdir": intstr.FromString(a.workingDir), + "workdir": intstr.FromString(a.workdir), "entrypoint": intstr.FromString(a.entrypoint), "command": intstr.FromString(a.command), } diff --git a/pkg/addons/hpctoolkit.go b/pkg/addons/hpctoolkit.go index 9f7593b..027501f 100644 --- a/pkg/addons/hpctoolkit.go +++ b/pkg/addons/hpctoolkit.go @@ -55,14 +55,10 @@ func (m HPCToolkit) AssembleVolumes() []specs.VolumeSpec { // This is a config map volume with items // It needs to be created in the same metrics operator namespace - // We need a better way to define this, I'm not happy with it. - // There should just be some variables under the volumespec - newVolume := corev1.Volume{ + // Thus we only need the items! 
+ configVolume := corev1.Volume{ VolumeSource: corev1.VolumeSource{ ConfigMap: &corev1.ConfigMapVolumeSource{ - LocalObjectReference: corev1.LocalObjectReference{ - Name: m.volumeName, - }, Items: items, }, }, @@ -77,9 +73,8 @@ func (m HPCToolkit) AssembleVolumes() []specs.VolumeSpec { }, // Mount is set to false here because we mount via metrics_operator - // This is a bit messy (I'm not happy) but I'll make it better { - Volume: newVolume, + Volume: configVolume, ReadOnly: true, Mount: false, Path: filepath.Dir(m.entrypointPath), @@ -112,7 +107,7 @@ func (a *HPCToolkit) SetOptions(metric *api.MetricAddon) { } workdir, ok := metric.Options["workdir"] if ok { - a.workingDir = workdir.StrVal + a.workdir = workdir.StrVal } target, ok := metric.Options["target"] if ok { @@ -202,9 +197,6 @@ echo "%s" # hpcprof hpctoolkit-sleep-measurements # hpcstruct hpctoolkit-sleep-measurements # hpcviewer ./hpctoolkit-lmp-database -workdir="%s" -echo "Changing directory to ${workdir}" -cd ${workdir} ` preBlock = fmt.Sprintf( preBlock, @@ -214,9 +206,17 @@ cd ${workdir} a.events, metadata.CollectionStart, metadata.Separator, - a.workingDir, ) + // Add the working directory, if defined + if a.workdir != "" { + preBlock += fmt.Sprintf(` +workdir="%s" +echo "Changing directory to ${workdir}" +cd ${workdir} +`, a.workdir) + } + // TODO we may want to target specific entrypoint scripts here // Right now we target all scripts associated with the job for _, containerSpec := range cs { diff --git a/pkg/metrics/volumes.go b/pkg/metrics/volumes.go index a00770b..670de16 100644 --- a/pkg/metrics/volumes.go +++ b/pkg/metrics/volumes.go @@ -26,7 +26,6 @@ func getVolumeMounts( ) []corev1.VolumeMount { // This is for the core entrypoints (that are generated as config maps here) - // TODO an addon that generates a volume needs to be added to this set... mounts := []corev1.VolumeMount{ { Name: set.Name, @@ -39,7 +38,6 @@ func getVolumeMounts( for _, vs := range volumes { // Is this volume indicated for mount? - // This might not be necessary... if vs.Mount { mount := corev1.VolumeMount{ Name: vs.Volume.Name, @@ -80,7 +78,10 @@ func getExtraConfigmaps(volumes []specs.VolumeSpec) []corev1.KeyToPath { for _, addedVolume := range volumes { - // TODO need to type check here + // Check that the typs is config map + if addedVolume.Volume.ConfigMap == nil { + continue + } // This will error if it's not a config map :) if addedVolume.Volume.Name == "" { for _, item := range addedVolume.Volume.ConfigMap.Items {