custom: add support for custom container #84

Merged (3 commits), Sep 24, 2024

4 changes: 2 additions & 2 deletions .github/workflows/build-deploy.yaml
@@ -18,7 +18,7 @@ jobs:
uses: actions/checkout@v3
- uses: actions/setup-go@v3
with:
go-version: ^1.18.1
go-version: ^1.22
- name: GHCR Login
if: (github.event_name != 'pull_request')
uses: docker/login-action@v2
@@ -48,7 +48,7 @@ jobs:
uses: actions/checkout@v3
- uses: actions/setup-go@v3
with:
go-version: ^1.18.1
go-version: ^1.22
- name: GHCR Login
if: (github.event_name != 'pull_request')
uses: docker/login-action@v2
2 changes: 1 addition & 1 deletion .github/workflows/helm.yaml
@@ -16,7 +16,7 @@ jobs:
uses: actions/checkout@v3
- uses: actions/setup-go@v3
with:
go-version: ^1.18.1
go-version: ^1.22
- name: GHCR Login
if: (github.event_name != 'pull_request')
uses: docker/login-action@v2
4 changes: 2 additions & 2 deletions .github/workflows/main.yaml
@@ -34,7 +34,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: ^1.20
go-version: ^1.22
- name: fmt check
run: make fmt

@@ -88,7 +88,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: ^1.20
go-version: ^1.22

- name: Start minikube
uses: medyagh/setup-minikube@697f2b7aaed5f70bf2a94ee21a4ec3dde7b12f92 # v0.0.9
2 changes: 1 addition & 1 deletion .github/workflows/python.yaml
@@ -34,7 +34,7 @@ jobs:
- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: ^1.20
go-version: ^1.22

- name: Start minikube
uses: medyagh/setup-minikube@697f2b7aaed5f70bf2a94ee21a4ec3dde7b12f92 # v0.0.9
6 changes: 3 additions & 3 deletions .github/workflows/release.yaml
@@ -21,7 +21,7 @@ jobs:
echo "tag=${{ inputs.release_tag }}" >> ${GITHUB_ENV}
- uses: actions/setup-go@v3
with:
go-version: ^1.20
go-version: ^1.22
- name: GHCR Login
uses: docker/login-action@v2
with:
@@ -51,7 +51,7 @@ jobs:
uses: actions/checkout@v3
- uses: actions/setup-go@v3
with:
go-version: ^1.20
go-version: ^1.22
- name: Set tag
run: |
echo "Tag for release is ${{ inputs.release_tag }}"
@@ -86,7 +86,7 @@ jobs:
uses: actions/checkout@v3
- uses: actions/setup-go@v3
with:
go-version: ^1.20
go-version: ^1.22
- name: Set tag
run: |
echo "Tag for release is ${{ inputs.release_tag }}"
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,5 +1,5 @@
# Build the manager binary
FROM golang:1.20 as builder
FROM golang:1.22 as builder
ARG TARGETOS
ARG TARGETARCH

16 changes: 15 additions & 1 deletion Makefile
@@ -173,6 +173,19 @@ deploy: manifests kustomize ## Deploy controller to the K8s cluster specified in
undeploy: ## Undeploy controller from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
$(KUSTOMIZE) build config/default | kubectl delete --ignore-not-found=$(ignore-not-found) -f -


.PHONY: test-deploy
test-deploy: manifests kustomize
docker build --no-cache -t ${DEVIMG} .
docker push ${DEVIMG}
cd config/manager && $(KUSTOMIZE) edit set image controller=${DEVIMG}
$(KUSTOMIZE) build config/default > examples/dist/metrics-operator-dev.yaml

.PHONY: test-deploy-recreate
test-deploy-recreate: test-deploy
kubectl delete -f ./examples/dist/metrics-operator-dev.yaml || echo "Already deleted"
kubectl apply -f ./examples/dist/metrics-operator-dev.yaml

##@ Build Dependencies

## Location to install dependencies to
@@ -187,7 +200,7 @@ ENVTEST ?= $(LOCALBIN)/setup-envtest

## Tool Versions
KUSTOMIZE_VERSION ?= v3.8.7
CONTROLLER_TOOLS_VERSION ?= v0.11.1
CONTROLLER_TOOLS_VERSION ?= v0.14.0

KUSTOMIZE_INSTALL_SCRIPT ?= "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"
.PHONY: kustomize
@@ -205,6 +218,7 @@ $(CONTROLLER_GEN): $(LOCALBIN)
test -s $(LOCALBIN)/controller-gen && $(LOCALBIN)/controller-gen --version | grep -q $(CONTROLLER_TOOLS_VERSION) || \
GOBIN=$(LOCALBIN) go install sigs.k8s.io/controller-tools/cmd/controller-gen@$(CONTROLLER_TOOLS_VERSION)


.PHONY: envtest
envtest: $(ENVTEST) ## Download envtest-setup locally if necessary.
$(ENVTEST): $(LOCALBIN)
13 changes: 8 additions & 5 deletions api/v1alpha2/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

58 changes: 34 additions & 24 deletions config/crd/bases/flux-framework.org_metricsets.yaml
@@ -3,8 +3,7 @@ apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.11.1
creationTimestamp: null
controller-gen.kubebuilder.io/version: v0.14.0
name: metricsets.flux-framework.org
spec:
group: flux-framework.org
@@ -21,14 +20,19 @@ spec:
description: MetricSet is the Schema for the metrics API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: 'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
@@ -37,21 +41,23 @@ spec:
properties:
deadlineSeconds:
default: 31500000
description: Should the job be limited to a particular number of seconds?
description: |-
Should the job be limited to a particular number of seconds?
Approximately one year. This cannot be zero or job won't start
format: int64
type: integer
dontSetFQDN:
description: Don't set JobSet FQDN
type: boolean
logging:
description: Logging spec, preparing for other kinds of logging Right
now we just include an interactive option
description: |-
Logging spec, preparing for other kinds of logging
Right now we just include an interactive option
properties:
interactive:
description: Don't allow the application, metric, or storage test
to finish This adds sleep infinity at the end to allow for interactive
mode.
description: |-
Don't allow the application, metric, or storage test to finish
This adds sleep infinity at the end to allow for interactive mode.
type: boolean
type: object
metrics:
@@ -60,15 +66,15 @@ spec:
items:
properties:
addons:
description: A Metric addon can be storage (volume) or an application,
It's an additional entity that can customize a replicated
job, either adding assets / features or entire containers
to the pod
description: |-
A Metric addon can be storage (volume) or an application,
It's an additional entity that can customize a replicated job,
either adding assets / features or entire containers to the pod
items:
description: 'A Metric addon is an interface that exposes
extra volumes for a metric. Examples include: A storage
volume to be mounted on one or more of the replicated jobs
A single application container.'
description: |-
A Metric addon is an interface that exposes extra volumes for a metric. Examples include:
A storage volume to be mounted on one or more of the replicated jobs
A single application container.
properties:
listOptions:
additionalProperties:
@@ -129,7 +135,9 @@ spec:
- type: string
x-kubernetes-int-or-string: true
type: array
description: Metric List Options Metric specific options
description: |-
Metric List Options
Metric specific options
type: object
mapOptions:
additionalProperties:
@@ -149,7 +157,9 @@ spec:
- type: integer
- type: string
x-kubernetes-int-or-string: true
description: Metric Options Metric specific options
description: |-
Metric Options
Metric specific options
type: object
resources:
description: Resources include limits and requests for the metric
1 change: 0 additions & 1 deletion config/rbac/role.yaml
@@ -2,7 +2,6 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
creationTimestamp: null
name: manager-role
rules:
- apiGroups:
2 changes: 1 addition & 1 deletion controllers/metric/metric_controller.go
@@ -129,7 +129,7 @@ func (r *MetricSetReconciler) Reconcile(ctx context.Context, req ctrl.Request) (

// Ensure the metricset is mapped to a JobSet. For design:
// 1. If an application is provided, we pair the application at some scale with each metric as a container
// 2. If storage is provided, we create the volumes for the metric containers
// 2. If storage or other addons are provided, we create the volumes for the metric containers
result, err := r.ensureMetricSet(ctx, &spec, &set)
if err != nil {
r.Log.Error(err, "🟥️ Issue ensuring metric set")
7 changes: 7 additions & 0 deletions docs/_static/data/metrics.json
@@ -20,6 +20,13 @@
"image": "ghcr.io/converged-computing/metric-cabanapic:latest",
"url": "https://github.com/ECP-copa/CabanaPIC"
},
{
"name": "app-custom",
"description": "Provide a custom application for MPI trace",
"family": "proxyapp",
"image": "",
"url": "https://converged-computing.github.io/metrics-operator"
},
{
"name": "app-hpl",
"description": "High-Performance Linpack (HPL)",
2 changes: 1 addition & 1 deletion docs/getting_started/addons.md
@@ -270,7 +270,7 @@ wrapper to the actual executable.

### perf-mpitrace

- *[perf-mpitrace](https://github.com/converged-computing/metrics-operator/tree/main/examples/addons/perf-mpitrace)*
- *[perf-mpitrace](https://github.com/converged-computing/metrics-operator/tree/main/examples/addons/mpitrace-lammps)*

This metric provides [mpitrace](https://github.com/IBM/mpitrace) to wrap an MPI application. The setup is the same as hpctoolkit, and we
currently only provide a rocky base (please let us know if you need another). It works by way of wrapping the mpirun command with `LD_PRELOAD`.
43 changes: 43 additions & 0 deletions docs/getting_started/metrics.md
@@ -251,6 +251,49 @@ Here are some useful resources for the benchmarks:
- [HPC Council](https://hpcadvisorycouncil.atlassian.net/wiki/spaces/HPCWORKS/pages/1284538459/OSU+Benchmark+Tuning+for+2nd+Gen+AMD+EPYC+using+HDR+InfiniBand+over+HPC-X+MPI)
- [AWS Tutorials](https://www.hpcworkshops.com/08-efa/04-complie-run-osu.html)

### app-custom

The custom application metric lets you run any containerized application as a metric app. Of the parameters used here, "command" and "container" (the metric's `image`) are required.

| Name | Description | Option Key | Type | Default |
|-----|-------------|------------|------|---------|
| command | The full mpirun command | options->command | string | unset |
| workdir | The working directory for the command | options->workdir | string | unset |
| soleTenancy | Require each pod to have sole tenancy | options->soleTenancy | string | "false" |

As an example, here is how to run mpitrace (an addon) with a custom container.

```yaml
apiVersion: flux-framework.org/v1alpha2
kind: MetricSet
metadata:
  labels:
    app.kubernetes.io/name: metricset
    app.kubernetes.io/instance: metricset-sample
  name: metricset-sample
spec:
  # Number of pods for the application (one launcher, the rest workers)
  pods: 4
  metrics:
    - name: app-custom
      image: ghcr.io/converged-computing/<your-container>
      options:
        command: mpirun --hostfile ./hostlist.txt -mca orte_keep_fqdn_hostnames t -np 4 --map-by socket <app> <options>
        workdir: <workdir>

      # Add on mpitrace, will mount a volume and wrap the application
      addons:
        - name: perf-mpitrace
          options:
            mount: /opt/mnt
            image: ghcr.io/converged-computing/metric-mpitrace:ubuntu-jammy
            workdir: <workdir>
            # this is the target of the replicated job, "l" means launcher
            target: l
            # This is the target container, with full name "launcher"
            containerTarget: launcher
```
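
The requirements in the table above (a `command` option plus a container image, with `soleTenancy` defaulting to "false") can be sketched as a small pre-flight check before writing the YAML. This is only an illustration in Python: `build_custom_metric`, `REQUIRED_OPTIONS`, and `OPTION_DEFAULTS` are hypothetical names, and reading "container" as the metric's `image` field is an assumption based on the example. The operator's actual validation is not shown here.

```python
# Hypothetical helper, not part of the metrics-operator. It mirrors the
# documented table only; the operator's real checks may differ.

REQUIRED_OPTIONS = ("command",)              # "command" is documented as required
OPTION_DEFAULTS = {"soleTenancy": "false"}   # documented default; workdir is unset


def build_custom_metric(image, options):
    """Return an app-custom metric entry, or raise ValueError if incomplete."""
    # Assumption: the required "container" is the metric's image field.
    if not image:
        raise ValueError("app-custom needs a container image")
    missing = [key for key in REQUIRED_OPTIONS if not options.get(key)]
    if missing:
        raise ValueError("app-custom is missing required options: " + ", ".join(missing))
    # Start from the documented defaults, then let the caller's values override.
    merged = {**OPTION_DEFAULTS, **options}
    return {"name": "app-custom", "image": image, "options": merged}


metric = build_custom_metric(
    "ghcr.io/converged-computing/<your-container>",
    {
        "command": "mpirun --hostfile ./hostlist.txt -np 4 <app> <options>",
        "workdir": "<workdir>",
    },
)
print(metric["options"]["soleTenancy"])
```

The returned dictionary has the same shape as one entry under `metrics` in the example, so it could be dumped to YAML with whatever serializer you already use.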

### app-lammps

- *[app-lammps](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-lammps)*
1 change: 0 additions & 1 deletion examples/addons/mpitrace-lammps/metrics-rocky.yaml
@@ -22,7 +22,6 @@ spec:
command: /opt/intel/mpi/2021.8.0/bin/mpirun --hostfile ./hostlist.txt -np 4 --map-by socket lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite
workdir: /opt/lammps/examples/reaxff/HNS

# Add on hpctoolkit, will mount a volume and wrap lammps
addons:
- name: perf-mpitrace
options: