Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: Adding trial name as env for metrics collectors #2165

Closed

Conversation

axel7083
Copy link

What this PR does / why we need it:

This will add the KATIB_TRIAL_NAME env containing the trial name to all the containers.

This would simply many interactions, the trial name is injected for every metrics collectors, which made it more difficult to create custom collectors.

The typical example is the following in the #2118

trial_name = "-".join(opt.pod_name.split("-")[:-1])

Parsing the pod_name is not really a good practice. Instead injecting with a proper env is probably more suitable.
Injecting in every pods make it also easier for trial without metrics collector to push themself metrics to katib.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: axel7083
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@axel7083 Thank you for the contribution! I left a few comments.

@kubeflow/wg-automl-leads Could you approve CI?

pkg/webhook/v1beta1/pod/inject_webhook.go Outdated Show resolved Hide resolved
pkg/webhook/v1beta1/pod/inject_webhook.go Show resolved Hide resolved
pkg/webhook/v1beta1/pod/utils.go Outdated Show resolved Hide resolved
@andreyvelich
Copy link
Member

Thank you for this contribution @axel7083!
Currently, user can get the Trial name inside the training container using TrialSpec: https://www.kubeflow.org/docs/components/katib/trial-template/#use-trial-metadata-in-template.
Here is an example: https://github.com/kubeflow/katib/blob/master/examples/v1beta1/trial-template/trial-metadata-substitution.yaml.
With this change, do we want to always add the Trial name to the training containers ?

Also, I don't understand why we need to add Trial name to the metrics collector container ?

@axel7083
Copy link
Author

axel7083 commented Jul 5, 2023

Thank you for this contribution @axel7083! Currently, user can get the Trial name inside the training container using TrialSpec: https://www.kubeflow.org/docs/components/katib/trial-template/#use-trial-metadata-in-template. Here is an example: https://github.com/kubeflow/katib/blob/master/examples/v1beta1/trial-template/trial-metadata-substitution.yaml. With this change, do we want to always add the Trial name to the training containers ?

Also, I don't understand why we need to add Trial name to the metrics collector container ?

Hummm, I did not know that, I tried to made a custom collectors myself. Base on what I saw in the code

func (s *SidecarInjector) getMetricsCollectorArgs(trial *trialsv1beta1.Trial, metricNames string, mc common.MetricsCollectorSpec, metricsCollectorConfigData katibconfig.MetricsCollectorConfig, esRules []string) ([]string, error) {
args := []string{"-t", trial.Name, "-m", metricNames, "-o-type", string(trial.Spec.Objective.Type), "-s-db", katibmanagerv1beta1.GetDBManagerAddr()}
if mountPath, _ := getMountPath(mc); mountPath != "" {

I thought it was not possible to get the trial name without doing some weird parsing of the hostname. Like I saw in #2118.

Then I do not really understand the following example inside spec.metricsCollectorSpec:

metricsCollectorSpec:
source:
fileSystemPath:
path: "/katib/mnist.log"
kind: File
collector:
kind: Custom
customCollector:
args:
- -m
- accuracy
- -s
- katib-db-manager.kubeflow:6789
- -path
- /katib/mnist.log
image: kubeflowkatib/custom-metrics-collector:latest
imagePullPolicy: Always
name: custom-metrics-logger-and-collector
env:
- name: TrialNamePrefix
valueFrom:
fieldRef:
fieldPath: metadata.name

TrialNamePrefix will be equal to the trial name ? Why the word prefix is used here ?

Other question then, can we use ${trialParameters.trialName} in custom metrics collector ?

@andreyvelich
Copy link
Member

Then I do not really understand the following example inside spec.metricsCollectorSpec:

I think, that is required to get Trial name inside Metrics Collector container since it is not part of Trial Spec.

Other question then, can we use ${trialParameters.trialName} in custom metrics collector ?

No, that is not possible today.

I think, we have a few options:

  1. Always inject Trial name to Metrics Collector and Training containers. Then, we should refactor Trial Metadata substitution.
  2. Allow to set ${trialParameters.x} in the Metrics Collector container. And do the appropriate substitution during Trial creation.
  3. Get the Trial name from Pod labels. Since we always inject Trial name to Trial pods, you can get this value in the metrics collector container as follows: metadata.labels[katib.kubeflow.org/trial]

What do you think @tenzen-y @johnugeorge @axel7083 ?

@tenzen-y
Copy link
Member

tenzen-y commented Jul 7, 2023

Then I do not really understand the following example inside spec.metricsCollectorSpec:

I think, that is required to get Trial name inside Metrics Collector container since it is not part of Trial Spec.

Other question then, can we use ${trialParameters.trialName} in custom metrics collector ?

No, that is not possible today.

I think, we have a few options:

  1. Always inject Trial name to Metrics Collector and Training containers. Then, we should refactor Trial Metadata substitution.
  2. Allow to set ${trialParameters.x} in the Metrics Collector container. And do the appropriate substitution during Trial creation.
  3. Get the Trial name from Pod labels. Since we always inject Trial name to Trial pods, you can get this value in the metrics collector container as follows: metadata.labels[katib.kubeflow.org/trial]

What do you think @tenzen-y @johnugeorge @axel7083 ?

@andreyvelich Thanks for the summary.
+1 on the last option (get the trial name from pod labels) since I think we should use an existing feature rather than expanding the feature if we can inject TrialName into metrics-collector logs.

@axel7083 You can refer to the following how to get trialName from labels:

env:
- name: KATIB_CORE_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace

@axel7083
Copy link
Author

Alright the following solution as @andreyvelich suggested is very suitable for the requirements I had in mind.

...
- name: KATIB_TRIAL_NAME
   valueFrom:
      fieldRef:
         fieldPath: metadata.labels['katib.kubeflow.org/trial']
...

Closing as resolved without modification

@axel7083 axel7083 closed this Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants