-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add model-server/estimator to KeplerInternal #322
add model-server/estimator to KeplerInternal #322
Conversation
8ff6a09
to
f444a30
Compare
pkg/components/exporter/exporter.go
Outdated
if ms.Enabled { | ||
exporterConfigMap["MODEL_SERVER_ENABLE"] = "true" | ||
} | ||
modelServerConfig := modelserver.ModelServerConfigForClient(k.ModelServerDeploymentName(), k.Spec.ModelServer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modelServerConfig := modelserver.ModelServerConfigForClient(k.ModelServerDeploymentName(), k.Spec.ModelServer) | |
modelServerConfig := modelserver.ConfigForClient(k.ModelServerDeploymentName(), k.Spec.ModelServer) |
How about renaming modelserver.ModelServer<X>
to modelserver.<X>
to avoid stuttering ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if k.Spec.ModelServer.ClientConfig()
is better since all the function only requires the spec from Modelserver to compute client config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes..it is the config for client of the model server (i.e., exporter/estimator container).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sthaha Sorry, I misunderstood your comment.
Just change the method name: https://github.com/sustainable-computing-io/kepler-operator/compare/5fa1b54bc0dc3b2f68894af0cc89b7341f8a80f1..6209dadf4f6f04ed3e786989f062f71459dd06f2
Note: cannot have k.Spec.ModelServer.ClientConfig()
I cannot have(ms *v1alpha1.InternalModelServerSpec) ConfigForClient
, it needs to be inside v1alpha1.
And, I cannot move the method to v1alpha1 since it needs k8s module and k8s module calls v1alpha1 (circle call)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sthaha Please confirm whether the change resolved?
@sunya-ch does this support deployment of multiple model-servers like it currently allows multiple May be worth considering is .. if it makes sense to have a separate CRD for KeplerModelServer. This would allow for multiple model servers to be deployed (if that is even a case) with different configs. |
f444a30
to
41f9f3f
Compare
Yes, I generate model server name in the same way of the kepler exporter name (based on CR + suffix of |
41f9f3f
to
5fa1b54
Compare
@sunya-ch Overall the code looks good to me one missing part is the e2e test. |
I cannot see the e2e test for the KeplerInternal CR. I think it would become too big change on this PR to add the e2e test for the keplerinternal. Could that be done by the other PR then I could help add the model server part? See the issue open here: #314 |
5fa1b54
to
6209dad
Compare
The current Kepler tests cover almost all the usecase currently supported by kepler-internal, so having a set of tests that replicate what creation of kepler already does gives us only a low ROI. IMHO, all features should be be accompanied by tests that validate most common usecases. It shouldn't be too hard to add an e2e by making a copy of the existing kepler-e2e. |
that isn't required for |
6209dad
to
4c64d35
Compare
I see. I have it because first I tried to create with specifying the image name but it turns out that it is not allowed for keplerinternal. Just didn't remove it out ;) |
Yes, it might be but I still think it should be on another PR for better track. I can rebase this PR from the PR. |
4c64d35
to
23be3c5
Compare
23be3c5
to
e370bab
Compare
convert to draft building on top of the PR #325. |
6a11dad
to
0b10349
Compare
95bf443
to
a4177f9
Compare
@sthaha Sorry for multiple force-pushed. Additional change are adding e2e test case and Enabled() function for InternalEstimatorSpec.
|
a4177f9
to
e75fcca
Compare
e75fcca
to
3238a90
Compare
3238a90
to
a69c9e3
Compare
pkg/components/exporter/exporter.go
Outdated
k8s.VolumeFromHost("lib-modules", "/lib/modules"), | ||
k8s.VolumeFromHost("tracing", "/sys"), | ||
k8s.VolumeFromHost("proc", "/proc"), | ||
k8s.VolumeFromHost("kernel-src", "/usr/src/kernels"), | ||
k8s.VolumeFromHost("kernel-debug", "/sys/kernel/debug"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need a better way to handling volumes. - In another PR
E.g. each New<X>Container
can return []NamedMount
.
type NamedMount string
const (
HostLibModulesMount NamedMount = "host-lib-modules"
HostProc = "host-proc"
KeplerConfigMapMount = "cm-kepler"
)
func (m HostMount) Volume() corev1.Volume {
mounts := map[HostMount]string{
LibModulesMount: "/lib/modules",
...
}
if strings.StartsWith("host-", m) {
return k8s.VolumeFromHost(m, mounts[m])
} else if strings.StartsWith("cm-", m)
return k8s.VolumeFromConfigMap(m, mounts[m])
}
}
pkg/utils/test/assertions.go
Outdated
|
||
func (f Framework) AssertInternalStatus(ki *v1alpha1.KeplerInternal) { | ||
// the status will be updated | ||
ki = f.WaitUntilInternalCondition(ki.Name, v1alpha1.Reconciled, v1alpha1.ConditionTrue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ki = f.WaitUntilInternalCondition(ki.Name, v1alpha1.Reconciled, v1alpha1.ConditionTrue) | |
ki = f.WaitUntilInternalCondition(name, v1alpha1.Reconciled, v1alpha1.ConditionTrue) |
@vprashar2929 could you please help validate this on OpenShift ?.. just ensuring the supported usecase of creating a |
5444912
to
6037dd0
Compare
Couple of observations while testing this on OpenShift(4.13):
|
3e8fba0
to
6b0fb79
Compare
@vprashar2929 Thank you. I did see the manual namespace set for model server default URL. Now, I updated it to use the namespace from the kepler spec. For status not updated, could you share describe result? I cannot find the cause of issue since I can see the updated status on my OpenShift cluster (4.12). From the above result, it seems the issue not only from model server but also the status of the deamonset. |
pkg/utils/test/framework.go
Outdated
if errors.IsNotFound(err) { | ||
return true, fmt.Errorf("kepler-internal %s is not found", name) | ||
} | ||
statusOK := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
statusOK := true | |
var statusOK bool |
Should we use the namespace from
So when I created keplerinternals I used a different namespace from |
Signed-off-by: Sunyanan Choochotkaew <[email protected]>
6b0fb79
to
2e5423e
Compare
Yes, it has no affect because the default is false. Only enable when it is set to true. (reference: https://github.com/sustainable-computing-io/kepler/blob/442bcfe5d3bf26a1285ff0f13d1f5017d28b9e37/pkg/model/model.go#L189) |
Finally, I guess so, we may allow Kepler to deploy on any namespace for keplerinternal. We need to have Kepler-operator add the new namespace to the cache. Now, we have to rely on the additional namespace list in the command line. Kepler deployer has to know in advance the namespace list where they are going to allow the keplerinternal to be installed. kepler-operator/cmd/manager/main.go Line 90 in 73505af
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks a lot @sunya-ch 🤗
This PR replaces #235 by moving the integration to Kepler-internal API.
Change summary:
KeplerInternalSpec
andKeplerInternalStatus
AddIfNotEmpty
andVolumeFromEmptyDir
utility function (used for Estimator and ModelServer creation)Bug fixes (not related to model server):
UP-TO-DATE
statusKeplerInternal
Here is the CR that I used for running in my local cluster.
KeplerInternal Status
With neither estimator nor modelserver
With only estimator
With estimator+modelserver
Signed-off-by: Sunyanan Choochotkaew [email protected]