Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDK] grpc-related bugs in Python SDK #2395

Closed
Electronic-Waste opened this issue Jul 25, 2024 · 10 comments · Fixed by #2398
Closed

[SDK] grpc-related bugs in Python SDK #2395

Electronic-Waste opened this issue Jul 25, 2024 · 10 comments · Fixed by #2398

Comments

@Electronic-Waste
Copy link
Member

Electronic-Waste commented Jul 25, 2024

What happened?

image

image

When I called report_metrics in the SDK built by myself, the first error above occurred.

After I modified grpc.beta.implementations.insecure_channel to grpc.insecure_channel, the second error above occurred.

What did you expect to happen?

After PR #2344 was merged to the main branch, beta_create_DBManager_stub was removed in api_pb2.py. But we did not make changes in the following part of codes so that the first error occurred

with katib_api_pb2.beta_create_DBManager_stub(channel) as client:
try:
# When metric name is empty, we select all logs from the Katib DB.
observation_logs = client.GetObservationLog(
katib_api_pb2.GetObservationLogRequest(trial_name=name),
timeout=timeout,
)

with katib_api_pb2.beta_create_DBManager_stub(channel) as client:
try:
timestamp = datetime.now(timezone.utc).strftime(constants.RFC3339_FORMAT)
client.ReportObservationLog(

Also grpcio package after 1.0.0 has removed beta version. Sometimes we may get errors when we use beta version.

channel = grpc.beta.implementations.insecure_channel(
db_manager_address[0], int(db_manager_address[1])
)

channel = grpc.beta.implementations.insecure_channel(
db_manager_address[0], int(db_manager_address[1])
)

I think we need to may some changes to Python SDK. WDYT👀 @andreyvelich @tenzen-y @johnugeorge

Environment

Kubernetes version:

Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.1

Katib controller version:

Built by myself

Katib Python SDK version:

(Also built by myself)
Name: kubeflow-katib
Version: 0.17.0
Summary: Katib Python SDK for APIVersion v1beta1
Home-page: https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1
Author: Kubeflow Authors
Author-email: [email protected]
License: Apache License Version 2.0
Location: /home/xxx/.local/lib/python3.10/site-packages/kubeflow_katib-0.17.0-py3.10.egg
Requires: certifi, grpcio, kubernetes, protobuf, setuptools, six, urllib3
Required-by: 

Impacted by this bug?

Give it a 👍 We prioritize the issues with most 👍

@Electronic-Waste
Copy link
Member Author

related PR: #2050

I think we also need to copy api_pb2_grpc.py to Python SDK.

@Electronic-Waste
Copy link
Member Author

I encountered this bug while making the push-based metrics collection demo. This issue is also related to the project.

ref issue: #2340

@tenzen-y
Copy link
Member

tenzen-y commented Jul 26, 2024

Could you implement the reproducible UTs or E2E test for the 'report_metrics?
IIUC, report_metrics is added without any tests.

@Electronic-Waste
Copy link
Member Author

Yes, I think we should also add some UTs or E2E tests for functions in katib_client.py like #2325 too.

I guess it will be better to describe the test case problem in another issue and open PRs to it.

WDYT👀 @tenzen-y @andreyvelich @johnugeorge @gaocegege

@tenzen-y
Copy link
Member

Yes, I think we should also add some UTs or E2E tests for functions in katib_client.py like #2325 too.

I guess it will be better to describe the test case problem in another issue and open PRs to it.

WDYT👀 @tenzen-y @andreyvelich @johnugeorge @gaocegege

I have strong objections to evolving the report_metrics API without any tests since the report_metrics API is new one.
The report_metrics can not be handled the same as other Python API.

Before we move forward, we should implement the UTs for the report_metrcs so that we prove that the new feature doesn't have regressions. We should not add new features without tests.

Whether the feature doesn't have potential regressions are big deal since Katib is not experimental project, Katib is production project.

@andreyvelich
Copy link
Member

I agree with @tenzen-y, please let's work on the unit test for report_metrics API and for tune API first to avoid this problems in the first phase.

@Electronic-Waste
Copy link
Member Author

@tenzen-y @andreyvelich UTs for report_metrics has been added by now.

I'll work on UTs for tune API after this issue is addressed.

@andreyvelich
Copy link
Member

/remove-label lifecycle/needs-triage

Copy link

@andreyvelich: The label(s) /remove-label lifecycle/needs-triage cannot be applied. These labels are supported: tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/remove-label lifecycle/needs-triage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@andreyvelich
Copy link
Member

/area sdk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants