
[GSOC] hyperopt suggestion service logic update #2412

Open · wants to merge 12 commits into base: master

Conversation

shashank-iitbhu (Contributor):

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2374

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: Shashank Mittal <[email protected]>

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tenzen-y (Member):

/area gsoc

@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Aug 26, 2024
@andreyvelich (Member) left a comment:

Thank you for this, @shashank-iitbhu! I left a few comments.

- "file-metrics-collector,pytorchjob-mnist"
- "median-stop-with-json-format,file-metrics-collector-with-json-format"
- "median-stop-with-json-format,file-metrics-collector-with-json-format"
Member:

Please add a new line.

feasibleSpace:
min: "0.5"
max: "0.9"
distribution: "logUniform"
Member:

Please add the other distributions to this example to make sure we validate them:

normal
logNormal
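For illustration, the other distributions could be exercised with feasibleSpace entries like the following. This is a sketch mirroring the logUniform example above; the parameter names are taken from the ones added later in this PR, and the exact ranges are illustrative:

```yaml
- name: weight_decay
  parameterType: double
  feasibleSpace:
    min: "0.01"
    max: "0.05"
    distribution: "normal"
- name: dropout_rate
  parameterType: double
  feasibleSpace:
    min: "0.1"
    max: "0.9"
    distribution: "logNormal"
```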

shashank-iitbhu (Contributor Author):

Done.

NORMAL = 2;
LOG_NORMAL = 3;
DISTRIBUTION_UNKNOWN = 4;
DISTRIBUTION_UNKNOWN = 0;
Member:

Please keep the same naming convention as for parameter_type.

Suggested change
DISTRIBUTION_UNKNOWN = 0;
UNKNOWN_DISTRIBUTION = 0;

Member:

Suggested change
DISTRIBUTION_UNKNOWN = 0;
DISTRIBUTION_UNSPECIFIED = 0;

I would like to select the UNSPECIFIED suffix here.
Please see: https://google.aip.dev/126

Member:

Makes sense. @tenzen-y should we rename the other gRPC parameters to UNSPECIFIED?

@tenzen-y (Member), Sep 3, 2024:

Changing a released gRPC API means losing backward compatibility.
So I would like to keep using the existing names for the released protocol buffers API. @andreyvelich WDYT?

@andreyvelich (Member), Sep 3, 2024:

Since these gRPC APIs are not exposed to end users, do you still think we should not change the existing APIs?
It only affects users who build their own Suggestion service.

Member:

Since these gRPC APIs are not exposed to end users, do you still think we should not change the existing APIs?

Almost correct. Additionally, users who keep using removed Suggestion services, like the Chocolate Suggestion, would face the same problem.

So, can we collect feedback in a dedicated issue outside of this PR?

Member:

Sure, let's follow up on this in the issue and rename it after a few months if we don't get any feedback.
@shashank-iitbhu Can you please create an issue to track it?

Member:

SGTM

shashank-iitbhu (Contributor Author):

Sure, let's follow up on this in the issue and rename it after a few months if we don't get any feedback. @shashank-iitbhu Can you please create an issue to track it?

Sure, I will create a separate issue to track renaming the other gRPC parameters to UNSPECIFIED.

@@ -533,14 +533,6 @@ func convertParameterType(typ experimentsv1beta1.ParameterType) suggestionapi.Pa

func convertFeasibleSpace(fs experimentsv1beta1.FeasibleSpace) *suggestionapi.FeasibleSpace {
distribution := convertDistribution(fs.Distribution)
Member:

Since convertDistribution doesn't return an error, I think you can simply add the call in the return statement:

return &suggestionapi.FeasibleSpace{
		Max:          fs.Max,
		Min:          fs.Min,
		List:         fs.List,
		Step:         fs.Step,
		Distribution: convertDistribution(fs.Distribution),
	}

name="param-5",
parameter_type=api_pb2.DOUBLE,
feasible_space=api_pb2.FeasibleSpace(
max="5", min="1", list=[], step="0.5", distribution=api_pb2.LOG_UNIFORM)
Member:

Please add more test cases for the other hyperopt distributions.

shashank-iitbhu (Contributor Author):

Done.

)
elif param.type == DOUBLE:
hyperopt_search_space[param.name] = hyperopt.hp.uniform(
hyperopt_search_space[param.name] = hyperopt.hp.uniformint(
Member:

If the parameter is an int, why can't we support other distributions like lognormal?

@shashank-iitbhu (Contributor Author), Sep 6, 2024:

Distributions like uniform, quniform, loguniform, normal, etc. return float values. They are designed to sample from a range of values that can take any real number (float), which might not make sense if we're looking for an integer value.
That said, we can definitely add support for these distributions when the parameter is an int as well. Should we do this?

Member:

@tenzen-y @kubeflow/wg-training-leads @shashank-iitbhu Should we round this float value to an int if the user wants to use this distribution with an int parameter type?

Member:

@tenzen-y @kubeflow/wg-training-leads @shashank-iitbhu Should we round this float value to an int if the user wants to use this distribution with an int parameter type?

SGTM.
Users can specify the double parameter type if they want to compute more exactly.
But documenting this restriction for the int parameter type would be better.
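The rounding idea discussed above can be sketched with the standard library alone. This is a hypothetical illustration, not the suggestion service's actual code: sample from a normal distribution whose mu/sigma are derived from the feasible range (as done elsewhere in this PR), round to the nearest integer, and clamp to [min, max].

```python
import random

def sample_int_normal(param_min, param_max, seed=None):
    """Sample an int from a normal distribution over [param_min, param_max].

    mu is the midpoint of the range and sigma spans the range at +/-3
    sigma, matching the convention used in this PR; the float sample is
    rounded and clamped so the result stays a valid int parameter value.
    """
    rng = random.Random(seed)
    mu = (param_min + param_max) / 2
    sigma = (param_max - param_min) / 6
    value = round(rng.gauss(mu, sigma))
    return max(param_min, min(param_max, value))

samples = [sample_int_normal(1, 10, seed=s) for s in range(1000)]
assert all(isinstance(v, int) and 1 <= v <= 10 for v in samples)
```

The clamp matters because a normal sample can fall outside the feasible range with small probability; without it, an invalid parameter value could be suggested.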

Comment on lines 96 to 99
else:
hyperopt_search_space[param.name] = hyperopt.hp.uniform(
param.name, float(param.min), float(param.max)
)
Member:

I think we can simplify this if statement: if distribution == UNIFORM or UNKNOWN and step is null, use hyperopt.hp.uniform().
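The simplification being suggested can be illustrated with a small dispatch function. This is a hypothetical sketch, not the PR's actual code: the enum constants stand in for the generated api_pb2 values, and the function returns which hyperopt constructor would be used rather than calling hyperopt directly.

```python
import math

# Hypothetical stand-ins for the api_pb2 distribution enum values;
# the real names and numbers live in the generated gRPC code.
DISTRIBUTION_UNKNOWN, UNIFORM, LOG_UNIFORM, NORMAL, LOG_NORMAL = range(5)

def pick_sampler(distribution, step, p_min, p_max):
    """Return (hyperopt constructor name, args) for a double parameter.

    UNIFORM and DISTRIBUTION_UNKNOWN collapse into one branch:
    quniform when a step is given, plain uniform otherwise.
    """
    if distribution in (UNIFORM, DISTRIBUTION_UNKNOWN):
        if step:
            return ("quniform", (p_min, p_max, float(step)))
        return ("uniform", (p_min, p_max))
    if distribution == LOG_UNIFORM:
        # hyperopt.hp.loguniform takes its bounds in log space.
        return ("loguniform", (math.log(p_min), math.log(p_max)))
    if distribution == NORMAL:
        mu = (p_min + p_max) / 2
        sigma = (p_max - p_min) / 6  # range treated as +/-3 sigma
        return ("normal", (mu, sigma))
    raise ValueError(f"unsupported distribution: {distribution}")

assert pick_sampler(UNIFORM, None, 1.0, 5.0) == ("uniform", (1.0, 5.0))
assert pick_sampler(DISTRIBUTION_UNKNOWN, None, 1.0, 5.0) == ("uniform", (1.0, 5.0))
```

Collapsing the two cases into one membership test removes the duplicated else branch the reviewer is pointing at, while keeping the step-based quniform/uniform split explicit.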

@shashank-iitbhu shashank-iitbhu force-pushed the feat/hyperopt-suggestion-service-update branch from 1a7a831 to fddb763 Compare September 10, 2024 16:23
Signed-off-by: Shashank Mittal <[email protected]>

validation fix

add e2e tests for hyperopt

added e2e test to workflow
Signed-off-by: Shashank Mittal <[email protected]>
Signed-off-by: Shashank Mittal <[email protected]>
Signed-off-by: Shashank Mittal <[email protected]>

sigma calculation fixed

fix

parse new arguments to mnist.py
@shashank-iitbhu shashank-iitbhu force-pushed the feat/hyperopt-suggestion-service-update branch from fddb763 to 282f81d Compare September 10, 2024 16:33
)
elif param.distribution == api_pb2.NORMAL:
mu = (float(param.min) + float(param.max)) / 2
sigma = (float(param.max) - float(param.min)) / 6
@shashank-iitbhu (Contributor Author), Sep 11, 2024:

I followed this article to determine the value of sigma from min and max.
cc @tenzen-y @andreyvelich
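The min/max-to-(mu, sigma) mapping above can be checked numerically with a stdlib-only sketch: with sigma = range / 6, the feasible range spans +/-3 sigma, so roughly 99.7% of samples from N(mu, sigma) should land inside [min, max].

```python
import random

p_min, p_max = 0.5, 0.9
mu = (p_min + p_max) / 2     # midpoint of the feasible range
sigma = (p_max - p_min) / 6  # range spans +/-3 sigma

rng = random.Random(42)
samples = [rng.gauss(mu, sigma) for _ in range(100_000)]
inside = sum(p_min <= s <= p_max for s in samples) / len(samples)
# inside is close to 0.997, the +/-3 sigma mass of a normal distribution
print(f"mu={mu}, sigma={sigma:.4f}, fraction inside range={inside:.4f}")
```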

Signed-off-by: Shashank Mittal <[email protected]>
)
elif param.distribution == api_pb2.NORMAL:
mu = (float(param.min) + float(param.max)) / 2
sigma = (float(param.max) - float(param.min)) / 6
Member:

Suggested change
sigma = (float(param.max) - float(param.min)) / 6
# We consider the normal distribution based on the range of +/-3 sigma.
sigma = (float(param.max) - float(param.min)) / 6

@shashank-iitbhu (Contributor Author):

@tenzen-y I have added two new parameters, weight_decay and dropout_rate, to the Hyperopt example and passed them to mnist.py, but I haven't used them in the Net class or in the train and test functions yet. If you check the logs for this e2e test, the maximum value of the loss metric is an enormously large number. I can't figure out what I'm missing. I also tested this locally.


Successfully merging this pull request may close these issues.

[GSOC] Project 8: Support various Parameter Distribution in Katib
3 participants