[GSOC] hyperopt suggestion service logic update #2412

Status: Open. Wants to merge 27 commits into base: master.
Commits (27, all by shashank-iitbhu):

f615e3f  hyperopt suggestion logic update (Aug 21, 2024)
a8bc887  Merge upstream master and resolve conflicts in base_service.py and se… (Aug 25, 2024)
a67f373  fix (Aug 25, 2024)
365c2f5  DISTRIBUTION_UNKNOWN enum set to 0 in gRPC api (Aug 26, 2024)
caa2422  convert parameter method fix (Aug 26, 2024)
0f38a51  convert feasibleSpace func updated (Sep 3, 2024)
ae9fa34  renamed DISTRIBUTION_UNKNOWN to DISTRIBUTION_UNSPECIFIED (Sep 3, 2024)
910a46c  fix (Sep 3, 2024)
08b01ac  added more test cases for hyperopt distributions (Sep 6, 2024)
16dc030  added support for NORMAL and LOG_NORMAL in hyperopt suggestion service (Sep 7, 2024)
282f81d  added e2e tests for NORMAL and LOG_NORMAL (Sep 7, 2024)
b7d09a6  hyperopt-suggestion example update (Sep 19, 2024)
58ab1ac  updated logic for log distributions (Sep 19, 2024)
2b1932e  updated logic for log distributions (Sep 19, 2024)
2f1c355  e2e test fixed (Sep 22, 2024)
8391c29  added support for parameter distributions for Parameter type INT (Sep 22, 2024)
23fd30b  unit test fixed (Sep 22, 2024)
7f6deb5  Update pkg/suggestion/v1beta1/hyperopt/base_service.py (Sep 22, 2024)
b85b4bf  comment fixed (Sep 22, 2024)
dc36303  added unit tests for INT parameter type (Sep 22, 2024)
658daaf  completed param unit test cases (Sep 22, 2024)
5198ad1  handled default case for normal distributions when min or max are not… (Sep 23, 2024)
262912d  fixed validation logic for min and max (Oct 4, 2024)
81f5526  removed unnecessary test params (Oct 6, 2024)
748e4ba  fixes (Oct 8, 2024)
14f30a5  added comments (Oct 10, 2024)
4f35663  fix (Oct 12, 2024)
2 changes: 2 additions & 0 deletions .github/workflows/e2e-test-pytorch-mnist.yaml
@@ -41,5 +41,7 @@ jobs:
         - "long-running-resume,from-volume-resume,median-stop"
         # others
         - "grid,bayesian-optimization,tpe,multivariate-tpe,cma-es,hyperband"
+        - "hyperopt-distribution"
         - "file-metrics-collector,pytorchjob-mnist"
         - "median-stop-with-json-format,file-metrics-collector-with-json-format"

74 changes: 74 additions & 0 deletions examples/v1beta1/hp-tuning/hyperopt-distribution.yaml
@@ -0,0 +1,74 @@
---
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: kubeflow
  name: hyperopt-distribution
spec:
  objective:
    type: minimize
    goal: 0.05
    objectiveMetricName: loss
  algorithm:
    algorithmName: random
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.05"
        step: "0.01"
        distribution: normal
    - name: momentum
      parameterType: double
      feasibleSpace:
        min: "0.001"
        max: "1"
        distribution: uniform
    - name: epochs
      parameterType: int
      feasibleSpace:
        min: "1"
        max: "3"
        distribution: logUniform
    - name: batch_size
      parameterType: int
      feasibleSpace:
        min: "32"
        max: "64"
        distribution: logNormal
  trialTemplate:
    primaryContainerName: training-container
    trialParameters:
      - name: learningRate
        description: Learning rate for the training model
        reference: lr
      - name: momentum
        description: Momentum for the training model
        reference: momentum
      - name: epochs
        description: Epochs
        reference: epochs
      - name: batchSize
        description: Batch Size
        reference: batch_size
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            containers:
              - name: training-container
                image: docker.io/kubeflowkatib/pytorch-mnist-cpu:latest
                command:
                  - "python3"
                  - "/opt/pytorch-mnist/mnist.py"
                  - "--epochs=${trialParameters.epochs}"
                  - "--batch-size=${trialParameters.batchSize}"
                  - "--lr=${trialParameters.learningRate}"
                  - "--momentum=${trialParameters.momentum}"
            restartPolicy: Never
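Taken together, the example exercises every new distribution: normal with a step for lr (which takes the quantized qnormal path in the suggestion service), uniform for momentum, and logUniform and logNormal for the int parameters epochs and batch_size.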
180 changes: 90 additions & 90 deletions pkg/apis/manager/v1beta1/api.pb.go

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions pkg/apis/manager/v1beta1/api.proto
@@ -101,11 +101,11 @@ enum ParameterType {
  * Distribution types for HyperParameter.
  */
 enum Distribution {
-  UNIFORM = 0;
-  LOG_UNIFORM = 1;
-  NORMAL = 2;
-  LOG_NORMAL = 3;
-  DISTRIBUTION_UNKNOWN = 4;
+  DISTRIBUTION_UNSPECIFIED = 0;
+  UNIFORM = 1;
+  LOG_UNIFORM = 2;
+  NORMAL = 3;
+  LOG_NORMAL = 4;
 }
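Worth noting: proto3 requires the first enum value to be 0, and 0 is the implicit default on the wire, so the unspecified variant is the natural zero value. A FeasibleSpace that never sets a distribution now decodes as DISTRIBUTION_UNSPECIFIED rather than silently reading as UNIFORM.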

/**
24 changes: 12 additions & 12 deletions pkg/apis/manager/v1beta1/python/api_pb2.py

Large diffs are not rendered by default.
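(Both api.pb.go above and api_pb2.py here are generated from api.proto, so these large diffs are just the regenerated language bindings for the enum change.)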

4 changes: 2 additions & 2 deletions pkg/apis/manager/v1beta1/python/api_pb2.pyi
@@ -16,11 +16,11 @@ class ParameterType(int, metaclass=_enum_type_wrapper.EnumTypeWrapper):

 class Distribution(int, metaclass=_enum_type_wrapper.EnumTypeWrapper):
     __slots__ = ()
+    DISTRIBUTION_UNSPECIFIED: _ClassVar[Distribution]
     UNIFORM: _ClassVar[Distribution]
     LOG_UNIFORM: _ClassVar[Distribution]
     NORMAL: _ClassVar[Distribution]
     LOG_NORMAL: _ClassVar[Distribution]
-    DISTRIBUTION_UNKNOWN: _ClassVar[Distribution]

 class ObjectiveType(int, metaclass=_enum_type_wrapper.EnumTypeWrapper):
     __slots__ = ()
@@ -39,11 +39,11 @@
 DOUBLE: ParameterType
 INT: ParameterType
 DISCRETE: ParameterType
 CATEGORICAL: ParameterType
+DISTRIBUTION_UNSPECIFIED: Distribution
 UNIFORM: Distribution
 LOG_UNIFORM: Distribution
 NORMAL: Distribution
 LOG_NORMAL: Distribution
-DISTRIBUTION_UNKNOWN: Distribution
 UNKNOWN: ObjectiveType
 MINIMIZE: ObjectiveType
 MAXIMIZE: ObjectiveType
@@ -532,22 +532,12 @@ func convertParameterType(typ experimentsv1beta1.ParameterType) suggestionapi.Pa
 }

 func convertFeasibleSpace(fs experimentsv1beta1.FeasibleSpace) *suggestionapi.FeasibleSpace {
-	distribution := convertDistribution(fs.Distribution)
-	if distribution == suggestionapi.Distribution_DISTRIBUTION_UNKNOWN {
-		return &suggestionapi.FeasibleSpace{
-			Max:  fs.Max,
-			Min:  fs.Min,
-			List: fs.List,
-			Step: fs.Step,
-		}
-	}
-
 	return &suggestionapi.FeasibleSpace{
 		Max:  fs.Max,
 		Min:  fs.Min,
 		List: fs.List,
 		Step: fs.Step,
-		Distribution: distribution,
+		Distribution: convertDistribution(fs.Distribution),
 	}
 }

@@ -562,7 +552,7 @@ func convertDistribution(typ experimentsv1beta1.Distribution) suggestionapi.Dist
 	case experimentsv1beta1.DistributionLogNormal:
 		return suggestionapi.Distribution_LOG_NORMAL
 	default:
-		return suggestionapi.Distribution_DISTRIBUTION_UNKNOWN
+		return suggestionapi.Distribution_DISTRIBUTION_UNSPECIFIED
 	}
 }
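Since DISTRIBUTION_UNSPECIFIED is now the enum's zero value, convertFeasibleSpace can always set the Distribution field: explicitly writing the unspecified value is indistinguishable on the wire from leaving it unset, which is what lets the old special-case branch above be removed.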

@@ -618,7 +618,7 @@ func TestConvertDistribution(t *testing.T) {
 		},
 		{
 			inDistribution:       experimentsv1beta1.DistributionUnknown,
-			expectedDistribution: suggestionapi.Distribution_DISTRIBUTION_UNKNOWN,
+			expectedDistribution: suggestionapi.Distribution_DISTRIBUTION_UNSPECIFIED,
 			testDescription:      "Convert unknown distribution",
 		},
 	}
91 changes: 83 additions & 8 deletions pkg/suggestion/v1beta1/hyperopt/base_service.py
@@ -13,10 +13,12 @@
 # limitations under the License.

 import logging
+import math

 import hyperopt
 import numpy as np

+from pkg.apis.manager.v1beta1.python import api_pb2
 from pkg.suggestion.v1beta1.internal.constant import (
     CATEGORICAL,
     DISCRETE,
@@ -62,14 +64,87 @@ def create_hyperopt_domain(self):
         #     hyperopt.hp.uniform('x2', -10, 10)}
         hyperopt_search_space = {}
         for param in self.search_space.params:
-            if param.type == INTEGER:
-                hyperopt_search_space[param.name] = hyperopt.hp.quniform(
-                    param.name, float(param.min), float(param.max), float(param.step)
-                )
-            elif param.type == DOUBLE:
-                hyperopt_search_space[param.name] = hyperopt.hp.uniform(
-                    param.name, float(param.min), float(param.max)
-                )
+            if param.type in [INTEGER, DOUBLE]:
+                if param.distribution == api_pb2.UNIFORM or param.distribution is None:
[Review comment from a Member on lines 65 to 68]: Should we set the uniform distribution as the default one? We can mutate the Uniform distribution in the Experiment defaulter: https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/experiments/v1beta1/experiment_defaults.go. For example, in Optuna the Uniform distribution will be removed in favour of FloatDistribution: https://optuna.readthedocs.io/en/stable/reference/generated/optuna.distributions.UniformDistribution.html

+                    # Uniform distribution: values are sampled between min and max.
+                    # If step is defined, we use the quantized version quniform.
+                    if param.step:
+                        hyperopt_search_space[param.name] = hyperopt.hp.quniform(
+                            param.name,
+                            float(param.min),
+                            float(param.max),
+                            float(param.step),
+                        )
+                    elif param.type == INTEGER:
+                        hyperopt_search_space[param.name] = hyperopt.hp.uniformint(
+                            param.name, float(param.min), float(param.max)
+                        )
+                    else:
+                        hyperopt_search_space[param.name] = hyperopt.hp.uniform(
+                            param.name, float(param.min), float(param.max)
+                        )
+                elif param.distribution == api_pb2.LOG_UNIFORM:
+                    # Log-uniform distribution: used for parameters that vary exponentially.
+                    # We convert min and max to their logarithmic scale using math.log, because
+                    # the log-uniform distribution is applied over the logarithmic range.
+                    if param.step:
+                        hyperopt_search_space[param.name] = hyperopt.hp.qloguniform(
+                            param.name,
+                            math.log(float(param.min)),
+                            math.log(float(param.max)),
+                            float(param.step),
+                        )
+                    else:
+                        hyperopt_search_space[param.name] = hyperopt.hp.loguniform(
+                            param.name,
+                            math.log(float(param.min)),
+                            math.log(float(param.max)),
+                        )
+                elif param.distribution == api_pb2.NORMAL:
+                    # Normal distribution: used when values are centered around the mean (mu)
+                    # and spread out by sigma. We calculate mu as the midpoint between
+                    # min and max, and sigma as (max - min) / 6. This is based on the assumption
+                    # that 99.7% of the values in a normal distribution fall within ±3 sigma.
+                    mu = (float(param.min) + float(param.max)) / 2
+                    sigma = (float(param.max) - float(param.min)) / 6
[Review thread on the sigma formula]

shashank-iitbhu (Author, Sep 11, 2024): I followed this article to determine the value of sigma from min and max. cc @tenzen-y @andreyvelich

Member: Maybe we should add this article to the comments. WDYT @tenzen-y @johnugeorge?

Member: I would rather not depend on an individual article. Instead, it would be better to add an actual mathematical description here as a comment.
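As a quick sanity check of that formula (not part of the diff), take the lr parameter from the example Experiment, with min = 0.01 and max = 0.05:

    mu = (0.01 + 0.05) / 2 = 0.03
    sigma = (0.05 - 0.01) / 6 ≈ 0.00667

so mu ± 3·sigma = [0.01, 0.05] recovers the declared feasible range, which is exactly the three-sigma (99.7%) assumption stated in the code comment.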

+                    if param.step:
+                        hyperopt_search_space[param.name] = hyperopt.hp.qnormal(
+                            param.name,
+                            mu,
+                            sigma,
+                            float(param.step),
+                        )
+                    else:
+                        hyperopt_search_space[param.name] = hyperopt.hp.normal(
+                            param.name,
+                            mu,
+                            sigma,
+                        )
+                elif param.distribution == api_pb2.LOG_NORMAL:
+                    # Log-normal distribution: applies when the logarithm
+                    # of the parameter follows a normal distribution.
+                    # We convert min and max to logarithmic scale and calculate
+                    # mu and sigma similarly to the normal distribution,
+                    # but on the log-transformed values to ensure the distribution is correct.
+                    log_min = math.log(float(param.min))
+                    log_max = math.log(float(param.max))
[Review thread on lines +130 to +131]

Member: Shouldn't we use a fixed value when min and max are scalars, the same as Nevergrad does?

shashank-iitbhu (Author): Yeah, we can, but that was an edge case for when min and max are not defined in Nevergrad:

    elif isinstance(param, (p.Log, p.Scalar)):
        if (param.bounds[0][0] is None) or (param.bounds[1][0] is None):
            if isinstance(param, p.Scalar) and not param.integer:
                return hp.lognormal(label=param_name, mu=0, sigma=1)

For example:

    - name: batch_size
      parameterType: int
      feasibleSpace:
        min: "32"
        max: "64"
        distribution: "logNormal"

The above parameter will be sampled from the log-normal density shown in the attached screenshot [screenshot omitted], where mu = 3.8123 and sigma = 0.3465 are calculated by plugging min = 32 and max = 64 into our code, and E(X) represents the mean, which is 48 in our case.

Member: That makes sense. In that case, could you address the cases where min or max is not specified, as well as nevergrad?
https://github.com/facebookresearch/nevergrad/blob/a2006e50b068fe598e0f3d7dab9c9bcf6cf97e00/nevergrad/optimization/externalbo.py#L61-L64

Member: @shashank-iitbhu This is still pending.

shashank-iitbhu (Author):

    if param.FeasibleSpace.Max == "" && param.FeasibleSpace.Min == "" {
        allErrs = append(allErrs, field.Required(parametersPath.Index(i).Child("feasibleSpace").Child("max"),
            fmt.Sprintf("feasibleSpace.max or feasibleSpace.min must be specified for parameterType: %v", param.ParameterType)))
    }

The webhook validator requires feasibleSpace.max or feasibleSpace.min to be specified.

Member: But when either min or max is empty, this validation does not reject the request, right? So shouldn't we implement the special case in the Suggestion Service?

shashank-iitbhu (Author): Yes, the validation webhook does not reject the request when either min or max is empty. But I created an example where:

    - name: batch_size
      parameterType: int
      feasibleSpace:
        min: "32"
        distribution: "logNormal"

For this, the Experiment is created but the suggestion service does not sample any value, so the trials never run, even though this case (when either min or max is not specified) is handled in pkg/suggestion/v1beta1/hyperopt/base_service.py. Do we need to check the experiment_defaults.go file?
https://github.com/kubeflow/katib/blob/867c40a1b0669446c774cd6e770a5b7bbf1eb2f1/pkg/apis/controller/experiments/v1beta1/experiment_defaults.go
+                    mu = (log_min + log_max) / 2
+                    sigma = (log_max - log_min) / 6
+
+                    if param.step:
+                        hyperopt_search_space[param.name] = hyperopt.hp.qlognormal(
+                            param.name,
+                            mu,
+                            sigma,
+                            float(param.step),
+                        )
+                    else:
+                        hyperopt_search_space[param.name] = hyperopt.hp.lognormal(
+                            param.name,
+                            mu,
+                            sigma,
+                        )
             elif param.type == CATEGORICAL or param.type == DISCRETE:
                 hyperopt_search_space[param.name] = hyperopt.hp.choice(
                     param.name, param.list
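To see what the new branches produce, here is a minimal, self-contained sketch (not part of the PR) that builds the hyperopt expressions for two parameters from the example Experiment, using the same mu/sigma derivation as the code above:

```python
import math

import hyperopt
from hyperopt.pyll import stochastic

# lr: double, min=0.01, max=0.05, distribution=normal.
# mu is the midpoint of [min, max]; sigma = (max - min) / 6, so that
# roughly 99.7% of samples (mu ± 3·sigma) fall inside the range.
mu = (0.01 + 0.05) / 2
sigma = (0.05 - 0.01) / 6

# batch_size: int, min=32, max=64, distribution=logNormal.
# Same derivation, but on the log-transformed bounds.
log_mu = (math.log(32) + math.log(64)) / 2
log_sigma = (math.log(64) - math.log(32)) / 6

space = {
    "lr": hyperopt.hp.normal("lr", mu, sigma),
    "batch_size": hyperopt.hp.lognormal("batch_size", log_mu, log_sigma),
}

# Draw one random candidate from the space.
print(stochastic.sample(space))
```

Here stochastic.sample just draws a single candidate to illustrate the distributions; the actual service drives these same expressions through hyperopt's suggest/Trials machinery.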
5 changes: 5 additions & 0 deletions pkg/suggestion/v1beta1/internal/constant.py
@@ -19,3 +19,8 @@
 DOUBLE = "DOUBLE"
 CATEGORICAL = "CATEGORICAL"
 DISCRETE = "DISCRETE"
+
+UNIFORM = "UNIFORM"
+LOG_UNIFORM = "LOG_UNIFORM"
+NORMAL = "NORMAL"
+LOG_NORMAL = "LOG_NORMAL"
39 changes: 26 additions & 13 deletions pkg/suggestion/v1beta1/internal/search_space.py
@@ -82,25 +82,36 @@ def __str__(self):

     @staticmethod
     def convert_parameter(p):
+        distribution = (
+            p.feasible_space.distribution
+            if p.feasible_space.distribution != ""
+            and p.feasible_space.distribution is not None
+            and p.feasible_space.distribution != api.DISTRIBUTION_UNSPECIFIED
+            else None
+        )
+
         if p.parameter_type == api.INT:
             # Default value for INT parameter step is 1
-            step = 1
-            if p.feasible_space.step is not None and p.feasible_space.step != "":
-                step = p.feasible_space.step
+            step = p.feasible_space.step if p.feasible_space.step else 1
             return HyperParameter.int(
-                p.name, p.feasible_space.min, p.feasible_space.max, step
+                p.name, p.feasible_space.min, p.feasible_space.max, step, distribution
             )

         elif p.parameter_type == api.DOUBLE:
             return HyperParameter.double(
                 p.name,
                 p.feasible_space.min,
                 p.feasible_space.max,
                 p.feasible_space.step,
+                distribution,
             )

         elif p.parameter_type == api.CATEGORICAL:
             return HyperParameter.categorical(p.name, p.feasible_space.list)

         elif p.parameter_type == api.DISCRETE:
             return HyperParameter.discrete(p.name, p.feasible_space.list)

         else:
             logger.error(
                 "Cannot get the type for the parameter: %s (%s)",
@@ -110,33 +121,35 @@ def convert_parameter(p):


 class HyperParameter(object):
-    def __init__(self, name, type_, min_, max_, list_, step):
+    def __init__(self, name, type_, min_, max_, list_, step, distribution=None):
         self.name = name
         self.type = type_
         self.min = min_
         self.max = max_
         self.list = list_
         self.step = step
+        self.distribution = distribution

     def __str__(self):
-        if self.type == constant.INTEGER or self.type == constant.DOUBLE:
+        if self.type in [constant.INTEGER, constant.DOUBLE]:
             return (
-                "HyperParameter(name: {}, type: {}, min: {}, max: {}, step: {})".format(
-                    self.name, self.type, self.min, self.max, self.step
-                )
+                f"HyperParameter(name: {self.name}, type: {self.type}, min: {self.min}, "
+                f"max: {self.max}, step: {self.step}, distribution: {self.distribution})"
             )
         else:
             return "HyperParameter(name: {}, type: {}, list: {})".format(
                 self.name, self.type, ", ".join(self.list)
             )

     @staticmethod
-    def int(name, min_, max_, step):
-        return HyperParameter(name, constant.INTEGER, min_, max_, [], step)
+    def int(name, min_, max_, step, distribution=None):
+        return HyperParameter(
+            name, constant.INTEGER, min_, max_, [], step, distribution
+        )

     @staticmethod
-    def double(name, min_, max_, step):
-        return HyperParameter(name, constant.DOUBLE, min_, max_, [], step)
+    def double(name, min_, max_, step, distribution=None):
+        return HyperParameter(name, constant.DOUBLE, min_, max_, [], step, distribution)

     @staticmethod
     def categorical(name, lst):
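A small usage sketch (not part of the diff) of the extended helper, assuming the in-repo import paths shown above:

```python
from pkg.apis.manager.v1beta1.python import api_pb2
from pkg.suggestion.v1beta1.internal.search_space import HyperParameter

# The distribution argument is the gRPC enum value (api_pb2.NORMAL == 3) and
# defaults to None, so pre-existing call sites that omit it keep working.
lr = HyperParameter.double("lr", "0.01", "0.05", "0.01", api_pb2.NORMAL)
print(lr)
# HyperParameter(name: lr, type: DOUBLE, min: 0.01, max: 0.05, step: 0.01, distribution: 3)
```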
2 changes: 1 addition & 1 deletion pkg/webhook/v1beta1/experiment/validator/validator.go
@@ -284,7 +284,7 @@ func (g *DefaultValidator) validateParameters(parameters []experimentsv1beta1.Pa
 			allErrs = append(allErrs, field.Invalid(parametersPath.Index(i).Child("feasibleSpace").Child("list"),
 				param.FeasibleSpace.List, fmt.Sprintf("feasibleSpace.list is not supported for parameterType: %v", param.ParameterType)))
 		}
-		if param.FeasibleSpace.Max == "" && param.FeasibleSpace.Min == "" {
+		if param.FeasibleSpace.Max == "" || param.FeasibleSpace.Min == "" {
 			allErrs = append(allErrs, field.Required(parametersPath.Index(i).Child("feasibleSpace").Child("max"),
 				fmt.Sprintf("feasibleSpace.max or feasibleSpace.min must be specified for parameterType: %v", param.ParameterType)))
 		}
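With this one-operator change, the webhook now rejects a feasible space where either min or max is missing (previously it only complained when both were empty), closing the gap discussed in the base_service.py thread where a logNormal parameter with only min set produced an Experiment whose trials never ran.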