
Add discrete variables #67

Open
gpfins opened this issue Aug 13, 2017 · 4 comments

@gpfins (Contributor) commented Aug 13, 2017

A next (important, imho) step would be to support ordinal variables. It would be useful to have a discussion here on the design.

A few issues would be:

  1. The introduction of OrdinalParameter for ordinal discrete values. There would be two types of ordinal parameters: one defined only by bounds a, b ({a, a+1, ..., b-1}) and one with an explicit list of (ordered) discrete values ({a, b, c, d}). Do we treat those the same and leave it up to the user to define the first case as range(a, b)?

  2. The optimization of the acquisition function. In the case of purely discrete values it would probably be easiest (and reasonably fast) to simply evaluate all possible combinations. In the case of a domain consisting of both continuous and discrete parameters, we could apply the naive approach of optimizing the acquisition function as a function of the continuous parameters for every combination of the discrete parameters (see the sketch after this list). In both cases this doesn't use any gradient information for the discrete parameters. One idea could be, for example, to treat the acquisition function as if its inputs were all continuous, but to add penalizers in order not to sample the same discrete value twice.
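For reference, a minimal sketch of what such a parameter and the brute-force optimization for the purely discrete case could look like. OrdinalParameter and the acquisition call signature are assumptions for illustration, not existing GPflowOpt API:

```python
import itertools

import numpy as np

# Hypothetical sketch: OrdinalParameter is the class proposed above,
# not (yet) existing GPflowOpt API.
class OrdinalParameter:
    """A parameter taking values from a finite, ordered set."""
    def __init__(self, label, values):
        self.label = label
        self.values = np.asarray(values)  # e.g. range(a, b) or [1, 4, 9]

def optimize_discrete(acquisition, parameters):
    """Brute force for the purely discrete case: evaluate the acquisition
    on every combination of the parameter values and return the best point."""
    candidates = np.array(list(itertools.product(*(p.values for p in parameters))))
    scores = acquisition(candidates)  # assumed to accept an (N, D) array
    return candidates[np.argmax(scores)]
```

For the mixed case, the same outer loop over discrete combinations would wrap a continuous optimizer instead of a single evaluation.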

@javdrher (Member)
  1. I would go for the same; for convenience we can derive an OrdinalRangeParameter from the standard OrdinalParameter. In both cases, the resulting domain is a grid anyway.

  2. After defining the new parameters, I'd focus on representing the domains accurately. Sums of discrete parameters create grids, sums of continuous parameters create hypercubes, and combined they become a joint domain with both a discrete and a continuous part. Then we have to make the optimizers a little picky so they reject domains they don't like. The penalization approach sounds nice but might be a lot harder to achieve design-wise, and I'm not sure we'd gain that much: the acquisition function is cheap to evaluate anyway.

From this I derive the following requirements:

  • Being able to split up a domain by selecting a number of parameters, regaining the properties of a more specific domain (e.g., all continuous -> hypercube).
  • Quickly obtaining a wrapper for the objective function which fixes the parameters that weren't selected and plugs in their values. This could also be achieved by introducing a .fixed syntax like in GPflow (a sketch of such a wrapper follows below).
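A minimal sketch of what such a wrapper could look like, assuming the objective accepts an (N, D) array; fix_parameters is a hypothetical helper, not GPflow's .fixed mechanism:

```python
import numpy as np

def fix_parameters(objective, dim, fixed_indices, fixed_values):
    """Hypothetical helper: return an objective over the free parameters only;
    the parameters at fixed_indices are plugged in at fixed_values."""
    free_indices = [i for i in range(dim) if i not in fixed_indices]

    def wrapped(x_free):
        x_free = np.atleast_2d(x_free)
        x = np.empty((x_free.shape[0], dim))
        x[:, free_indices] = x_free
        x[:, fixed_indices] = fixed_values
        return objective(x)

    return wrapped
```

An optimizer over a mixed domain could then call fix_parameters once per discrete combination and run a standard continuous optimizer on the wrapped objective.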

@icouckuy (Contributor)
  1. I would also go for one OrdinalRangeParameter to hold either a range(7) or an arbitrary list of integers. Maybe we should call it IntegerParameter?

> Being able to split up a domain by selecting a number of parameters, regaining the properties of a more specific domain (e.g., all continuous -> hypercube)

With the new domain indexing it is possible to easily extract a new domain holding only a subset of the parameters. At the moment there is no way to automatically select only parameters of a certain type, and the new domain will be a generic domain class rather than a 'Hypercube' (see the sketch below).
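As an illustration, type-based selection could be a thin layer on top of the existing domain indexing. A sketch, assuming a domain iterates over its parameters and parameters combine into a domain with '+' (as in GPflowOpt); select_type, and any promotion of an all-continuous result to a Hypercube, is hypothetical:

```python
import operator
from functools import reduce

def select_type(domain, parameter_type):
    """Hypothetical helper: extract the sub-domain consisting only of the
    parameters of a given type, e.g. select_type(domain, ContinuousParameter)."""
    params = [p for p in domain if isinstance(p, parameter_type)]
    return reduce(operator.add, params)  # parameters sum into a domain
```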

@javdrher (Member)
I'd like to comply with the naming in http://proceedings.mlr.press/v70/valera17a/valera17a.pdf

@javdrher (Member)
We have given this some thought. With regard to the optimization of a discrete domain, there are currently two approaches we can follow:

  • Evaluate the discrete part on a (sub)grid and start a gradient descent on the continuous part of the domain for each grid point. This option has the disadvantage of being potentially extremely demanding computationally: e.g., optimizing 3 layers of a neural network with between 20 and 200 neurons each already gives 181^3 ≈ 5.9 million combinations. We could evaluate on a sub-grid only, but I doubt that solution will scale appropriately.
  • Round off continuous values (as used in Spearmint) and complement this with https://arxiv.org/abs/1706.03673 . This approach would also allow us to keep using gradients and traditional optimizers. There is some concern about selecting the same point multiple times, though (a sketch follows below).
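A minimal sketch of the rounding idea, assuming the acquisition accepts an (N, D) array; rounded_acquisition is an illustrative wrapper, not the exact method from the paper:

```python
import numpy as np

def rounded_acquisition(acquisition, integer_dims):
    """Round the integer dimensions before evaluating the acquisition, so
    traditional continuous optimizers can still be applied to the relaxed problem."""
    def wrapped(x):
        x = np.array(x, dtype=float, ndmin=2)  # copy, shape (N, D)
        x[:, integer_dims] = np.round(x[:, integer_dims])
        return acquisition(x)
    return wrapped
```

Since the rounding makes the acquisition piecewise constant along the integer dimensions, distinct continuous candidates can collapse onto the same discrete point; that is exactly the duplicate-selection concern above.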

@javdrher added this to the 0.2.0 release milestone Aug 23, 2017