
deprecate Learner2D #56

Open
basnijholt opened this issue Dec 19, 2018 · 9 comments
Comments

@basnijholt
Member

(original issue on GitLab)

opened by Joseph Weston (@jbweston) at 2018-07-09T14:51:40.135Z

Once LearnerND becomes good enough we should remove Learner2D, as it will no longer be needed.

@basnijholt
Member Author

originally posted by Bas Nijholt (@basnijholt) at 2018-07-09T16:40:11.612Z on GitLab

If good enough == better than.

@basnijholt
Member Author

originally posted by Joseph Weston (@jbweston) at 2018-07-09T16:57:09.452Z on GitLab

good enough == as good as?

@basnijholt
Member Author

originally posted by Jorn Hoofwijk (@Jorn) at 2018-07-13T07:59:33.461Z on GitLab

I think good enough == better than the Learner2D.

  • Right now the Learner2D takes all the points into account when computing the loss for each triangle. Although locality is something we want, the global view has some benefits: you can, for example, take the second derivative of the function into account (or, as the default loss is currently implemented, the deviation from the linear interpolation plus the gradient).
  • Also, I am not sure whether people have written custom loss functions; those would behave differently with the LearnerND, so it is not a drop-in replacement.

@basnijholt
Member Author

originally posted by Joseph Weston (@jbweston) at 2018-07-13T08:12:03.437Z on GitLab

Point gitlab:#1 is fair.

We don't care about backwards compatibility at this point, as we are < v1.0.

@basnijholt
Member Author

originally posted by Jorn Hoofwijk (@Jorn) at 2018-07-13T08:22:29.208Z on GitLab

I take it you didn't want to refer to issue 1?

In the future I would like to implement a way to maintain locality and still be able to take the second derivative into account (e.g. computing the loss over a simplex and its neighbours). To me this sounds like a non-trivial task; however, once implemented, it could really improve the quality of the LearnerND.
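The "simplex and its neighbours" idea can be sketched roughly as follows (a hypothetical illustration, not adaptive code; `neighbour_aware_loss` and its arguments are made-up names, built on `scipy.spatial.Delaunay`, whose `neighbors` attribute uses -1 for missing boundary neighbours):

```python
import numpy as np
from scipy.spatial import Delaunay


def neighbour_aware_loss(tri, base_losses):
    """Average each simplex's base loss with those of its neighbours.

    Sketch of a loss that stays local (only O(1) simplices touched per
    update) yet sees one ring of context, which is what an estimate of
    the second derivative would need.
    """
    losses = np.empty(len(tri.simplices))
    for i, nbrs in enumerate(tri.neighbors):
        nbrs = nbrs[nbrs >= 0]  # -1 marks a missing (boundary) neighbour
        losses[i] = np.mean(np.concatenate(([base_losses[i]], base_losses[nbrs])))
    return losses
```

This only smooths an existing per-simplex loss over neighbours; an actual second-derivative estimate would need the function values on the neighbouring vertices as well.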

@atom-moyer

Hi,

Recently, I have been developing a custom loss function that does take into account all points, which I was planning to generalize to the LearnerND. However, I recently realized that the LearnerND only takes into account a single simplex or a simplex and neighbors, which really bummed me out. That's how I got to this issue.

Why do you plan to remove the ability to calculate loss with the context of all of the simplexes? I feel that 1) it removes the ability to write a vectorized loss function, and 2) there are plenty of loss functions that may want to consider global properties, like the maximum and minimum value.

For example, I wrote a Boltzmann loss for the 2D case. See below:

import numpy as np

from adaptive.learner.learner2D import areas, default_loss


def boltzmann_loss(ip, kt=0.59):
    """Loss function that combines default loss and `boltzmann probabilities`.

    Applies higher loss to lower values.

    Works with `~adaptive.Learner2D` only.

    Parameters
    ----------
    ip : `scipy.interpolate.LinearNDInterpolator` instance
    kt : float
        Temperature factor; smaller values concentrate sampling more
        strongly around the minimum.

    Returns
    -------
    losses : numpy.ndarray
        Loss per triangle in ``ip.tri``.
    """
    # Minimum function value on each triangle's vertices.
    vs = np.squeeze(np.min(ip.values[ip.tri.vertices], axis=1))
    # Area-weighted Boltzmann factor, largest at the global minimum.
    zs = areas(ip) * np.exp((vs.min() - vs) / kt)

    # Normalize to probabilities, then rescale by the median so a
    # typical triangle keeps a weight of about 1.
    bps = zs / np.sum(zs)
    bps /= np.median(bps) or 1.0

    return bps * default_loss(ip)

It would be great if we could generalize this to N-Dimensions. It is amazing for molecular conformer/ensemble sampling, which is what I am developing it for.

Adam

@basnijholt
Member Author

basnijholt commented Dec 21, 2019

Hi Adam,

Personally I don't think we should deprecate the Learner2D either.

I thought that we could have 2 LearnerNDs, one that works on a global level (the ND version of the Learner2D) and the LearnerND as is.

Actually, there isn't much in the Learner2D that is really tied to 2D; AFAIR it's just the loss function.

Of course, there might be some things like the bounds, but those are trivial things to change.

If you have the energy to generalize the Learner2D to ND, I will be more than willing to accept the PR 😄

We should probably think about renaming them though, to distinguish the global and local losses.

@akhmerov
Contributor

Hey @atom-moyer, your Boltzmann loss function looks really interesting and useful for minimization.
I don't immediately understand, though, why you'd need global properties at all: it appears that just multiplying the loss by the unnormalized Boltzmann factor should have exactly the desired effect. I actually suspect that right now your loss doesn't work as intended, because AFAIR we don't recompute the loss of all the intervals at each ask/tell cycle, only a select few.

Can you give the unnormalized Boltzmann loss a shot? If it works it would be a really useful addition to the collection of the loss functions that we have.
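To see why dropping the normalization shouldn't change the sampling (a hypothetical sketch; `boltzmann_weights` is a made-up helper, not part of adaptive): the `vs.min()` shift in the original loss only multiplies all weights by a common constant, and the learner only cares about the ranking of losses when picking the next simplex.

```python
import numpy as np


def boltzmann_weights(vs, kt=0.59):
    """Unnormalized Boltzmann factors for per-simplex minimum values ``vs``.

    Shifting by vs.min() before exponentiating only rescales every weight
    by the same positive constant, so the relative ordering of losses,
    and hence which simplex is subdivided next, is unchanged.  (The shift
    does still help against floating-point overflow for very negative vs.)
    """
    return np.exp(-np.asarray(vs) / kt)
```

Because the factor is computed per simplex with no global reduction, it would fit the local-update model of LearnerND directly.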

One may also consider dynamically adjusting kt depending on the range of values that the learner has encountered as a global update; I'll address the question of global updates below.

@akhmerov
Contributor

The reason why we consider making the loss depend only on local properties is performance scaling: if one only needs to update the loss of 𝓞(1) candidate locations per point, the overall learner overhead for evaluating N points is 𝓞(N log N).

Right now the overhead of local operations in LearnerND is high enough for the Learner2D to match its performance, but I expect that in the future LearnerND will overtake Learner2D. There is already a PR (#243) removing some of the overhead.

At the same time, 𝓞(N log N) allows us to make amortized global updates. There we keep track of some aggregate of the learner data, and recompute all losses when we detect that this aggregate quantity changes by a sufficiently large amount. This is how we take into account the change of scale in the 1D learner: whenever the y-range grows by a factor 2, we recompute all interval losses.
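The amortized pattern described above can be sketched like this (a hypothetical illustration of the trigger logic, not the actual Learner1D internals; all names are made up):

```python
class AmortizedScaleTracker:
    """Recompute all losses only when the tracked value range doubles.

    Sketch of the amortized-update pattern: the expensive O(N) pass over
    all intervals happens only O(log(range)) times, so its cost is
    amortized over the N points told to the learner.
    """

    def __init__(self):
        self.y_min = float("inf")
        self.y_max = float("-inf")
        self._scale_at_last_recompute = 0.0
        self.recompute_count = 0

    def tell(self, y):
        self.y_min = min(self.y_min, y)
        self.y_max = max(self.y_max, y)
        scale = self.y_max - self.y_min
        # Amortized trigger: only when the range has grown by a factor 2
        # do we pay the O(N) cost of recomputing every interval's loss.
        if scale > 2 * self._scale_at_last_recompute:
            self._recompute_all_losses()
            self._scale_at_last_recompute = scale

    def _recompute_all_losses(self):
        self.recompute_count += 1  # stand-in for an O(N) pass over intervals
```

The same trigger could track, say, the function range used to set kt in a Boltzmann-style loss.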

Such amortized updates would allow us to, e.g., define kt as a finite fraction of the total function range. This would then improve the explore/exploit tradeoff.

We currently don't have a general interface for global amortized data updates, and AFAIR this was one of the open questions that arose in discussions of #220.
