
GeneticSearchCV #721

Open · wsperat opened this issue Nov 28, 2024 · 4 comments
Labels: enhancement (New feature or request)

Comments

wsperat commented Nov 28, 2024

A hyperparameter optimization object that uses a genetic algorithm under the hood, with the same interface as GridSearchCV and RandomizedSearchCV.

  • Regarding its use case, this object would be conceptually very similar to the optimizers already present in sklearn. However, compared to a regular Grid Search, a genetic algorithm would not make as many redundant choices; furthermore, compared to both Grid and Random searches, an optimization using genetic algorithms would spend a larger proportion of its budget near local maxima/minima, thus using compute resources more efficiently. (A rough interface sketch follows.)
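
For concreteness, a minimal sketch of the proposed drop-in interface, shown with the existing GridSearchCV since GeneticSearchCV does not exist yet; the GA-specific parameter names at the end are hypothetical assumptions, not an API:

```python
# Runnable today with GridSearchCV; the proposal is a GeneticSearchCV
# accepting the same arguments.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "solver": ["liblinear", "lbfgs"],  # a string-valued hyperparameter
}

search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)

# Hypothetical equivalent (parameter names are assumptions, not an API):
# GeneticSearchCV(LogisticRegression(max_iter=1000), param_grid,
#                 population_size=20, n_generations=10, cv=5)
```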
wsperat added the enhancement label on Nov 28, 2024
koaning (Owner) commented Nov 28, 2024

> furthermore, compared to both Grid and Random searches, an optimization using genetic algorithms would spend a larger proportion of its budget near local maxima/minima, thus using compute resources more efficiently.

While I appreciate the motivation, it is not clear to me that a genetic algorithm is immune to local optima, in which case it might still be outperformed by a random search.

I am not wholly closed to the idea of a genetic search, but there are a lot of ways to go about this. Suppose that we have two candidates and that the chromosomes represent the hyperparameters ... how would you go about combining different string settings?

wsperat (Author) commented Nov 28, 2024

@koaning absolutely, it's not immune to local optima. I meant to say that it should spend more time near local optima than a random search, which samples more or less homogeneously, thus (theoretically...) using compute resources more efficiently. Of course it's not a magic bullet and, on average, won't outperform Random Search (I don't think anything does, really); I just thought it'd be a cool option to have.

Regarding the methodology itself, my idea is to represent each hyperparameter as a gene, and a particular combination as a chromosome. As you say, string hyperparameters need some consideration; off the top of my head I can think of three alternatives:

  • Just choose them randomly on each iteration from all available alleles (options).
  • On the first iteration, generate at least one individual for each string value (for all available hyperparameters), and only apply crossover (no point mutations) to these values (i.e. descendants will only get to choose from their parents' alleles).
  • Combine both approaches (a rough sketch follows this list).
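
For illustration, a minimal pure-Python sketch of the combined scheme for string (categorical) genes; all names and the mutation rate are assumptions, not a settled design. Crossover draws each gene from one of the two parents' alleles, while an occasional point mutation re-samples from the full set of options:

```python
import random

# Search space: each hyperparameter (gene) maps to its available alleles.
SPACE = {
    "penalty": ["l1", "l2", "elasticnet"],
    "solver": ["liblinear", "saga", "lbfgs"],
}

def random_individual(space):
    """Initial population: sample every gene uniformly from its alleles."""
    return {gene: random.choice(alleles) for gene, alleles in space.items()}

def crossover(parent_a, parent_b):
    """Uniform crossover: each child gene comes from one of the two parents."""
    return {gene: random.choice([parent_a[gene], parent_b[gene]])
            for gene in parent_a}

def mutate(individual, space, rate=0.1):
    """Point mutation: with probability `rate`, re-sample a gene from *all*
    available alleles, not just the parents' (the 'combined' part)."""
    return {gene: random.choice(space[gene]) if random.random() < rate else allele
            for gene, allele in individual.items()}

# One generation step over a toy population:
population = [random_individual(SPACE) for _ in range(6)]
mom, dad = random.sample(population, 2)
child = mutate(crossover(mom, dad), SPACE)
print(child)
```

The mutation step is what keeps alleles from disappearing permanently: crossover alone can only pass on values already present in the parents, so without it the population could converge on a subset of the options early.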

I've used the combination approach for some optimization problems and it has worked well enough without introducing too much complexity, but am open to other possibilities.

koaning (Owner) commented Nov 29, 2024

Is there a reason not to use this library?

https://github.com/rsteca/sklearn-deap

Kind of feel like it has been implemented elsewhere already.

wsperat (Author) commented Dec 1, 2024

In my experience (and it may be mine alone), both sklearn-deap and sklearn-genetic work well enough, but tend not to be worth the dependency hell that having deap as a dependency implies. My idea was to have a pure-Python, albeit simpler, GA optimization object.
In any case, you may be right that it's not necessarily worth it.
