Add capability to provide custom seeds to GP #502

danthedaniel · 2017-06-21T19:34:16Z

What does this PR do?

Adds a seeds parameter to TPOTBase
Allows users to define a list named seeds in their custom config

Where should the reviewer start?

Customizing TPOT's starting population in the "Using" section of the docs.

In this function in TPOTBase:

def _setup_pop(self, seeds, config_path):

How should this PR be tested?

There are tests added to the tpot_tests.py test file.

What are the relevant issues?

#59
#296

coveralls · 2017-06-21T19:41:21Z

Coverage increased (+0.6%) to 87.636% when pulling 4ebda55 on teaearlgraycold:custom_seeds into 7f2b6a7 on rhiever:development.

weixuanfu · 2017-06-22T01:45:46Z

tpot/base.py

+        seed_individuals = [creator.Individual.from_string(x, self._pset) for x in seeds]
+        self._pop = []
+
+        # Add the same set of seeds to the population until we have population_size seeds


After a short discussion with @rhiever earlier today, we think seeds (the parameter may need a better name) should specify only a small part of the initial population, rather than filling the whole initial population with duplicates of the seed pipelines. If seeds is a short list, duplicating it limits the pipeline diversity of the GP at the start. The rest of the initial population, beyond the pipelines in seeds, should be randomly generated.

population_seeds should probably be the parameter name.
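
The behaviour proposed above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `build_population` and `random_individual` are hypothetical names, and individuals are plain strings here rather than DEAP objects.

```python
import random


def build_population(seeds, population_size, random_individual):
    """Start from the user's seeds, then pad with random individuals.

    Hypothetical helper: `random_individual` is any zero-argument callable
    that returns a fresh individual.
    """
    # Keep at most population_size seeds, in the order given, without duplicating them.
    population = list(seeds[:population_size])
    # Fill the remaining slots with randomly generated individuals.
    while len(population) < population_size:
        population.append(random_individual())
    return population


rng = random.Random(42)
seeds = [
    "BernoulliNB(input_matrix, BernoulliNB__alpha=0.01, BernoulliNB__fit_prior=True)",
    "GaussianNB(input_matrix)",
]
# Stand-in generator: real code would build a random pipeline from the pset.
pop = build_population(seeds, 5, lambda: "random_pipeline_%d" % rng.randrange(1000))
```

With two seeds and a population size of five, the first two individuals are the seeds and the other three come from the random generator, so a short seed list no longer dominates the starting population.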

coveralls · 2017-06-24T04:29:08Z

Coverage increased (+0.5%) to 87.617% when pulling 424fa81 on teaearlgraycold:custom_seeds into 7f2b6a7 on rhiever:development.

coveralls · 2017-06-24T04:32:15Z

Coverage increased (+0.5%) to 87.617% when pulling 424fa81 on teaearlgraycold:custom_seeds into 7f2b6a7 on rhiever:development.

rhiever · 2017-06-24T14:44:55Z

docs/using/index.html

+<p>TPOT allows for the initial population of pipelines to be seeded. This can be done either through the <code>population_seeds</code> parameter in the TPOT constructor, or through a <code>population_seeds</code> attribute in a custom config file.</p>
+<pre><code class="Python">population_seeds = [
+    'BernoulliNB(GaussianNB(input_matrix), BernoulliNB__alpha=0.1, BernoulliNB__fit_prior=False)',
+    'BernoulliNB(input_matrix, BernoulliNB__alpha=0.01, BernoulliNB__fit_prior=True)'


How easy would it be to take actual sklearn pipelines as input instead of the string representations? I sense that users would find it easier to seed with sklearn pipelines rather than this (slightly) weird string representation we use.

Also, we should use better example clfs in the docs: Maybe a RandomForestClassifier and a LogisticRegression?

How easy would it be to take actual sklearn pipelines as input instead of the string representations?

We don't have any code to go from sklearn pipelines to deap pipelines, only the other direction. It would be non-trivial to write, but doable given Python's complete reflection of objects.

we should use better example clfs in the docs: Maybe a RandomForestClassifier and a LogisticRegression?

Sure.

It would be nice to have a function to go from sklearn pipelines to DEAP pipelines. :-) That will be one step closer to having DEAP work directly on sklearn pipelines themselves.

Alright. I'll start working on that.

How do you propose doing that? We need to make sure that it only takes in the operators, parameters, and parameter values that are defined in the config.

Pass in the pset that's being used and check that the pipeline is valid (within the context of the pset) as you walk through it.

I'd throw a TypeError if a parameter is missing, or a parameter is used that's not specified. If an operator is used that doesn't exist in the pset I'd throw a NameError.

Sounds good.
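
A minimal sketch of the validation rules described above, under some loud assumptions: `validate_step` is a hypothetical name, and a plain dict of operator name to allowed parameter values stands in for the pset (the real pset is a DEAP object). The `ValueError` for a disallowed value is also an assumption; the thread only settles on `NameError` and `TypeError`.

```python
def validate_step(operator, params, allowed):
    """Check one pipeline step against the allowed operators and parameters.

    `allowed` maps operator name -> {parameter name -> list of allowed values}.
    It is a stand-in for the pset, not TPOT's actual data structure.
    """
    # Unknown operator: NameError, as proposed in the thread.
    if operator not in allowed:
        raise NameError("operator %r is not in the pset" % operator)

    expected = allowed[operator]

    # A parameter required by the config is missing: TypeError.
    missing = sorted(set(expected) - set(params))
    if missing:
        raise TypeError("%s is missing parameters: %s" % (operator, missing))

    # A parameter is used that the config does not specify: TypeError.
    unknown = sorted(set(params) - set(expected))
    if unknown:
        raise TypeError("%s got unspecified parameters: %s" % (operator, unknown))

    # Assumption (not decided in the thread): reject values outside the config.
    for name, value in params.items():
        if value not in expected[name]:
            raise ValueError("%s=%r is not an allowed value for %s" % (name, value, operator))

    return True


allowed = {
    "BernoulliNB": {"alpha": [0.01, 0.1, 1.0], "fit_prior": [True, False]},
}
ok = validate_step("BernoulliNB", {"alpha": 0.1, "fit_prior": False}, allowed)
```

Walking a full sklearn pipeline would call a check like this once per step, so the first invalid operator or parameter stops the conversion early.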

@teaearlgraycold I think the function to go from sklearn pipelines to DEAP pipelines has not been added in this PR yet. I will start merging some PRs into the dev branch today to avoid a pile of conflicts between PRs, so please add this function later in a separate PR.

rhiever · 2017-06-24T14:48:03Z

tpot/base.py

@@ -1068,7 +1081,7 @@ def _operator_count(self, individual):
        return operator_count

    def _update_pbar(self, val, resulting_score_list):
-        """Update self._pbar during pipeline evaluration
+        """Update self._pbar during pipeline evaluration.


Not your typo, but there's a typo here in "evaluation."

rhiever · 2017-06-24T14:48:17Z

tests/tpot_tests.py

+    tpot_obj = TPOTRegressor(config_dict='tests/test_config.py')
+    n_seeds = len(tpot_obj._read_config_file('tests/test_config.py').population_seeds)
+
+    assert len(tpot_obj._pop) == n_seeds


I believe the unit tests need to be updated.

There seems to be just one that broke, and only in 2.7. I'll install conda for 2.7 so I can see what went wrong.

Add capability to provide custom seeds to GP

4ebda55

weixuanfu reviewed Jun 22, 2017


teaearlgraycold added 2 commits June 24, 2017 00:20

Rename seeds parameter to population_seeds, don't dup. seeds

3e25f76

Update docs site

424fa81

rhiever reviewed Jun 24, 2017


rhiever added the enhancement label Jun 24, 2017

Fix population padding

6a114ce

weixuanfu merged commit 6a114ce into EpistasisLab:development Jul 17, 2017

weixuanfu mentioned this pull request Aug 19, 2020

TPOT optimization gets stuck #1107

Open

t-harden mentioned this pull request Sep 13, 2023

Potential New Feature: allowing users to input customized initial pipelines #1321

Open
