Question: FunctionTransformer(copy) #581

In my best estimator I see a FunctionTransformer(copy). Is it useful? It just seems to copy the input to the output.
Nice, a feedback classifier. Is it mentioned somewhere in the sklearn docs? I couldn't find anything about this.
I don't think this is mentioned in the sklearn docs. We implemented this feature ourselves within the existing sklearn pipeline framework.
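For context, here is a minimal sketch (not from the thread, assuming a current scikit-learn) of what FunctionTransformer(copy) does: it is an identity/pass-through step, and inside a make_union it lets the untouched input features be concatenated next to another transformer's output. That pass-through is presumably what TPOT is reaching for when it places FunctionTransformer(copy) inside a make_union.

```python
import numpy as np
from copy import copy

from sklearn.pipeline import make_union
from sklearn.preprocessing import FunctionTransformer, StandardScaler

X = np.arange(12, dtype=float).reshape(4, 3)

# FunctionTransformer(copy) is an identity step: it just copies its input through.
passthrough = FunctionTransformer(copy)
assert np.array_equal(passthrough.fit_transform(X), X)

# Inside a make_union it lets the original features ride along next to the
# output of another transformer: 4 rows, 3 original + 3 scaled = 6 columns.
union = make_union(passthrough, StandardScaler())
print(union.fit_transform(X).shape)  # (4, 6)
```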
This can lead to weird pipelines, though. Here is what I got:

```python
from copy import copy

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer

# Score on the training set was: 0.333968253968
exported_pipeline = make_pipeline(
    make_union(
        FunctionTransformer(copy),
        FunctionTransformer(copy)
    ),
    RandomForestClassifier(bootstrap=False, criterion="gini", max_features=0.15, min_samples_leaf=10, min_samples_split=4, n_estimators=100)
)
```

In this case I doubt that the union of the two FunctionTransformer(copy) steps is useful.
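To illustrate that point (a sketch, not from the thread): a union of two identical copy steps only duplicates each feature column, which adds no information for the downstream RandomForestClassifier.

```python
import numpy as np
from copy import copy

from sklearn.pipeline import make_union
from sklearn.preprocessing import FunctionTransformer

X = np.arange(6, dtype=float).reshape(2, 3)

# Union of two identical pass-through steps: every column appears twice.
doubled = make_union(FunctionTransformer(copy), FunctionTransformer(copy)).fit_transform(X)
print(doubled.shape)                                    # (2, 6)
assert np.array_equal(doubled[:, :3], doubled[:, 3:])   # second half is an exact duplicate
```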
That's interesting. How long did you run TPOT (population & generations) when it gave you this solution?
I have since deleted that example, but I got another one:

```python
from copy import copy

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer, MaxAbsScaler, StandardScaler
from sklearn.svm import LinearSVC
from tpot.builtins import StackingEstimator

# Score on the training set was: 0.522222222222
exported_pipeline = make_pipeline(
    make_union(
        FunctionTransformer(copy),
        FunctionTransformer(copy)
    ),
    StandardScaler(),
    MaxAbsScaler(),
    StackingEstimator(estimator=LinearSVC(C=10.0, dual=False, loss="squared_hinge", penalty="l2", tol=0.01)),
    RandomForestClassifier(bootstrap=True, criterion="gini", max_features=0.5, min_samples_leaf=8, min_samples_split=18, n_estimators=100)
)
```

Here is the TPOT classifier that I configured:

```python
model = tpot.TPOTClassifier(
    cv=LeaveOneGroupOut(),
    scoring=experiment.build_scorer(),
    periodic_checkpoint_folder=files.create_abspath('models/multi_pca_usine_lcdv'),
    max_time_mins=11 * 60,
    max_eval_time_mins=10,
    n_jobs=10,
    verbosity=2
)
```

So the population is 100 (the TPOT default). I'm not sure about the number of generations at this point; I guess at least 5, since there are 5 exported pipelines in the output folder before this one. The optimizer ran for about 6 hours before reaching this intermediate result, and better pipelines obtained later in the same run did not contain such artifacts.
Ah, 5 generations isn't very much time for TPOT to really refine the pipelines; at that point the GA has only gone through 5 rounds of selection. It's good to hear that pipelines from later in the run didn't retain this artifact. The reason TPOT doesn't immediately get rid of pipelines like this is that the artifact is potentially useful for building more complex pipelines later in the optimization process. We've discussed other approaches to pipeline regularization (#207) that would probably weed out pipelines like the one you showed above, but we haven't gotten around to implementing those ideas yet.
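As a side note (a hedged sketch, not from the thread): if you would rather give the GA a fixed number of generations than a wall-clock budget, TPOT exposes generations and population_size directly. The dataset and split below are purely illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Illustrative data; substitute your own features, labels, and CV strategy.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Cap the search by generations rather than max_time_mins, so every generation completes.
model = TPOTClassifier(generations=50, population_size=100, verbosity=2, n_jobs=-1)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
model.export('exported_pipeline.py')
```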
Interesting, thank you for the explanation. Overall I found TPOT to be very useful. Well done!