Reproducibility of the export pipeline #1270

Iris7788 · 2022-09-18T15:51:55Z

Context of the issue

I used TPOT to fit my dataset, but I get a different exported pipeline on each run.

Process to reproduce the issue

These are the steps I used to generate the exported pipeline. The shape of my dataset was (45, 478).

from sklearn.model_selection import train_test_split
from tpot import TPOTRegressor

# X, y: the (45, 478) dataset described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, test_size=0.15)
M1 = TPOTRegressor(generations=10, population_size=40, verbosity=2,
                   random_state=42, n_jobs=-1, cv=5)
M1.fit(X_train, y_train)
M1.export('M1_pipeline.py')

Current result

On the first run, the exported pipeline was a DecisionTreeRegressor:

Generation 1 - Current best internal CV score: -0.6631261058133652
Generation 2 - Current best internal CV score: -0.6631261058133652
Generation 3 - Current best internal CV score: -0.6442071896861652
Generation 4 - Current best internal CV score: -0.5726875496699182
Generation 5 - Current best internal CV score: -0.5726875496699182
Generation 6 - Current best internal CV score: -0.528473933017039
Generation 7 - Current best internal CV score: -0.528473933017039
Generation 8 - Current best internal CV score: -0.528473933017039
Generation 9 - Current best internal CV score: -0.528473933017039
Generation 10 - Current best internal CV score: -0.528473933017039

Best pipeline: DecisionTreeRegressor(Normalizer(input_matrix, norm=max), max_depth=3, min_samples_leaf=10, min_samples_split=9)

On the second run, the exported pipeline was an ExtraTreesRegressor:

Generation 1 - Current best internal CV score: -0.6631261058133652
Generation 2 - Current best internal CV score: -0.6631261058133652
Generation 3 - Current best internal CV score: -0.6593793694494272
Generation 4 - Current best internal CV score: -0.6524528603774085
Generation 5 - Current best internal CV score: -0.636417747633282
Generation 6 - Current best internal CV score: -0.633586381252542
Generation 7 - Current best internal CV score: -0.633586381252542
Generation 8 - Current best internal CV score: -0.633586381252542
Generation 9 - Current best internal CV score: -0.633586381252542
Generation 10 - Current best internal CV score: -0.633586381252542

Best pipeline: ExtraTreesRegressor(LinearSVR(input_matrix, C=1.0, dual=True, epsilon=0.01, loss=epsilon_insensitive, tol=1e-05), bootstrap=False, max_features=0.3, min_samples_leaf=6, min_samples_split=13, n_estimators=100)

Expected result

I would like the exported pipeline to be repeatable and stable across runs. My environment is Python 3.7.12 with TPOT 0.11.7.

Thank you very much for the development and maintenance of TPOT.
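
One way to check the repeatability asked for above is to export the pipeline from two separate runs and compare the files. The sketch below is only an illustration: the helper names `file_digest` and `exports_match` and the file names in the usage note are hypothetical, and the comparison is deliberately strict, so any difference in the exported file (including comments) counts as a mismatch.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def exports_match(path_a, path_b):
    """Return True when two exported pipeline files are byte-for-byte identical."""
    return file_digest(path_a) == file_digest(path_b)
```

For example, after exporting to the hypothetical files `run1_pipeline.py` and `run2_pipeline.py`, `exports_match('run1_pipeline.py', 'run2_pipeline.py')` returns True only if both runs produced the same file.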


perib · 2022-09-29T17:04:08Z

If you set n_jobs to 1, reproducibility is more likely. With parallel processes, exact reproducibility is hard to guarantee because the order of execution has some randomness that cannot be controlled. It is something we are thinking about.
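
A minimal sketch of that suggestion, reusing the settings from the original report with only `n_jobs` changed from -1 to 1. The helper names `reproducible_tpot_kwargs` and `fit_and_export` are hypothetical, and this assumes TPOT is installed. Going by the comment above, single-process execution makes identical results more likely rather than guaranteed, and it will probably run more slowly.

```python
def reproducible_tpot_kwargs(seed=42):
    """Settings from the report above, with n_jobs switched from -1 to 1."""
    return dict(
        generations=10,
        population_size=40,
        verbosity=2,
        random_state=seed,  # fixed seed, as in the original report
        n_jobs=1,           # single process, so execution order is fixed
        cv=5,
    )


def fit_and_export(X_train, y_train, path, seed=42):
    """Fit a TPOTRegressor with the settings above and export the best pipeline."""
    # Imported here so the settings helper can be used without TPOT installed.
    from tpot import TPOTRegressor

    model = TPOTRegressor(**reproducible_tpot_kwargs(seed))
    model.fit(X_train, y_train)
    model.export(path)
    return model
```

Calling `fit_and_export(X_train, y_train, 'M1_pipeline.py')` twice on the same split should then be a fairer test of whether the exported pipeline is stable.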

Iris7788 · 2022-09-29T17:04:29Z

I have received your email and will read it as soon as possible.
