This repository has been archived by the owner on Dec 6, 2023. It is now read-only.

Unexplained behaviour of Stopping condition 0: Reached maximum number of terms #180

Open
nikrepp opened this issue May 18, 2018 · 4 comments


@nikrepp

nikrepp commented May 18, 2018

Hello, colleagues,

I have the following problem: I am using PyEarth for a classification task on a dataset with 300,000 rows and more than 500 features. I set max_terms to a sufficiently high number (100), but after two iterations the forward pass stopped with "Stopping Condition 0: Reached maximum number of terms".

import numpy
from pyearth import Earth
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

model = Pipeline([('earth', Earth(max_degree=4, max_terms=100, verbose=True, enable_pruning=False)),
                  ('enet', ElasticNet(l1_ratio=0.0, alpha=1.0))])

X_t = StandardScaler().fit_transform(X_t)
model.fit(X_t, Y_t*100)

Beginning forward pass

iter  parent  var  knot    mse        terms  gcv     rsq    grsq
0     -       -    -       34.148441  1      34.149  0.000  0.000
1     0       180  114453  34.135289  3      34.137  0.000  0.000

Stopping Condition 0: Reached maximum number of terms
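A quick sanity check on a trace like this is to read the `terms` column of the last row and compare it with `max_terms`. Below is a minimal stdlib sketch, assuming the nine-column layout shown above; `last_term_count` is a hypothetical helper, not part of py-earth.

```python
def last_term_count(trace_text):
    """Return the `terms` value of the last data row in a forward-pass trace.

    Hypothetical helper: assumes the nine-column layout
    iter, parent, var, knot, mse, terms, gcv, rsq, grsq.
    """
    last = None
    for line in trace_text.splitlines():
        fields = line.split()
        # Data rows have nine fields and start with the iteration number;
        # this skips the header, blank lines and the stopping message.
        if len(fields) == 9 and fields[0].isdigit():
            last = int(fields[5])
    return last


trace = """\
Beginning forward pass

iter parent var knot mse terms gcv rsq grsq

0 - - - 34.148441 1 34.149 0.000 0.000
1 0 180 114453 34.135289 3 34.137 0.000 0.000

Stopping Condition 0: Reached maximum number of terms
"""

max_terms = 100
terms = last_term_count(trace)
print(f"{terms} terms reached with max_terms={max_terms}")
```

Here the pass reports only 3 terms against max_terms=100, which is why this stopping condition looks wrong.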

Maybe I am just doing something wrong?
From the metrics I got, I can see that the model is fairly robust but underfitted.

Nikita

@jcrudy
Collaborator

jcrudy commented May 20, 2018

@nikrepp I don't see any obvious problems with what you're doing. That seems like a pretty severe issue, though, so I'm surprised to be seeing it for the first time now. Here are a few questions that might help me:

  1. Is the code you included above the complete program that produces the error?
  2. Does the issue seem to depend on your data set, or does it happen with any data you use?
  3. Can you tell me what your operating system, python version, numpy, scipy, and scikit-learn versions are?
  4. How did you install py-earth, and what is pyearth.__version__?

@nikrepp
Author

nikrepp commented May 22, 2018

Hello Jason,

See my answers to your questions below.

  1. The complete program is below. The target is very low (0.0035).

import pandas as pd
import numpy as np
# Read the data
dataset = pd.read_csv('....csv', sep=',', encoding='cp1251')
dataset = dataset.head(10000)

y = dataset[u'Флаг рефинансирования']  # target column: "refinancing flag"
X = dataset.drop(dataset.columns[[0,1,2,3,6]], axis=1)

import pyearth
import scipy
import sklearn
import numpy
print(pyearth.__version__)
print(numpy.__version__)
print(scipy.__version__)
print(sklearn.__version__)

import numpy
from pyearth.earth import Earth
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

model = Pipeline([('earth', Earth(max_degree=4, max_terms=10, minspan_alpha=10, verbose=True, enable_pruning=False)),
                  ('enet', ElasticNet(l1_ratio=0.0, alpha=1.0))])

X = StandardScaler().fit_transform(X)
model.fit(X, y)

Beginning forward pass

iter  parent  var  knot  mse       terms  gcv    rsq    grsq
0     -       -    -     0.002394  1      0.002  0.000  0.000
1     0       304  5228  0.002354  3      0.002  0.017  0.016
2     1       344  7108  0.002295  5      0.002  0.042  0.040
3     2       160  3478  0.002273  7      0.002  0.051  0.048
4     5       573  1411  0.002195  9      0.002  0.083  0.080
5     6       450  4536  0.002195  11     0.002  0.083  0.079

Stopping Condition 0: Reached maximum number of terms

C:\Users\I304909\AppData\Local\Continuum\Miniconda2\envs\tensorflow\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
ConvergenceWarning)

Out[4]:

Pipeline(memory=None,
steps=[('earth', Earth(allow_linear=None, allow_missing=False, check_every=None,
enable_pruning=False, endspan=None, endspan_alpha=None, fast_K=None,
fast_h=None, feature_importance_type=None, max_degree=4, max_terms=10,
min_search_points=None, minspan=None, minspan_alpha=10, penalty=None,
...alse, precompute=False,
random_state=None, selection='cyclic', tol=0.0001, warm_start=False))])

  2. I've tested it on the Adult dataset from UCI, and it works!

import pandas as pd
import numpy as np
# Read the data
dataset = pd.read_csv('C:/.../Census01.csv', sep=';', encoding='utf8')

# Encode every column as integer codes
for i in dataset.columns:
    dataset[i] = dataset[i].factorize()[0].astype(np.int32)

y = dataset['age']
X = dataset.drop(dataset.columns[[0]], axis=1)
model2 = Pipeline([('earth', Earth(max_degree=4, max_terms=10, verbose=True, enable_pruning=False)),
                   ('enet', ElasticNet(l1_ratio=0.0, alpha=1.0))])

# Scale only the feature matrix X, not the full dataset
X = StandardScaler().fit_transform(X)
model2.fit(X, y)

Beginning forward pass

iter  parent  var  knot   mse         terms  gcv      rsq    grsq
0     -       -    -      241.716883  1      241.727  0.000  0.000
1     0       4    -1     238.942474  2      238.977  0.011  0.011
2     1       4    -1     235.893861  3      235.952  0.024  0.024
3     0       1    -1     234.005053  4      234.087  0.032  0.032
4     1       6    -1     232.915885  5      233.021  0.036  0.036
5     0       11   -1     231.898621  6      232.027  0.041  0.040
6     0       9    19353  231.112850  8      231.288  0.044  0.043
7     0       0    -1     230.395323  9      230.594  0.047  0.046
8     8       5    -1     229.583339  10     229.804  0.050  0.049
9     0       2    -1     229.275825  11     229.520  0.051  0.050

Stopping Condition 0: Reached maximum number of terms

  3. Windows 10. Python: I've tested both 2.7 and 3 (same behavior).
    Versions: PyEarth 0.1.0, NumPy 1.13.3, SciPy 1.0.0, scikit-learn 0.19.1

  4. I tried different ways: first building from source, and most recently installing through Conda (same behavior).
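The Adult test in item 2 converts every column with `factorize()`, which in pandas replaces each distinct value by an integer code assigned in order of first appearance (with the default `sort=False`). After that loop, `y` therefore holds arbitrary codes for `age`, not the ages themselves. Here is a stdlib-only sketch of the same mapping on made-up values; `factorize_codes` is a hypothetical helper, not the pandas implementation.

```python
def factorize_codes(values):
    """Encode values as integer codes in order of first appearance.

    Mirrors what pandas' factorize() does with its default sort=False
    for inputs without missing values.
    """
    index = {}    # value -> code
    uniques = []  # distinct values, in first-seen order
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(uniques)
            uniques.append(value)
        codes.append(index[value])
    return codes, uniques


ages = [39, 50, 38, 39, 28]  # made-up sample ages
codes, uniques = factorize_codes(ages)
print(codes)    # [0, 1, 2, 0, 3]
print(uniques)  # [39, 50, 38, 28]
```

Age 28 gets code 3 while age 50 gets code 1, so the codes preserve neither the order nor the spacing of the original values.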

Thanks! I am also very interested in what is causing this.

@jcrudy
Collaborator

jcrudy commented May 22, 2018

@nikrepp Thanks for all the info. In the code you pasted above, you set max_terms to 10, and the forward pass terminated after 5 iterations. That is expected behavior as each iteration produces 2 terms (assuming it finds a knot that is superior to the linear term). Is that the problem you are observing, or is there other worse behavior you're seeing? The reason it goes to iteration 9 on the UCI data set is that it is picking linear basis functions (knot = -1), which only add one term each.
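This term accounting can be checked against the logs by replaying the `terms` column. Each iteration adds 2 terms for a hinge pair or 1 term for a linear basis function (knot = -1). The stopping rule below (stop once the count exceeds max_terms) is an assumption inferred from the logs in this thread, not taken from py-earth's source. `replay_terms` is a hypothetical helper.

```python
def replay_terms(knots, max_terms):
    """Replay term counts of a forward pass from the logged `knot` column.

    Hypothetical helper: knot == -1 marks a linear basis function (+1 term);
    any other knot marks a hinge pair (+2 terms). Assumes the pass stops
    as soon as the term count exceeds max_terms.
    """
    terms = 1  # iteration 0: intercept only
    history = [terms]
    for knot in knots:
        terms += 1 if knot == -1 else 2
        history.append(terms)
        if terms > max_terms:
            break
    return history


# Knot columns copied from the two max_terms=10 logs in this thread.
refinancing = [5228, 7108, 3478, 1411, 4536]
adult = [-1, -1, -1, -1, -1, 19353, -1, -1, -1]

print(replay_terms(refinancing, 10))  # [1, 3, 5, 7, 9, 11]
print(replay_terms(adult, 10))        # [1, 2, 3, 4, 5, 6, 8, 9, 10, 11]

# With only hinge pairs, max_terms=100 is first exceeded at iteration 50.
print(len(replay_terms([1] * 60, 100)) - 1)  # 50
```

Both max_terms=10 replays reproduce the logged `terms` columns. By contrast, the original max_terms=100 run stopped at 3 terms, roughly 47 iterations before this rule would trigger.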

@nikrepp
Author

nikrepp commented May 24, 2018

Hello Jason,

Fortunately, I can no longer reproduce the weird behaviour, so I'm inclined to think it was a corrupted install from source under Python 2 on Windows.

Thank you for all the details. I am looking forward to further development of this framework: classification objectives, better support for categorical predictors, and interpretation of fitted relationships.

Thanks!

P.S. I would be glad to have the opportunity to contribute to one of these topics.

Regards,
Nikita
