[python] Bug fix for first_metric_only on earlystopping. #2209

Merged
Changes from 14 commits
72 commits
0c9c77c
Bug fix for first_metric_only if the first metric is train metric.
matsuken92 Jun 1, 2019
130fe38
Update bug fix for feval issue.
matsuken92 Jun 1, 2019
7ab1a59
Disable feval for first_metric_only.
matsuken92 Jun 1, 2019
ba8a5aa
Additional test items.
matsuken92 Jun 1, 2019
25850fa
Fix wrong assertEqual settings & formatting.
matsuken92 Jun 1, 2019
fddf8da
Change dataset of test.
matsuken92 Jun 1, 2019
6b71ebc
Fix random seed for test.
matsuken92 Jun 1, 2019
979f4df
Modify assumed test result due to different sklearn version between CI…
matsuken92 Jun 1, 2019
5e68ae9
Remove f-string
matsuken92 Jun 1, 2019
0f196e2
Applying variable assumed test result for test.
matsuken92 Jun 1, 2019
c0d61fa
Fix flake8 error.
matsuken92 Jun 1, 2019
0e91956
Modifying in accordance with review comments.
matsuken92 Jun 1, 2019
6f30b81
Modifying for pylint.
matsuken92 Jun 1, 2019
3770cc1
simplified tests
StrikerRUS Jun 2, 2019
c4e0af5
Deleting error criteria `if eval_metric is None`.
matsuken92 Jun 3, 2019
2f4e2b0
Delete test items of classification.
matsuken92 Jun 3, 2019
9387197
Simplifying if condition.
matsuken92 Jun 7, 2019
5aeb2bd
Applying first_metric_only for sklearn wrapper.
matsuken92 Jun 10, 2019
79ba017
Merge branch 'master' into bugfix/first_metric_only_train_metric
matsuken92 Jun 10, 2019
c40408c
Modifying test_sklearn for conforming to python 2.x
matsuken92 Jun 10, 2019
6a70b0c
Merge branch 'bugfix/first_metric_only_train_metric' of https://githu…
matsuken92 Jun 10, 2019
fe7d586
Fix flake8 error.
matsuken92 Jun 10, 2019
71c1bc2
Additional fix for sklearn and add tests.
matsuken92 Jun 11, 2019
3e956ea
Bug fix and add test cases.
matsuken92 Jun 17, 2019
0338bc7
some refactor
StrikerRUS Jun 18, 2019
75d7c57
fixed lint
StrikerRUS Jun 18, 2019
4645126
fixed lint
StrikerRUS Jun 18, 2019
60233bb
Fix duplicated metrics scores to pass the test.
matsuken92 Jun 29, 2019
f3f1e83
Fix the case first_metric_only not in params.
matsuken92 Jun 29, 2019
e054f97
Converting metrics aliases.
matsuken92 Jul 2, 2019
4e62ef7
Add comment.
matsuken92 Jul 3, 2019
3b154b6
Modify comment for pylint.
matsuken92 Jul 3, 2019
6dc7e85
Modify comment for pydocstyle.
matsuken92 Jul 3, 2019
1dc5397
Using split test set for two eval_set.
matsuken92 Jul 6, 2019
ebc97b3
added test case for metric aliases and length checks
StrikerRUS Jul 6, 2019
f7f0dfe
minor style fixes
StrikerRUS Jul 6, 2019
4221b8a
fixed rmse name and alias position
StrikerRUS Jul 7, 2019
5470265
Fix the case metric=[]
matsuken92 Jul 10, 2019
e292b39
Fix using env.model._train_data_name
matsuken92 Jul 10, 2019
ffa95b8
Fix wrong test condition.
matsuken92 Jul 10, 2019
3403f7b
Move initial process to _init() func.
matsuken92 Jul 10, 2019
43ea2df
Merge remote-tracking branch 'upstream/master' into bugfix/first_metr…
matsuken92 Jul 27, 2019
b509afa
Modify test setting for test_sklearn & training data matching on call…
matsuken92 Jul 27, 2019
c4e4b33
Support composite name metrics.
matsuken92 Jul 27, 2019
2f1578c
Remove metric check process & reduce redundant test cases.
matsuken92 Jul 27, 2019
45fc5eb
Revised according to the matters pointed out on a review.
matsuken92 Aug 29, 2019
f270258
increased code readability
StrikerRUS Aug 30, 2019
df03f4c
Fix the issue of order of validation set.
matsuken92 Sep 1, 2019
37770b0
Merge branch 'bugfix/first_metric_only_train_metric' of https://githu…
matsuken92 Sep 1, 2019
0a73a67
Changing to OrderedDict from default dict for score result.
matsuken92 Sep 1, 2019
6209c4a
added missed check in cv function for first_metric_only and feval co-…
StrikerRUS Sep 1, 2019
ea0312a
keep order only for metrics but not for datasets in best_score
StrikerRUS Sep 1, 2019
c881171
move OrderedDict initialization to init phase
StrikerRUS Sep 1, 2019
386fe1c
fixed minor printing issues
StrikerRUS Sep 1, 2019
8fe0469
move first metric detection to init phase and split can be performed …
StrikerRUS Sep 1, 2019
13737ac
split only once during callback
StrikerRUS Sep 1, 2019
5128b34
removed excess code
StrikerRUS Sep 1, 2019
ca4fd0c
fixed typo in variable name and squashed ifs
StrikerRUS Sep 1, 2019
cb9e327
use setdefault
StrikerRUS Sep 1, 2019
97004e0
hotfix
StrikerRUS Sep 1, 2019
19319c3
fixed failing test
StrikerRUS Sep 1, 2019
58a800c
refined tests
StrikerRUS Sep 2, 2019
f5a2b74
refined sklearn test
StrikerRUS Sep 3, 2019
b7a03e7
Making "feval" effective on early stopping.
matsuken92 Sep 9, 2019
47b7a23
Merge branch 'master' into bugfix/first_metric_only_train_metric
matsuken92 Sep 9, 2019
5c99e7e
fixed conflicts
StrikerRUS Sep 10, 2019
15a5fc2
allow feval and first_metric_only for cv
StrikerRUS Sep 10, 2019
d20b338
removed unused code
StrikerRUS Sep 10, 2019
c3fbf6b
added tests for feval
StrikerRUS Sep 10, 2019
cbaadbe
fixed printing
StrikerRUS Sep 10, 2019
a2d6449
add note about whitespaces in feval name
StrikerRUS Sep 10, 2019
88050da
Modifying final iteration process in case valid set is training data.
matsuken92 Sep 11, 2019
25 changes: 23 additions & 2 deletions python-package/lightgbm/callback.py
@@ -8,6 +8,7 @@
from operator import gt, lt

from .compat import range_
from .basic import LightGBMError


class EarlyStopException(Exception):
@@ -214,7 +215,24 @@ def _callback(env):
_init(env)
if not enabled[0]:
return
if first_metric_only:
eval_metric = None
for metric_alias in ['metric', 'metrics', 'metric_types']:
if metric_alias in env.params.keys():
if isinstance(env.params[metric_alias], (tuple, list)):
eval_metric = env.params[metric_alias][0]
else:
eval_metric = env.params[metric_alias]
break
if eval_metric is None:
raise LightGBMError("`metric` should be specified if first_metric_only==True.")
for i in range_(len(env.evaluation_result_list)):
metric_key = env.evaluation_result_list[i][1]
if metric_key.split(" ")[0] == "train":
continue  # train metrics are not used for early stopping
if first_metric_only:
if metric_key != "valid {}".format(eval_metric) and metric_key != eval_metric and eval_metric != "":
continue
score = env.evaluation_result_list[i][2]
if best_score_list[i] is None or cmp_op[i](score, best_score[i]):
best_score[i] = score
@@ -224,13 +242,16 @@ def _callback(env):
if verbose:
print('Early stopping, best iteration is:\n[%d]\t%s' % (
best_iter[i] + 1, '\t'.join([_format_eval_result(x) for x in best_score_list[i]])))
if first_metric_only:
print("Evaluating only: {}".format(metric_key))
raise EarlyStopException(best_iter[i], best_score_list[i])
if env.iteration == env.end_iteration - 1:
if verbose:
print('Did not meet early stopping. Best iteration is:\n[%d]\t%s' % (
best_iter[i] + 1, '\t'.join([_format_eval_result(x) for x in best_score_list[i]])))
if first_metric_only:
print("Evaluating only: {}".format(metric_key))
raise EarlyStopException(best_iter[i], best_score_list[i])
if first_metric_only:  # only the first metric is used for early stopping
break
_callback.order = 30
_callback.first_metric_only = first_metric_only
return _callback
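
The metric lookup and filtering added to `_callback` in this hunk can be read in isolation. The following is a minimal standalone sketch, not the code merged in the PR: `resolve_first_metric` and `is_checked` are hypothetical helper names, and the guard for an empty metric list is an assumption that the hunk above does not contain.

```python
# Aliases checked by the callback in the hunk above, in the same order.
METRIC_ALIASES = ('metric', 'metrics', 'metric_types')


def resolve_first_metric(params):
    """Return the first metric named in params, or None if none is given."""
    for alias in METRIC_ALIASES:
        if alias in params:
            value = params[alias]
            if isinstance(value, (tuple, list)):
                # Assumption: an empty list is treated as "no metric".
                return value[0] if value else None
            return value
    return None


def is_checked(metric_key, eval_metric, first_metric_only):
    """Decide whether an evaluation entry takes part in early stopping."""
    if metric_key.split(" ")[0] == "train":
        return False  # train metrics are never used for early stopping
    if first_metric_only:
        # Keys look like "l2" in train() and "valid l2" in cv().
        if (metric_key != "valid {}".format(eval_metric)
                and metric_key != eval_metric
                and eval_metric != ""):
            return False
    return True


first = resolve_first_metric({'objective': 'regression', 'metric': ['l2', 'l1']})
print(first, is_checked('valid l2', first, True), is_checked('l1', first, True))
```

With `metric=['l2', 'l1']`, only entries keyed `l2` or `valid l2` pass the filter when `first_metric_only` is set; with the flag off, every non-train entry passes.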
6 changes: 6 additions & 0 deletions python-package/lightgbm/engine.py
@@ -199,6 +199,8 @@ def train(params, train_set, num_boost_round=100,
callbacks = set()
else:
for i, cb in enumerate(callbacks):
if getattr(cb, 'first_metric_only', False) and feval is not None:
raise LightGBMError("`first_metric_only` and `feval` are not available at the same time.")
Review comment (Collaborator):
@matsuken92 Sorry, forgot to ask in my previous comment. Do we really need this limitation after cpp changes?

cb.__dict__.setdefault('order', i - len(callbacks))
callbacks = set(callbacks)

@@ -209,6 +211,8 @@ def train(params, train_set, num_boost_round=100,
callbacks.add(callback.print_evaluation(verbose_eval))

if early_stopping_rounds is not None:
if first_metric_only and feval is not None:
raise LightGBMError("`first_metric_only` and `feval` are not available at the same time.")
callbacks.add(callback.early_stopping(early_stopping_rounds, first_metric_only, verbose=bool(verbose_eval)))

if learning_rates is not None:
@@ -533,6 +537,8 @@ def cv(params, train_set, num_boost_round=100,
callbacks = set()
else:
for i, cb in enumerate(callbacks):
if getattr(cb, 'first_metric_only', False) and feval is not None:
raise LightGBMError("`first_metric_only` and `feval` are not available at the same time.")
cb.__dict__.setdefault('order', i - len(callbacks))
callbacks = set(callbacks)
if early_stopping_rounds is not None:
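
The guard added to `train` and `cv` relies on the `first_metric_only` attribute that the early-stopping callback now sets on itself. A minimal sketch of that interaction, runnable without LightGBM, might look like the following. The `LightGBMError` class here is a local stand-in, and `make_early_stopping` and `validate_callbacks` are hypothetical names used only for illustration.

```python
class LightGBMError(Exception):
    """Local stand-in for lightgbm.basic.LightGBMError."""


def make_early_stopping(first_metric_only=False):
    """Build a dummy callback tagged the way the PR tags _callback."""
    def _callback(env):
        return None
    _callback.order = 30
    _callback.first_metric_only = first_metric_only
    return _callback


def validate_callbacks(callbacks, feval=None):
    """Mirror the loop in the hunk: reject the flag together with feval."""
    for i, cb in enumerate(callbacks):
        if getattr(cb, 'first_metric_only', False) and feval is not None:
            raise LightGBMError(
                "`first_metric_only` and `feval` are not available at the same time.")
        # Callbacks without an explicit order keep their list position.
        cb.__dict__.setdefault('order', i - len(callbacks))
    return set(callbacks)


def constant_metric(preds, train_data):
    return ('constant_metric', 0.0, False)


try:
    validate_callbacks([make_early_stopping(True)], feval=constant_metric)
except LightGBMError as err:
    print("rejected:", err)
```

Using `getattr` with a default means user-supplied callbacks that never set the attribute are unaffected by the check.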
167 changes: 137 additions & 30 deletions tests/python_package_test/test_engine.py
@@ -1,7 +1,6 @@
# coding: utf-8
# pylint: skip-file
import copy
import itertools
import math
import os
import psutil
@@ -1417,42 +1416,150 @@ def test_get_split_value_histogram(self):
self.assertRaises(lgb.basic.LightGBMError, gbm.get_split_value_histogram, 2)

def test_early_stopping_for_only_first_metric(self):
# regression test
X, y = load_boston(True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# test that first_metric_only and feval cannot be used together
def constant_metric(preds, train_data):
return ('constant_metric', 0.0, False)

params = {
'objective': 'regression',
'metric': 'None',
'verbose': -1
'verbose': -1,
'seed': 123,
}
with np.testing.assert_raises_regex(lgb.basic.LightGBMError,
'`first_metric_only` and `feval` are not available*'):
lgb.train(dict(params, first_metric_only=True), lgb_train,
num_boost_round=20, valid_sets=[lgb_eval],
feval=constant_metric,
early_stopping_rounds=5, verbose_eval=False)

# test various combination of metrics
def metrics_combination_train_regression(metric_list, assumed_iteration, first_metric_only):
params = {
'objective': 'regression',
'learning_rate': 0.5,
'num_leaves': 10,
'metric': metric_list,
'verbose': -1,
'seed': 123
}
gbm = lgb.train(dict(params, first_metric_only=first_metric_only), lgb_train,
num_boost_round=25, valid_sets=[lgb_eval],
early_stopping_rounds=5, verbose_eval=False)
self.assertEqual(gbm.best_iteration, assumed_iteration)

def metrics_combination_cv_regression(metric_list, assumed_iteration,
first_metric_only, eval_train_metric):
params = {
'objective': 'regression',
'learning_rate': 0.9,
'num_leaves': 10,
'metric': metric_list,
'verbose': -1,
'seed': 123,
'gpu_use_dp': True
}
ret = lgb.cv(dict(params, first_metric_only=first_metric_only),
stratified=False,
train_set=lgb_train,
num_boost_round=25,
early_stopping_rounds=5, verbose_eval=False,
eval_train_metric=eval_train_metric)
self.assertEqual(len(ret[list(ret.keys())[0]]), assumed_iteration)

best_iter_l1 = 16
best_iter_l2 = 19
best_iter_min = min([best_iter_l1, best_iter_l2])
metrics_combination_train_regression('l2', best_iter_l2, True)
metrics_combination_train_regression('l1', best_iter_l1, True)
metrics_combination_train_regression(['l2', 'l1'], best_iter_l2, True)
metrics_combination_train_regression(['l1', 'l2'], best_iter_l1, True)
metrics_combination_train_regression(['l2', 'l1'], best_iter_min, False)
metrics_combination_train_regression(['l1', 'l2'], best_iter_min, False)

best_iter_l1 = 6
best_iter_l2 = 11
best_iter_min = min([best_iter_l1, best_iter_l2])
metrics_combination_cv_regression('l2', best_iter_l2, True, False)
metrics_combination_cv_regression('l1', best_iter_l1, True, False)
metrics_combination_cv_regression(['l2', 'l1'], best_iter_l2, True, False)
metrics_combination_cv_regression(['l1', 'l2'], best_iter_l1, True, False)
metrics_combination_cv_regression(['l2', 'l1'], best_iter_min, False, False)
metrics_combination_cv_regression(['l1', 'l2'], best_iter_min, False, False)
metrics_combination_cv_regression('l2', best_iter_l2, True, True)
metrics_combination_cv_regression('l1', best_iter_l1, True, True)
metrics_combination_cv_regression(['l2', 'l1'], best_iter_l2, True, True)
metrics_combination_cv_regression(['l1', 'l2'], best_iter_l1, True, True)
metrics_combination_cv_regression(['l2', 'l1'], best_iter_min, False, True)
metrics_combination_cv_regression(['l1', 'l2'], best_iter_min, False, True)

# classification test
X, y = load_breast_cancer(True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

decreasing_generator = itertools.count(0, -1)

def decreasing_metric(preds, train_data):
return ('decreasing_metric', next(decreasing_generator), False)

def constant_metric(preds, train_data):
return ('constant_metric', 0.0, False)
# test various combination of metrics
def metrics_combination_train_classification(metric_list, assumed_iteration, first_metric_only):
params = {
'objective': 'binary',
'learning_rate': 0.5,
'num_leaves': 5,
'metric': metric_list,
'verbose': -1,
'seed': 123
}
gbm = lgb.train(dict(params, first_metric_only=first_metric_only), lgb_train,
num_boost_round=25, valid_sets=[lgb_eval],
early_stopping_rounds=5, verbose_eval=False)
self.assertEqual(gbm.best_iteration, assumed_iteration)

# test that all metrics are checked (default behaviour)
gbm = lgb.train(params, lgb_train, num_boost_round=20, valid_sets=[lgb_eval],
feval=lambda preds, train_data: [decreasing_metric(preds, train_data),
constant_metric(preds, train_data)],
early_stopping_rounds=5, verbose_eval=False)
self.assertEqual(gbm.best_iteration, 1)

# test that only the first metric is checked
gbm = lgb.train(dict(params, first_metric_only=True), lgb_train,
num_boost_round=20, valid_sets=[lgb_eval],
feval=lambda preds, train_data: [decreasing_metric(preds, train_data),
constant_metric(preds, train_data)],
early_stopping_rounds=5, verbose_eval=False)
self.assertEqual(gbm.best_iteration, 20)
# ... change the order of metrics
gbm = lgb.train(dict(params, first_metric_only=True), lgb_train,
num_boost_round=20, valid_sets=[lgb_eval],
feval=lambda preds, train_data: [constant_metric(preds, train_data),
decreasing_metric(preds, train_data)],
early_stopping_rounds=5, verbose_eval=False)
self.assertEqual(gbm.best_iteration, 1)
def metrics_combination_cv_classification(metric_list, assumed_iteration,
first_metric_only, eval_train_metric):
params = {
'objective': 'binary',
'learning_rate': 0.4,
'num_leaves': 5,
'metric': metric_list,
'verbose': -1,
'seed': 123,
'gpu_use_dp': True
}
ret = lgb.cv(dict(params, first_metric_only=first_metric_only),
train_set=lgb_train, num_boost_round=25,
nfold=3, stratified=False, shuffle=False,
early_stopping_rounds=5, verbose_eval=False,
eval_train_metric=eval_train_metric)
self.assertEqual(len(ret[list(ret.keys())[0]]), assumed_iteration)

best_iter_logloss = 7
best_iter_auc = 3
best_iter_min = min([best_iter_logloss, best_iter_auc])
metrics_combination_train_classification('binary_logloss', best_iter_logloss, True)
metrics_combination_train_classification('auc', best_iter_auc, True)
metrics_combination_train_classification(['binary_logloss', 'auc'], best_iter_logloss, True)
metrics_combination_train_classification(['auc', 'binary_logloss'], best_iter_auc, True)
metrics_combination_train_classification(['binary_logloss', 'auc'], best_iter_min, False)
metrics_combination_train_classification(['auc', 'binary_logloss'], best_iter_min, False)

best_iter_logloss = 15
best_iter_error = 6
best_iter_min = min([best_iter_logloss, best_iter_error])
metrics_combination_cv_classification('binary_logloss', best_iter_logloss, True, False)
metrics_combination_cv_classification('binary_error', best_iter_error, True, False)
metrics_combination_cv_classification(['binary_logloss', 'binary_error'], best_iter_logloss, True, False)
metrics_combination_cv_classification(['binary_error', 'binary_logloss'], best_iter_error, True, False)
metrics_combination_cv_classification(['binary_logloss', 'binary_error'], best_iter_min, False, False)
metrics_combination_cv_classification(['binary_error', 'binary_logloss'], best_iter_min, False, False)
metrics_combination_cv_classification('binary_logloss', best_iter_logloss, True, True)
metrics_combination_cv_classification('binary_error', best_iter_error, True, True)
metrics_combination_cv_classification(['binary_logloss', 'binary_error'], best_iter_logloss, True, True)
metrics_combination_cv_classification(['binary_error', 'binary_logloss'], best_iter_error, True, True)
metrics_combination_cv_classification(['binary_logloss', 'binary_error'], best_iter_min, False, True)
metrics_combination_cv_classification(['binary_error', 'binary_logloss'], best_iter_min, False, True)
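
The tests above expect the best iteration of the first metric when `first_metric_only` is set, and `best_iter_min` when every metric is checked. A toy simulation can illustrate why that expectation is plausible. This is a simplified sketch on made-up score sequences, not LightGBM's callback: `best_iteration` is a hypothetical helper, and the claim that the earliest-stalling metric decides the result holds for this data but is not guaranteed in general.

```python
def best_iteration(history, stopping_rounds, first_metric_only=False):
    """Return the 1-based best iteration for per-metric score sequences.

    history is a list of (name, scores, higher_better) in metric order.
    """
    metrics = history[:1] if first_metric_only else history
    n_rounds = len(metrics[0][1])
    best_score = [None] * len(metrics)
    best_iter = [0] * len(metrics)
    for it in range(n_rounds):
        for i, (_, scores, higher_better) in enumerate(metrics):
            score = scores[it]
            if best_score[i] is None:
                improved = True
            elif higher_better:
                improved = score > best_score[i]
            else:
                improved = score < best_score[i]
            if improved:
                best_score[i] = score
                best_iter[i] = it
            elif it - best_iter[i] >= stopping_rounds:
                return best_iter[i] + 1  # this metric stalled first
    # Early stopping never triggered: report the first metric's best round.
    return best_iter[0] + 1


# Made-up validation scores; lower is better for both.
l1 = ('l1', [5, 4, 3, 4, 4, 4, 4, 4], False)  # best at iteration 3
l2 = ('l2', [9, 8, 7, 6, 5, 6, 6, 6], False)  # best at iteration 5

print(best_iteration([l1, l2], 2, first_metric_only=True))
print(best_iteration([l2, l1], 2, first_metric_only=True))
print(best_iteration([l2, l1], 2, first_metric_only=False))
```

In this toy data the order of the metric list only matters when `first_metric_only` is set, which is the behaviour the `['l2', 'l1']` versus `['l1', 'l2']` test pairs exercise.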