Improving monotone constraints ("Fast" method; linked to #2305, #2717) #2770

Merged
Changes from 46 commits
47 commits
1602179
Add util functions.
Feb 17, 2020
80c434e
Added monotone_constraints_method as a parameter.
Feb 18, 2020
4aa2b5e
Add the intermediate constraining method.
Feb 18, 2020
3bcb331
Updated tests.
Feb 18, 2020
e3ac038
Minor fixes.
Feb 18, 2020
0b5a8b1
Typo.
Feb 18, 2020
20294b5
Linting.
Feb 18, 2020
06e36d3
Ran the parameter generator for the doc.
Feb 19, 2020
81f8a91
Removed usage of the FeatureMonotone function.
Feb 20, 2020
590cc42
more fixes
guolinke Feb 28, 2020
6cb5b3c
Fix.
Mar 2, 2020
42c79f0
Remove duplicated code.
Mar 3, 2020
d4fdc43
Add debug checks.
Mar 3, 2020
d26af9b
Typo.
Mar 3, 2020
2a437ed
Bug fix.
Mar 3, 2020
b68e70d
Disable the use of intermediate monotone constraints and feature samp…
Mar 3, 2020
3757500
Added an alias for monotone constraining method.
Mar 3, 2020
ecc04a1
Use the right variable to get the number of threads.
Mar 3, 2020
dd012fc
Fix DEBUG checks.
Mar 3, 2020
57fe30a
Add back check to determine if histogram is splittable.
Mar 3, 2020
421a7a0
Added forgotten override keywords.
Mar 3, 2020
eb68720
Perform monotone constraint update only when necessary.
Mar 3, 2020
a530020
Small refactor of FastLeafConstraints.
Mar 3, 2020
4487183
Post rebase commit.
Mar 5, 2020
d1c73c3
Small refactor.
Mar 9, 2020
8122241
Typo.
Mar 9, 2020
cd40350
Added comment and slightly improved logic of monotone constraints.
Mar 10, 2020
5bf69f0
Forgot a const.
Mar 16, 2020
e726623
Vectors that are to be modified need to be pointers.
Mar 16, 2020
bfd1747
Rename FastLeafConstraints to IntermediateLeafConstraints to match do…
Mar 16, 2020
f45f0ec
Remove overload of GoUpToFindLeavesToUpdate.
Mar 16, 2020
745dc44
Stop memory leaking.
Mar 16, 2020
cf417d3
Fix cpplint issues.
Mar 17, 2020
fd34998
Fix checks.
Mar 17, 2020
2d935f3
Fix more cpplint issues.
Mar 17, 2020
410c1e7
Refactor config monotone constraints method.
Mar 19, 2020
e8fd270
Typos.
Mar 19, 2020
a4193a9
Remove useless empty lines.
Mar 19, 2020
2133fa6
Add new line to separate includes.
Mar 19, 2020
89dbcec
Replace unsigned ind by size_t.
Mar 19, 2020
406a793
Reduce number of trials in tests to decrease CI time.
Mar 19, 2020
a1cc513
Specify monotone constraints better in tests.
Mar 19, 2020
1eb287a
Removed outer loop in test of monotone constraints.
Mar 19, 2020
51380c2
Added categorical features to the monotone constraints tests.
Mar 19, 2020
729fb0b
Add blank line.
Mar 20, 2020
4024536
Regenerate parameters automatically.
Mar 20, 2020
2820a72
Speed up ShouldKeepGoingLeftRight.
Mar 23, 2020
10 changes: 10 additions & 0 deletions docs/Parameters.rst
@@ -460,6 +460,16 @@ Learning Control Parameters

- you need to specify all features in order. For example, ``mc=-1,0,1`` means decreasing for 1st feature, non-constraint for 2nd feature and increasing for the 3rd feature

- ``monotone_constraints_method`` :raw-html:`<a id="monotone_constraints_method" title="Permalink to this parameter" href="#monotone_constraints_method">&#x1F517;&#xFE0E;</a>`, default = ``basic``, type = string, aliases: ``monotone_constraining_method``, ``mc_method``

- used only if ``monotone_constraints`` is set

- monotone constraints method

- ``basic``, the most basic monotone constraints method. It does not slow the library at all, but over-constrains the predictions

- ``intermediate``, a `more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library very slightly. However, this method is much less constraining than the basic method and should significantly improve the results

- ``feature_contri`` :raw-html:`<a id="feature_contri" title="Permalink to this parameter" href="#feature_contri">&#x1F517;&#xFE0E;</a>`, default = ``None``, type = multi-double, aliases: ``feature_contrib``, ``fc``, ``fp``, ``feature_penalty``

- used to control feature's split gain, will use ``gain[i] = max(0, feature_contri[i]) * gain[i]`` to replace the split gain of i-th feature
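Reviewer sketch of why ``basic`` "over-constrains": the snippet below assumes the basic method bounds both children of a monotone split by the midpoint of their outputs, and that every descendant inherits those bounds. `LeafBounds`, `ClampOutput` and `UpdateBoundsBasic` are hypothetical names for illustration, not the types used in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical per-leaf output range; not LightGBM's actual type.
struct LeafBounds {
  double min = -std::numeric_limits<double>::infinity();
  double max = std::numeric_limits<double>::infinity();
};

// Clamp a raw leaf output into the range allowed for that leaf.
inline double ClampOutput(double raw, const LeafBounds& bounds) {
  assert(bounds.min <= bounds.max);
  return std::min(std::max(raw, bounds.min), bounds.max);
}

// After a split on a feature with constraint `monotone_type` (-1, 0 or +1),
// tighten both children around the midpoint of their outputs (assumed rule).
inline void UpdateBoundsBasic(int monotone_type, double left_output,
                              double right_output, LeafBounds* left,
                              LeafBounds* right) {
  if (monotone_type == 0) return;  // unconstrained feature: nothing to do
  const double mid = (left_output + right_output) / 2.0;
  if (monotone_type > 0) {  // increasing: left <= mid <= right
    left->max = std::min(left->max, mid);
    right->min = std::max(right->min, mid);
  } else {  // decreasing: left >= mid >= right
    left->min = std::max(left->min, mid);
    right->max = std::min(right->max, mid);
  }
}
```

Under this assumption, a leaf deep in the left subtree stays capped by `mid` even when it does not border any leaf of the right subtree, which is the slack the ``intermediate`` method is meant to recover.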
7 changes: 7 additions & 0 deletions include/LightGBM/config.h
@@ -436,6 +436,13 @@ struct Config {
// desc = you need to specify all features in order. For example, ``mc=-1,0,1`` means decreasing for 1st feature, non-constraint for 2nd feature and increasing for the 3rd feature
std::vector<int8_t> monotone_constraints;

// alias = monotone_constraining_method, mc_method
// desc = used only if ``monotone_constraints`` is set
// desc = monotone constraints method
// descl2 = ``basic``, the most basic monotone constraints method. It does not slow the library at all, but over-constrains the predictions
// descl2 = ``intermediate``, a `more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library very slightly. However, this method is much less constraining than the basic method and should significantly improve the results
std::string monotone_constraints_method = "basic";

// type = multi-double
// alias = feature_contrib, fc, fp, feature_penalty
// default = None
24 changes: 22 additions & 2 deletions include/LightGBM/tree.h
@@ -147,6 +147,28 @@ class Tree {

inline double split_gain(int split_idx) const { return split_gain_[split_idx]; }

inline double internal_value(int node_idx) const {
return internal_value_[node_idx];
}

inline bool IsNumericalSplit(int node_idx) const {
return !GetDecisionType(decision_type_[node_idx], kCategoricalMask);
}

inline int left_child(int node_idx) const { return left_child_[node_idx]; }

inline int right_child(int node_idx) const { return right_child_[node_idx]; }

inline int split_feature_inner(int node_idx) const {
return split_feature_inner_[node_idx];
}

inline int leaf_parent(int leaf_idx) const { return leaf_parent_[leaf_idx]; }

inline uint32_t threshold_in_bin(int node_idx) const {
return threshold_in_bin_[node_idx];
}

/*! \brief Get the number of data points that fall at or below this node*/
inline int data_count(int node) const { return node >= 0 ? internal_count_[node] : leaf_count_[~node]; }

@@ -444,7 +466,6 @@ inline void Tree::Split(int leaf, int feature, int real_feature,
// add new node
split_feature_inner_[new_node_idx] = feature;
split_feature_[new_node_idx] = real_feature;

split_gain_[new_node_idx] = gain;
// add two new leaves
left_child_[new_node_idx] = ~leaf;
@@ -552,7 +573,6 @@ inline int Tree::GetLeafByMap(const std::unordered_map<int, double>& feature_val
return ~node;
}


} // namespace LightGBM

#endif // LightGBM_TREE_H_
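Reviewer sketch of how the new accessors in this file (`left_child`, `right_child`, `leaf_parent`) can be used to walk the tree. It relies on the encoding visible in the hunks above, where a child value `~leaf` (negative) denotes a leaf and a non-negative value denotes an internal node. `ToyTree`, `CollectLeaves` and `SiblingOfLeaf` are hypothetical stand-ins, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the arrays behind Tree's accessors (hypothetical type).
struct ToyTree {
  std::vector<int> left_child;   // per internal node; negative means ~leaf_index
  std::vector<int> right_child;  // per internal node; negative means ~leaf_index
  std::vector<int> leaf_parent;  // per leaf; index of the parent internal node
};

// Append the indices of all leaves under `node` to `out`.
// `node >= 0` is an internal node, `node < 0` encodes the leaf `~node`.
inline void CollectLeaves(const ToyTree& tree, int node, std::vector<int>* out) {
  if (node < 0) {
    out->push_back(~node);
    return;
  }
  assert(static_cast<std::size_t>(node) < tree.left_child.size());
  CollectLeaves(tree, tree.left_child[node], out);
  CollectLeaves(tree, tree.right_child[node], out);
}

// Return the (encoded) child of the leaf's parent that is not the leaf itself.
inline int SiblingOfLeaf(const ToyTree& tree, int leaf) {
  const int parent = tree.leaf_parent[leaf];
  return tree.left_child[parent] == ~leaf ? tree.right_child[parent]
                                          : tree.left_child[parent];
}
```

A traversal of this kind is one plausible way to find which leaves are affected when a split changes the output range of a neighbouring leaf; the actual traversal used by the PR is not shown in this part of the diff.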
11 changes: 11 additions & 0 deletions src/io/config.cpp
@@ -317,6 +317,17 @@ void Config::CheckParamConflict() {
force_col_wise = true;
force_row_wise = false;
}
if (is_parallel && monotone_constraints_method == std::string("intermediate")) {
// In distributed mode, local node doesn't have histograms on all features, cannot perform "intermediate" monotone constraints.
Log::Warning("Cannot use \"intermediate\" monotone constraints in parallel learning, auto set to \"basic\" method.");
monotone_constraints_method = "basic";
}
if (feature_fraction_bynode != 1.0 && monotone_constraints_method == std::string("intermediate")) {
// "intermediate" monotone constraints need to recompute splits. If the features are sampled when computing the
// split initially, then the sampling needs to be recorded or done once again, which is currently not supported
Log::Warning("Cannot use \"intermediate\" monotone constraints with feature fraction different from 1, auto set monotone constraints to \"basic\" method.");
monotone_constraints_method = "basic";
}
}
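Reviewer sketch of the two fallbacks added to `CheckParamConflict`, reduced to a standalone function so the behaviour can be checked in isolation. `ToyConfig` and `ResolveMonotoneMethod` are hypothetical names; only the three fields and the downgrade rule are taken from the hunk above, and the warnings are omitted.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of Config holding only the fields the check reads.
struct ToyConfig {
  bool is_parallel = false;
  double feature_fraction_bynode = 1.0;
  std::string monotone_constraints_method = "basic";
};

// Downgrade "intermediate" to "basic" when it is combined with parallel
// learning or per-node feature sampling. Returns true if a downgrade happened.
inline bool ResolveMonotoneMethod(ToyConfig* config) {
  assert(config != nullptr);
  if (config->monotone_constraints_method != "intermediate") return false;
  if (config->is_parallel || config->feature_fraction_bynode != 1.0) {
    config->monotone_constraints_method = "basic";
    return true;
  }
  return false;
}
```

The boolean return value is an addition of the sketch; the code in the PR logs a warning instead.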

std::string Config::ToString() const {
6 changes: 6 additions & 0 deletions src/io/config_auto.cpp
@@ -85,6 +85,8 @@ const std::unordered_map<std::string, std::string>& Config::alias_table() {
{"topk", "top_k"},
{"mc", "monotone_constraints"},
{"monotone_constraint", "monotone_constraints"},
{"monotone_constraining_method", "monotone_constraints_method"},
{"mc_method", "monotone_constraints_method"},
{"feature_contrib", "feature_contri"},
{"fc", "feature_contri"},
{"fp", "feature_contri"},
@@ -215,6 +217,7 @@ const std::unordered_set<std::string>& Config::parameter_set() {
"max_cat_to_onehot",
"top_k",
"monotone_constraints",
"monotone_constraints_method",
"feature_contri",
"forcedsplits_filename",
"refit_decay_rate",
@@ -414,6 +417,8 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::str
monotone_constraints = Common::StringToArray<int8_t>(tmp_str, ',');
}

GetString(params, "monotone_constraints_method", &monotone_constraints_method);

if (GetString(params, "feature_contri", &tmp_str)) {
feature_contri = Common::StringToArray<double>(tmp_str, ',');
}
@@ -633,6 +638,7 @@ std::string Config::SaveMembersToString() const {
str_buf << "[max_cat_to_onehot: " << max_cat_to_onehot << "]\n";
str_buf << "[top_k: " << top_k << "]\n";
str_buf << "[monotone_constraints: " << Common::Join(Common::ArrayCast<int8_t, int>(monotone_constraints), ",") << "]\n";
str_buf << "[monotone_constraints_method: " << monotone_constraints_method << "]\n";
str_buf << "[feature_contri: " << Common::Join(feature_contri, ",") << "]\n";
str_buf << "[forcedsplits_filename: " << forcedsplits_filename << "]\n";
str_buf << "[refit_decay_rate: " << refit_decay_rate << "]\n";
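Reviewer sketch of what the new alias-table entries imply for users: both `monotone_constraining_method` and `mc_method` resolve to `monotone_constraints_method`. `CanonicalName` is a hypothetical helper that copies only three entries from the table above; it is not the lookup code LightGBM uses.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Map a user-supplied parameter name to its canonical name.
// Unknown names are returned unchanged (an assumption of this sketch).
inline std::string CanonicalName(const std::string& key) {
  static const std::unordered_map<std::string, std::string> kAliases = {
      {"mc", "monotone_constraints"},
      {"monotone_constraining_method", "monotone_constraints_method"},
      {"mc_method", "monotone_constraints_method"},
  };
  assert(!key.empty());
  const auto it = kAliases.find(key);
  return it == kAliases.end() ? key : it->second;
}
```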
3 changes: 2 additions & 1 deletion src/io/tree.cpp
@@ -50,7 +50,8 @@ Tree::~Tree() {

int Tree::Split(int leaf, int feature, int real_feature, uint32_t threshold_bin,
double threshold_double, double left_value, double right_value,
int left_cnt, int right_cnt, double left_weight, double right_weight, float gain, MissingType missing_type, bool default_left) {
int left_cnt, int right_cnt, double left_weight, double right_weight, float gain,
MissingType missing_type, bool default_left) {
Split(leaf, feature, real_feature, left_value, right_value, left_cnt, right_cnt, left_weight, right_weight, gain);
int new_node_idx = num_leaves_ - 1;
decision_type_[new_node_idx] = 0;
13 changes: 12 additions & 1 deletion src/treelearner/leaf_splits.hpp
@@ -31,7 +31,6 @@ class LeafSplits {
}

/*!

* \brief Init split on current leaf on partial data.
* \param leaf Index of current leaf
* \param data_partition current data partition
@@ -45,6 +44,18 @@ class LeafSplits {
sum_hessians_ = sum_hessians;
}

/*!
* \brief Init split on current leaf from precomputed sums.
* \param leaf Index of current leaf
* \param sum_gradients Sum of gradients of the data in the leaf
* \param sum_hessians Sum of hessians of the data in the leaf
*/
void Init(int leaf, double sum_gradients, double sum_hessians) {
leaf_index_ = leaf;
sum_gradients_ = sum_gradients;
sum_hessians_ = sum_hessians;
}

/*!
* \brief Init splits on current leaf, it will traverse all data to sum up the results
* \param gradients