IGNITE-20139 Random forest stopping criteria check fixed. Gain calculation implemented. #256

Open
wants to merge 1 commit into
base: master

ibelyakov

The issue occurs when the tree contains a “pure” node (impurity* = 0). Impurity is calculated only for child nodes, never for the current node, and there is no check for whether the current node is already “pure” (contains just one label). As a result, the “bestSplit” calculation still runs for a “pure” node and decides to move all items to the left child and none to the right (leaf) child, which yields 2 “pure” children. Because the impurity of the current (parent) node is never calculated, the check parentNode.getImpurity() - split.get().getImpurity() > minImpurityDelta is always true, so the already “pure” node keeps being split until the maximum tree depth is reached.
The following changes were made to resolve the issue:

  1. Added the gain** calculation and a gain check for the split.
  2. Added a node impurity check: once the impurity reaches 0, the node is “pure” and no split is calculated for it.
  3. Changed the Gini impurity calculation to (1 - sum(p^2)) so that it yields correct values in the range from 0 to 0.5, as required for the Gini index.
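
The three changes above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the class `GiniSplitSketch`, its methods, and the use of per-label count arrays are all hypothetical, and the children are assumed to be weighted by their item counts.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not the actual Ignite ML classes. */
final class GiniSplitSketch {
    private GiniSplitSketch() {
    }

    /** Gini impurity 1 - sum(p_i^2), computed from per-label item counts. */
    static double gini(long[] counts) {
        long total = Arrays.stream(counts).sum();
        if (total == 0)
            return 0.0; // An empty node is treated as pure.

        double sumSq = 0.0;
        for (long c : counts) {
            double p = (double) c / total;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    /** A node is "pure" when at most one label has a non-zero count. */
    static boolean isPure(long[] counts) {
        return Arrays.stream(counts).filter(c -> c > 0).count() <= 1;
    }

    /** Gain = parent impurity minus the size-weighted impurity of the children. */
    static double gain(long[] parent, long[] left, long[] right) {
        long nLeft = Arrays.stream(left).sum();
        long nRight = Arrays.stream(right).sum();
        long n = nLeft + nRight;
        if (n == 0)
            return 0.0;

        double weighted = (nLeft * gini(left) + nRight * gini(right)) / n;
        return gini(parent) - weighted;
    }

    /** Stopping criteria: never split a pure node, and require enough gain. */
    static boolean shouldSplit(long[] parent, long[] left, long[] right, double minImpurityDelta) {
        if (isPure(parent))
            return false;

        return gain(parent, left, right) > minImpurityDelta;
    }
}
```

With these assumptions, a parent with label counts {4, 4} split into {4, 0} and {0, 4} has impurity 0.5 and gain 0.5, so it is split. A “pure” parent {8, 0} whose “best” split is {8, 0} and {0, 0} is rejected by the purity check, and its gain is 0 in any case.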

* Impurity is a value from 0 to 0.5 (for two labels) that shows whether the node is “pure” (impurity = 0), having just 1 label, or “impure”. An impurity of 0.5 is the worst case, where the label ratio is 1:1.
** Gain is the difference between the parent node’s impurity and the weighted impurity of its child nodes. The split that provides the maximum gain is considered the best. See https://www.learndatasci.com/glossary/gini-impurity/
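
The two definitions above can be written out as formulas. The notation is an assumption made here for illustration: $p_i$ is the fraction of items with label $i$ in a node, $N_L$ and $N_R$ are the item counts of the left and right children, $N = N_L + N_R$, and the children are assumed to be weighted by their share of the parent's items.

```latex
G(t) = 1 - \sum_i p_i^2

\mathrm{Gain} = G(\mathrm{parent})
  - \left( \frac{N_L}{N}\, G(\mathrm{left}) + \frac{N_R}{N}\, G(\mathrm{right}) \right)

% Worked example: a parent with a 1:1 label ratio, split into two pure children of equal size
G(\mathrm{parent}) = 1 - (0.5^2 + 0.5^2) = 0.5, \qquad
\mathrm{Gain} = 0.5 - \left( \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 0 \right) = 0.5

% A pure parent: every split leaves pure children, so there is nothing to gain
G(\mathrm{parent}) = 0 \;\Rightarrow\; \mathrm{Gain} = 0
```

Under these definitions a “pure” parent can never produce a positive gain, which is why the gain check stops the repeated splitting described in this PR.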
