[FIX] Fix classification trees for data with repeated feature values #6488
Codecov Report
Additional details and impacted files

@@           Coverage Diff           @@
##           master    #6488   +/-   ##
=======================================
  Coverage   87.66%   87.66%
=======================================
  Files         321      321
  Lines       69374    69374
=======================================
  Hits        60817    60817
  Misses       8557     8557
Hmm, the trees are built differently. When the test fails, I see the following splitting process.
And then my debugging code crashed because petal_width split does not have a
Furthermore, I see different scores for some other attributes (sepal length for the first and the second split).
During debugging, I saw that find_threshold_entropy skipped computing too many entropies: it skipped a candidate threshold if the next class value was the same or the next feature value was the same. That was a problem when feature and class values could both (interchangeably) repeat.
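To illustrate the skipping rule described above, here is a minimal pure-Python sketch. It is not the Orange implementation: the names `entropy`, `best_threshold` and the `skip_same_class` flag are hypothetical, and the input is assumed to be already sorted by feature value, with ties in arbitrary order.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a mapping from class to count."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total)
                for c in counts.values() if c)


def best_threshold(values, classes, skip_same_class=False):
    """Return (threshold, weighted entropy) of the best binary split, or None.

    Assumes `values` is sorted in ascending order. With skip_same_class=True
    the function mimics the over-eager shortcut discussed in this PR
    (a hypothetical reconstruction, not the actual code).
    """
    n = len(values)
    left, right = Counter(), Counter(classes)
    best = None
    for i in range(n - 1):
        x, y = values[i], classes[i]
        # Move instance i from the right branch to the left branch.
        left[y] += 1
        right[y] -= 1
        next_x, next_y = values[i + 1], classes[i + 1]
        if x == next_x:
            continue  # no threshold fits between equal feature values
        if skip_same_class and y == next_y:
            continue  # unsafe shortcut when feature values repeat
        score = ((i + 1) * entropy(left) + (n - i - 1) * entropy(right)) / n
        if best is None or score < best[1]:
            best = ((x + next_x) / 2, score)
    return best


values = [1, 1, 2, 2, 2]
classes = [1, 0, 0, 0, 0]
full = best_threshold(values, classes)
shortcut = best_threshold(values, classes, skip_same_class=True)
```

With this ordering of ties, the only valid threshold (between 1 and 2) sits between two instances of the same class, so the shortcut version evaluates no candidate at all and returns `None`, while the full version finds the split at 1.5. Whether a split is found then depends on how tied feature values happen to be ordered after sorting.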
Issue
I started seeing this in GitHub tests (only on Ubuntu).
Description of changes
The bug ran deeper. See my last comment: find_threshold_entropy skipped computing too many entropies.