[ENH] Implement proper Lift curve; keep Cumulative gains as an option #5075

Merged
merged 4 commits on Dec 8, 2020

Changes from all commits
298 changes: 159 additions & 139 deletions Orange/widgets/evaluate/owliftcurve.py

Large diffs are not rendered by default.

68 changes: 67 additions & 1 deletion Orange/widgets/evaluate/tests/test_owliftcurve.py
@@ -1,4 +1,6 @@
import copy
import unittest
from unittest.mock import Mock

import numpy as np

@@ -9,7 +11,8 @@
from Orange.widgets.evaluate.tests.base import EvaluateTest
from Orange.widgets.tests.base import WidgetTest
from Orange.widgets.tests.utils import simulate
from Orange.widgets.evaluate.owliftcurve import OWLiftCurve
from Orange.widgets.evaluate.owliftcurve import OWLiftCurve, cumulative_gains, \
    cumulative_gains_from_results
from Orange.tests import test_filename


@@ -53,3 +56,66 @@ def test_nan_input(self):
        self.assertTrue(self.widget.Error.invalid_results.is_shown())
        self.send_signal(self.widget.Inputs.evaluation_results, None)
        self.assertFalse(self.widget.Error.invalid_results.is_shown())


class UtilsTest(unittest.TestCase):
    @staticmethod
    def test_cumulative_gains():
        shuffle = [1, 2, 0, 3, 5, 4]
        y_true = np.array([1, 1, 0, 0, 1, 0])[shuffle]
        y_scores = np.array([0.9, 0.6, 0.5, 0.4, 0.4, 0.2])[shuffle]

        assert_almost_equal = np.testing.assert_almost_equal

        contacted, respondents, thresholds = cumulative_gains(y_true, y_scores)
        assert_almost_equal(contacted, np.array([1, 2, 3, 5, 6]) / 6)
        assert_almost_equal(thresholds, [0.9, 0.6, 0.5, 0.4, 0.2])
        assert_almost_equal(respondents, np.array([1, 2, 2, 3, 3]) / 3)

        contacted, respondents, thresholds = \
            cumulative_gains(y_true, 1 - y_scores, target=0)
        assert_almost_equal(contacted, np.array([1, 3, 4, 5, 6]) / 6)
        assert_almost_equal(thresholds, [0.8, 0.6, 0.5, 0.4, 0.1])
        assert_almost_equal(respondents, np.array([1, 2, 3, 3, 3]) / 3)

        contacted, respondents, thresholds = \
            cumulative_gains(np.array([], dtype=int), np.array([]))
        assert_almost_equal(contacted, [])
        assert_almost_equal(respondents, [])
        assert_almost_equal(thresholds, [])

    @staticmethod
    def test_cumulative_gains_from_results():
        shuffle = [1, 2, 0, 3, 5, 4]
        y_true = np.array([1, 1, 0, 0, 1, 0])[shuffle]
        y_scores = np.array([0.9, 0.6, 0.5, 0.4, 0.4, 0.2])[shuffle]

        results = Mock()
        results.actual = y_true
        results.probabilities = \
            [Mock(), Mock(), np.vstack((1 - y_scores, y_scores)).T]

        assert_almost_equal = np.testing.assert_almost_equal

        contacted, respondents, thresholds = \
            cumulative_gains_from_results(results, 1, 2)
        assert_almost_equal(thresholds, [0.9, 0.6, 0.5, 0.4, 0.2])
        assert_almost_equal(contacted, np.array([1, 2, 3, 5, 6]) / 6)
        assert_almost_equal(respondents, np.array([1, 2, 2, 3, 3]) / 3)

        contacted, respondents, thresholds = \
            cumulative_gains_from_results(results, 0, 2)
        assert_almost_equal(contacted, np.array([1, 3, 4, 5, 6]) / 6)
        assert_almost_equal(thresholds, [0.8, 0.6, 0.5, 0.4, 0.1])
        assert_almost_equal(respondents, np.array([1, 2, 3, 3, 3]) / 3)

        # Exercise the empty-input case through the results wrapper,
        # not by calling cumulative_gains directly.
        results.actual = np.array([], dtype=int)
        results.probabilities = np.empty((3, 0, 2))
        contacted, respondents, thresholds = \
            cumulative_gains_from_results(results, 1, 2)
        assert_almost_equal(contacted, [])
        assert_almost_equal(respondents, [])
        assert_almost_equal(thresholds, [])


if __name__ == "__main__":
unittest.main()
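To make the behaviour the tests encode concrete, here is a minimal sketch of a cumulative-gains computation that reproduces the expected values above. The function name and internals are assumptions for illustration; the PR's actual implementation in `owliftcurve.py` may differ.

```python
import numpy as np

def cumulative_gains_sketch(y_true, y_score, target=1):
    """Fraction of positives captured vs. fraction of instances
    contacted, with one curve point per distinct score threshold.
    Illustrative sketch only, not the PR's implementation."""
    if len(y_true) == 0:
        return np.array([]), np.array([]), np.array([])
    order = np.argsort(y_score)[::-1]        # rank instances by descending score
    hits = (np.asarray(y_true)[order] == target).astype(float)
    scores = np.asarray(y_score, dtype=float)[order]
    # keep only the last index of each run of tied scores
    idx = np.r_[np.where(np.diff(scores))[0], len(scores) - 1]
    contacted = (idx + 1) / len(scores)               # x: fraction contacted
    respondents = np.cumsum(hits)[idx] / hits.sum()   # y: fraction of positives
    return contacted, respondents, scores[idx]
```

On the test's data (three positives out of six instances) this yields contacted = [1, 2, 3, 5, 6] / 6, respondents = [1, 2, 2, 3, 3] / 3 and thresholds = [0.9, 0.6, 0.5, 0.4, 0.2], matching the assertions above.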
35 changes: 16 additions & 19 deletions doc/visual-programming/source/widgets/evaluate/liftcurve.md
@@ -7,31 +7,28 @@ Measures the performance of a chosen classifier against a random classifier.

- Evaluation Results: results of testing classification algorithms

The **Lift curve** shows the relation between the number of instances which were predicted positive and those that are indeed positive and thus measures the performance of a chosen classifier against a random classifier. The graph is constructed with the cumulative number of cases (in descending order of probability) on the x-axis and the cumulative number of true positives on the y-axis. Lift curve is often used in segmenting the population, e.g., plotting the number of responding customers against the number of all customers contacted. You can also determine the optimal classifier and its threshold from the graph.
The **Lift Curve** widget plots curves that show the proportion of true positive instances as a function of either the classifier's threshold or the number of instances classified as positive.

![](images/LiftCurve-stamped.png)
The cumulative gains chart shows the proportion of true positive instances (for example, the number of clients who accept the offer) as a function of the number of instances classified as positive (the number of clients contacted), assuming that the instances are ordered by the model's predicted probability of being positive (e.g., a ranking of clients).

1. Choose the desired *Target class*. The default class is chosen alphabetically.
2. If test results contain more than one classifier, the user can choose which curves she or he wants to see plotted. Click on a classifier to select or deselect the curve.
3. *Show lift convex hull* plots a convex hull over lift curves for all classifiers (yellow curve). The curve shows the optimal classifier (or combination thereof) for each desired TP/P rate.
4. Press *Save Image* if you want to save the created image to your computer in a .svg or .png format.
5. Produce a report.
6. 2-D pane with **P rate** (population) as x-axis and **TP rate** (true positives) as a y-axis. The diagonal line represents the behavior of a random classifier. Click and drag to move the pane and scroll in or out to zoom. Click on the "*A*" sign at the bottom left corner to realign the pane.
![](images/LiftCurve-cumulative-gain.png)

**Note!** The perfect classifier would have a steep slope towards 1 until all
classes are guessed correctly and then run straight along 1 on y-axis to
(1,1).
The lift curve shows the ratio between the proportion of true positive instances in the selection and the proportion of customers contacted. See [a tutorial for more details](https://medium.com/analytics-vidhya/understanding-lift-curve-b674d21e426).
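Both curves are computed from the same ranking of instances; only the y-axis differs. A toy computation (illustrative data, not the widget's code) shows the relation:

```python
import numpy as np

# Six instances ranked by predicted probability; three are truly positive.
y_true = np.array([1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.6, 0.5, 0.4, 0.3, 0.2])

order = np.argsort(y_score)[::-1]            # descending by score
hits = np.cumsum(y_true[order])              # positives found so far

contacted = np.arange(1, len(y_true) + 1) / len(y_true)  # x-axis: P rate
gain = hits / y_true.sum()       # cumulative gains: fraction of positives found
lift = gain / contacted          # lift: how much better than a random pick
```

Here, contacting the top third of instances (contacted = 1/3) captures two of the three positives (gain = 2/3), i.e. a lift of 2.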

Example
-------
![](images/LiftCurve-stamped.png)

At the moment, the only widget which gives the right type of the signal needed by the **Lift Curve** is [Test & Score](../evaluate/testandscore.md).
1. Choose the desired *Target class*. The default is chosen alphabetically.
2. Choose whether to observe lift curve or cumulative gains.
3. If test results contain more than one classifier, you can choose which curves to plot. Click on a classifier to select or deselect its curve.
4. *Show lift convex hull* plots a convex hull over lift curves for all classifiers (yellow curve). The curve shows the optimal classifier (or combination thereof) for each desired lift or cumulative gain.
5. Press *Save Image* to save the created image in a .svg or .png format.
6. Produce a report.
7. A plot with **lift** or **cumulative gain** vs. **positive rate**. The dashed line represents the behavior of a random classifier.

In the example below, we try to see the prediction quality for the class 'survived' on the *Titanic* dataset. We compared three different classifiers in the Test Learners widget and sent them to Lift Curve to see their performance against a random model. We see the [Tree](../model/tree.md) classifier is the best out of the three, since it best aligns with *lift convex hull*. We also see that its performance is the best for the first 30% of the population (in order of descending probability), which we can set as the threshold for optimal classification.

![](images/LiftCurve-example.png)
Example
-------

References
----------
The widgets that provide the right type of signal needed by the **Lift Curve** (evaluation data) are [Test & Score](../evaluate/testandscore.md) and [Predictions](../evaluate/predictions.md).
**Contributor:** The Predictions part is true only for labelled data. Perhaps make this clear?

**Author:** Don't both?

**Contributor:** Yes, but Test & Score warns you about a missing target variable, while Predictions doesn't. I don't know, I just think it would be clearer that way.

**Author:** I understand what you meant. But it's somehow clear that the data has to have a target variable, so I'd rather keep it short.

Handouts of the University of Notre Dame on Data Mining - Lift Curve. Available [here](https://www3.nd.edu/~busiforc/handouts/DataMining/Lift%20Charts.html).
In the example below, we observe the lift curve and cumulative gains for the bank marketing data, where the classification goal is to predict whether a client will accept a term deposit offer based on their age, job, education, marital status and similar data. The dataset is available in the Datasets widget. We run the learning algorithms in the Test & Score widget and send the results to Lift Curve to see their performance against a random model. Of the two algorithms tested, logistic regression outperforms the naive Bayesian classifier. The curve tells us that by picking the first 20% of clients as ranked by the model, we will hit four times more positive instances than by selecting a random sample of 20% of the clients.
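The "four times more" statement is just the lift value read off the curve at 20% contacted. With hypothetical counts (chosen for illustration, not taken from the actual bank marketing results):

```python
# Hypothetical counts illustrating a lift of 4 at 20% contacted.
n_clients = 1000
n_positive = 100          # clients who would accept the offer
contacted_fraction = 0.2  # we contact the model's top-ranked 20%
hits_in_top = 80          # positives found among those 200 clients

gain = hits_in_top / n_positive      # fraction of all positives captured: 0.8
random_gain = contacted_fraction     # a random 20% sample captures about 0.2
lift = gain / random_gain            # roughly 4: four times better than random
```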
5 changes: 4 additions & 1 deletion doc/widgets.json
Original file line number Diff line number Diff line change
@@ -638,7 +638,10 @@
"doc": "visual-programming/source/widgets/evaluate/liftcurve.md",
"icon": "../Orange/widgets/evaluate/icons/LiftCurve.svg",
"background": "#C3F3F3",
"keywords": []
"keywords": [
"lift",
"cumulative gain"
]
},
{
"text": "Calibration Plot",