Commit 4da81c6

Merge pull request #64 from PythonPredictions/develop
develop to master - new version
2 parents 6ef6723 + 48d8a9a commit 4da81c6

22 files changed, with 645 additions and 295 deletions

.github/ISSUE_TEMPLATE/bug_report.md

Lines changed: 32 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,32 @@
1+
---
2+
name: Bug report
3+
about: Create a report to help us improve
4+
---
5+
6+
<!-- Please search existing issues to avoid creating duplicates. -->
7+
8+
# Bug Report
9+
10+
Bug: X does not work when I do Y
11+
12+
## Description
13+
14+
Info about the bug goes here.
15+
16+
### Steps to Reproduce
17+
18+
1. Step 1
19+
2. Step 2
20+
3. ...
21+
22+
### Expected Result
23+
24+
I was expecting ...
25+
26+
You may write the expected result or add a screenshot.
27+
28+
### Actual Results
29+
30+
I actually got ...
31+
32+
Would be awesome to link screenshots here and/or error messages received.

.github/ISSUE_TEMPLATE/issue.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
---
2+
name: Task
3+
about: A small issue. It will usually be labeled as `good first issue` or `enhancement`.
4+
---
5+
6+
<!-- Issue title should mirror the Task Title. -->
7+
8+
# Task Title
9+
10+
Task: I am an Issue
11+
12+
## Task Description
13+
14+
This issue will...

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
# Story Title
2+
3+
[This is the Issue Title](https://github.com/username/repository-name/issues/1)
4+
5+
## Changes made
6+
7+
- made this
8+
- did that
9+
10+
## How does the solution address the problem
11+
12+
This PR will...
13+
14+
## Linked issues
15+
16+
Resolves #1

.github/workflows/development_CI.yaml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Runs CI when pushing to develop branch
+# runs pylint and pytest
+
+name: CI_develop_action
+
+on:
+  push:
+    branches: [ develop ]
+  pull_request:
+    branches: [ develop ]
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+
+    - name: Set up Python 3.8
+      uses: actions/setup-python@v2
+      with:
+        python-version: 3.8
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install -r requirements.txt
+        python -m pip install pylint pytest pytest-mock pytest-cov
+
+    - name: Test with pytest
+      run: |
+        pytest --cov=cobra tests/
+
+    # until we refactor accordingly
+    #- name: Lint check with pylint
+    #  run: |
+    #    pylint cobra

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,4 @@
-#Ignoired directories in root folder
+#Ignored directories in root folder
 
 
 # Byte-compiled / optimized / DLL files
@@ -109,3 +109,4 @@ ENV/
 # Other ignore files
 *.pptx
 *.ppt
+.idea/

README.rst

Lines changed: 89 additions & 87 deletions
@@ -1,87 +1,89 @@
1-
=====
2-
cobra
3-
=====
4-
5-
**cobra** is a Python package to build predictive models using logistic regression with a focus on performance and interpretation. It consists of several modules for data preprocessing, feature selection and model evaluation. The underlying methodology was developed at Python Predictions in the course of hundreds of business-related prediction challenges. It has been tweaked, tested and optimized over the years based on feedback from clients, our team, and academic researchers.
6-
7-
8-
Main Features
9-
=============
10-
11-
- Prepare a given pandas DataFrame for predictive modelling:
12-
13-
- partition into train/selection/validation sets
14-
- create bins from continuous variables
15-
- regroup categorical variables based on statistical significance
16-
- replace missing values and
17-
- add columns with incidence rate per category/bin
18-
19-
- Perform univariate feature selection based on AUC
20-
- Compute correlation matrix of predictors
21-
- Find the suitable variables using forward feature selection
22-
- Evaluate model performance and visualize the results
23-
24-
Getting started
25-
===============
26-
27-
These instructions will get you a copy of the project up and running on your local machine for usage, development and testing purposes.
28-
29-
Requirements
30-
------------
31-
32-
This package requires the usual Python packages for data science:
33-
34-
- numpy (>=1.19.4)
35-
- pandas (>=1.1.5)
36-
- scipy (>=1.5.4)
37-
- scikit-learn (>=0.23.1)
38-
- matplotlib (>=3.3.3)
39-
- seaborn (>=0.11.0)
40-
41-
42-
These packages, along with their versions are listed in ``requirements.txt`` and can be installed using ``pip``: ::
43-
44-
45-
pip install -r requirements.txt
46-
47-
48-
**Note**: if you want to install cobra with e.g. pip, you don't have to install all of these requirements as these are automatically installed with cobra itself.
49-
50-
Installation
51-
------------
52-
53-
The easiest way to install cobra is using ``pip`` ::
54-
55-
pip install -U pythonpredictions-cobra
56-
57-
Contributing to cobra
58-
=====================
59-
60-
We'd love you to contribute to the development of cobra! There are many ways in which you can contribute, the most common of which is to contribute to the source code or documentation of the project. However, there are many other ways you can contribute (report issues, improve code coverage by adding unit tests, ...).
61-
We use GitHub issue to track all bugs and feature requests. Feel free to open an issue in case you found a bug or in case you wish to see a new feature added.
62-
63-
How to contribute code
64-
----------------------
65-
66-
The preferred way to contribute to cobra is to fork the main repository on GitHub, then submit a "pull request" (PR). The first step is to get a local development copy by installing cobra from source through the following steps:
67-
68-
- Fork the `project repository <https://github.com/PythonPredictions/cobra>`_. For more details on how to fork a repository see `this guide <https://docs.github.com/en/free-pro-team@latest/github/getting-started-with-github/fork-a-repo>`__
69-
- Clone your fork of cobra's repo.
70-
- Open a shell and navigate to the folder where this repo was cloned in.
71-
- Once you are in the folder, execute ``pip install --editable .``.
72-
- Create a *feature branch* to do your development.
73-
- Once your are finished developing, you can create a *pull request* from your fork (see `this guide <https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork>`__ for detailed instructions).
74-
75-
**Notes**
76-
77-
- Make sure to follow the *PEP 8* styleguide if you make any changes to cobra. You should also write or modify unit test for your changes.
78-
- To avoid duplicating work, it is highly recommended that you search through the issue tracker and/or the PR list. If in doubt, you can always reach out to us through email ([email protected])
79-
80-
Help and Support
81-
================
82-
83-
Documentation
84-
-------------
85-
86-
- HTML documentation of the `individual modules <https://pythonpredictions.github.io/cobra.io/docstring/modules.html>`_
87-
- A step-by-step `tutorial <https://pythonpredictions.github.io/cobra.io/tutorial.html>`_
1+
2+
3+
.. image:: https://img.shields.io/pypi/v/pythonpredictions-cobra.svg
4+
:target: https://pypi.org/project/pythonpredictions-cobra/
5+
.. image:: https://img.shields.io/pypi/dm/pythonpredictions-cobra.svg
6+
:target: https://pypistats.org/packages/pythonpredictions-cobra
7+
.. image:: https://github.com/PythonPredictions/cobra/actions/workflows/development_CI.yaml/badge.svg?branch=develop
8+
:target: https://github.com/PythonPredictions/cobra/actions/workflows/development_CI.yaml
9+
10+
------------------------------------------------------------------------------------------------------------------------------------
11+
12+
=====
13+
cobra
14+
=====
15+
.. image:: material/logo.png
16+
:width: 300
17+
18+
**cobra** is a Python package to build predictive models using linear/logistic regression with a focus on performance and interpretation. It consists of several modules for data preprocessing, feature selection and model evaluation. The underlying methodology was developed at Python Predictions in the course of hundreds of business-related prediction challenges. It has been tweaked, tested and optimized over the years based on feedback from clients, our team, and academic researchers.
19+
20+
Main Features
21+
=============
22+
23+
- Prepare a given pandas DataFrame for predictive modelling:
24+
25+
- partition into train/selection/validation sets
26+
- create bins from continuous variables
27+
- regroup categorical variables based on statistical significance
28+
- replace missing values and
29+
- add columns with incidence rate per category/bin
30+
31+
- Perform univariate feature selection based on AUC
32+
- Compute correlation matrix of predictors
33+
- Find the suitable variables using forward feature selection
34+
- Evaluate model performance and visualize the results
35+
36+
Getting started
37+
===============
38+
39+
These instructions will get you a copy of the project up and running on your local machine for usage, development and testing purposes.
40+
41+
Requirements
42+
------------
43+
44+
This package requires the usual Python packages for data science:
45+
46+
- numpy (>=1.19.4)
47+
- pandas (>=1.1.5)
48+
- scipy (>=1.5.4)
49+
- scikit-learn (>=0.23.1)
50+
- matplotlib (>=3.3.3)
51+
- seaborn (>=0.11.0)
52+
53+
54+
These packages, along with their versions, are listed in ``requirements.txt`` and can be installed using ``pip``: ::
55+
56+
57+
pip install -r requirements.txt
58+
59+
60+
**Note**: if you want to install cobra with e.g. pip, you don't have to install all of these requirements as these are automatically installed with cobra itself.
61+
62+
Installation
63+
------------
64+
65+
The easiest way to install cobra is using ``pip``: ::
66+
67+
pip install -U pythonpredictions-cobra
68+
69+
Contributing to cobra
70+
=====================
71+
72+
We'd love you to contribute to the development of cobra! There are many ways in which you can contribute, the most common of which is to contribute to the source code or documentation of the project. However, there are many other ways you can contribute (report issues, improve code coverage by adding unit tests, ...).
73+
We use GitHub issues to track all bugs and feature requests. Feel free to open an issue if you find a bug or wish to see a new feature added.
74+
75+
For more details, check our `wiki <https://github.com/PythonPredictions/cobra/wiki/Contributing-guidelines-&-workflows>`_.
76+
77+
Help and Support
78+
================
79+
80+
Documentation
81+
-------------
82+
83+
- HTML documentation of the `individual modules <https://pythonpredictions.github.io/cobra.io/docstring/modules.html>`_
84+
- A step-by-step `tutorial <https://pythonpredictions.github.io/cobra.io/tutorial.html>`_
85+
86+
Outreach
87+
-------------
88+
89+
- Check out the Data Science Leuven Meetup `talk <https://www.youtube.com/watch?v=w7ceZZqMEaA&feature=youtu.be>`_ by one of the core developers (second presentation)

cobra/evaluation/evaluator.py

Lines changed: 18 additions & 10 deletions
@@ -35,15 +35,20 @@ class Evaluator():
     probability_cutoff : float
         probability cut off to convert probability scores to a binary score
     roc_curve : dict
-        map containing true-positive-rate, false-positve-rate at various
+        map containing true-positive-rate, false-positive-rate at various
         thresholds (also incl.)
+    n_bins : int, optional
+        number of bins used to calculate the lift curve
+        (by default 10, so deciles)
     """
 
     def __init__(self, probability_cutoff: float=None,
-                 lift_at: float=0.05):
+                 lift_at: float=0.05,
+                 n_bins: int = 10):
 
         self.lift_at = lift_at
         self.probability_cutoff = probability_cutoff
+        self.n_bins = n_bins
 
         # Placeholder to store fitted output
         self.scalar_metrics = None
@@ -85,7 +90,7 @@ def fit(self, y_true: np.ndarray, y_pred: np.ndarray):
 
         self.roc_curve = {"fpr": fpr, "tpr": tpr, "thresholds": thresholds}
         self.confusion_matrix = confusion_matrix(y_true, y_pred_b)
-        self.lift_curve = Evaluator._compute_lift_per_decile(y_true, y_pred)
+        self.lift_curve = Evaluator._compute_lift_per_bin(y_true, y_pred, self.n_bins)
         self.cumulative_gains = Evaluator._compute_cumulative_gains(y_true,
                                                                     y_pred)
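Reviewer sketch (not part of the commit): the hunk above makes the lift curve depend on `self.n_bins`. The body of `Evaluator._compute_lift` is not shown in this diff, so `cumulative_lift` below is a hypothetical stand-in that assumes the common definition of cumulative lift (response rate in the top share of scores divided by the overall response rate). `lift_per_bin` and the toy data are also illustrative names, not cobra's API.

```python
import numpy as np


def cumulative_lift(y_true: np.ndarray, y_pred: np.ndarray, lift_at: float) -> float:
    """Hypothetical stand-in for the lift helper: response rate in the top
    `lift_at` share of scores divided by the overall response rate."""
    order = np.argsort(-y_pred, kind="stable")         # highest scores first
    n_top = max(1, int(round(lift_at * len(y_true))))  # size of the top slice
    top_rate = y_true[order][:n_top].mean()
    return float(top_rate / y_true.mean())


def lift_per_bin(y_true: np.ndarray, y_pred: np.ndarray, n_bins: int = 10) -> list:
    """One cumulative lift value per bin boundary (n_bins=10 gives deciles)."""
    fractions = np.linspace(1 / n_bins, 1, num=n_bins, endpoint=True)
    return [cumulative_lift(y_true, y_pred, f) for f in fractions]


# Toy data: three positives, all ranked at the top by the scores.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
y_pred = np.linspace(1.0, 0.1, num=10)
lifts = lift_per_bin(y_true, y_pred, n_bins=5)
print(lifts)
```

Under this assumed definition the last value is always 1, because the final bin covers the whole population.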

@@ -199,8 +204,7 @@ def plot_confusion_matrix(self, path: str=None, dim: tuple=(12, 8),
 
         plt.show()
 
-    def plot_cumulative_response_curve(self, path: str=None,
-                                       dim: tuple=(12, 8)):
+    def plot_cumulative_response_curve(self, path: str=None, dim: tuple=(12, 8)):
         """Plot cumulative response curve
 
         Parameters
@@ -430,17 +434,21 @@ def _compute_cumulative_gains(y_true: np.ndarray,
         return percentages, gains
 
     @staticmethod
-    def _compute_lift_per_decile(y_true: np.ndarray,
-                                 y_pred: np.ndarray) -> tuple:
-        """Compute lift of the model per decile, returns x-labels, lifts and
-        the target incidence to create cummulative response curves
+    def _compute_lift_per_bin(y_true: np.ndarray,
+                              y_pred: np.ndarray,
+                              n_bins: int = 10) -> tuple:
+        """Compute lift of the model for a given number of bins, returns x-labels,
+        lifts and the target incidence to create cumulative response curves
 
         Parameters
         ----------
         y_true : np.ndarray
             True binary target data labels
         y_pred : np.ndarray
             Target scores of the model
+        n_bins : int, optional
+            number of bins used to calculate the lift curve
+            (by default 10, so deciles)
 
         Returns
         -------
@@ -451,7 +459,7 @@ def _compute_lift_per_decile(y_true: np.ndarray,
         lifts = [Evaluator._compute_lift(y_true=y_true,
                                          y_pred=y_pred,
                                          lift_at=perc_lift)
-                 for perc_lift in np.arange(0.1, 1.1, 0.1)]
+                 for perc_lift in np.linspace(1/n_bins, 1, num=n_bins, endpoint=True)]
 
         x_labels = [len(lifts)-x for x in np.arange(0, len(lifts), 1)]
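Reviewer sketch (not part of the commit): the last hunk swaps `np.arange(0.1, 1.1, 0.1)` for `np.linspace(1/n_bins, 1, num=n_bins, endpoint=True)`. The snippet below checks that the default `n_bins=10` gives the same decile fractions as before. `bin_fractions` is a hypothetical helper name used only for illustration.

```python
import numpy as np


def bin_fractions(n_bins: int = 10) -> np.ndarray:
    """Upper bound of each bin as a share of the population, ending exactly at 1."""
    return np.linspace(1 / n_bins, 1, num=n_bins, endpoint=True)


old_deciles = np.arange(0.1, 1.1, 0.1)  # expression removed by this commit
new_deciles = bin_fractions(10)         # expression added by this commit

# Compare shapes first: with a float step, the length of an arange result
# depends on rounding, whereas linspace always returns exactly `num` points.
same = old_deciles.shape == new_deciles.shape and np.allclose(old_deciles, new_deciles)
print(same)

# Other bin counts follow the same pattern, e.g. quartiles.
print(bin_fractions(4))
```

The NumPy documentation recommends `linspace` over `arange` for non-integer steps, which is a reasonable motivation for the change beyond making the bin count configurable.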
