Bug fix in uplift_by_percentile #201

Open. nklemashev wants to merge 1 commit into base: dev


📑 Description of the Change

In a recently updated Python environment, the function uplift_by_percentile in sklift/metrics/metrics.py emits the following warning: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray. The warning is reported at the NumPy line return asarray(a).ndim.

If I add

import warnings
warnings.filterwarnings('error')

the warning is raised as an error at the following lines:

--> 716     df.loc[-1, :] = ['total', n_trmnt_total, n_ctrl_total, response_rate_trmnt_total,
    717                      response_rate_ctrl_total, response_rate_trmnt_total - response_rate_ctrl_total]
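
The global filter above turns every warning in the session into an error. As a minimal sketch of a narrower alternative, the escalation can be scoped to a single call with the standard library's warnings.catch_warnings. The helper name raises_warning is hypothetical and is not part of sklift:

```python
import warnings


def raises_warning(func, *args, **kwargs):
    """Return True if calling func emits any warning, False otherwise.

    Hypothetical helper for illustration only; not part of sklift.
    """
    with warnings.catch_warnings():
        # Escalate warnings to exceptions only inside this block;
        # the previous filters are restored on exit.
        warnings.simplefilter('error')
        try:
            func(*args, **kwargs)
        except Warning:
            return True
    return False
```

With this helper, the reproduction could be written as raises_warning(uplift_by_percentile, ...) without changing the warning filters for the rest of the session.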

Versions of the relevant Python packages:

pandas: 1.5.1
numpy: 1.23.4
ipykernel: 6.17.1
sklift: 0.5.1
sklearn: 1.1.3

The line that raises the error simply appends a row to the end of df. The row's values are provided as a list: the first element is a string, and all the other elements are NumPy arrays of size 1. I did not investigate in depth why this triggers the warning; I just added [0] to the end of all *_total variables in these two lines, and the warning no longer appears.
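
A likely explanation, sketched below, is that the row mixes a scalar string with one-dimensional arrays, so when pandas calls np.ndim(value) on it (see the traceback further down), NumPy sees a nested sequence whose elements have different shapes. The numeric values here are made-up stand-ins for what response_rate_by_percentile returns with bins=1; only the shapes matter:

```python
import numpy as np

# Made-up stand-ins for the *_total values (size-1 arrays, as with bins=1).
n_trmnt_total = np.array([503])
response_rate_trmnt_total = np.array([0.51])

# Row as built before the fix: a string next to size-1 arrays.
row_before = ['total', n_trmnt_total, response_rate_trmnt_total]

# Row as built after the fix: [0] extracts the scalar from each array.
row_after = ['total', n_trmnt_total[0], response_rate_trmnt_total[0]]

# Dimensionality of each element: mixed before the fix, uniform after it.
dims_before = [np.ndim(v) for v in row_before]
dims_after = [np.ndim(v) for v in row_after]

# The fixed row converts cleanly to a one-dimensional array.
row_after_ndim = np.ndim(row_after)
```

Here dims_before mixes zero-dimensional and one-dimensional elements, while dims_after contains only scalars, which is why indexing with [0] is enough to silence the warning.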

Verification Process

A simple script checks whether the warning occurs:

import numpy as np
from sklift.metrics import uplift_by_percentile

import warnings
warnings.filterwarnings('error')

rng = np.random.default_rng(12345)
n = 1000
treatment = rng.integers(low = 0, high = 2, size = n)
y_true = rng.integers(low = 0, high = 2, size = n)
uplift = 2 * rng.random(size = n) - 1

uplift_by_percentile(
    y_true = y_true,
    uplift = uplift,
    treatment = treatment,
    strategy = 'overall',
    bins = 10,
    std = False,
    total = True,
    string_percentiles = True
)

If I run this without the changes to sklift, I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File C:\Users\NikolajKlemashev\Anaconda3\envs\sklift_bug\lib\site-packages\numpy\core\fromnumeric.py:3154, in ndim(a)
   3153 try:
-> 3154     return a.ndim
   3155 except AttributeError:

AttributeError: 'list' object has no attribute 'ndim'

During handling of the above exception, another exception occurred:

VisibleDeprecationWarning                 Traceback (most recent call last)
Input In [2], in <cell line: 13>()
     10 y_true = rng.integers(low = 0, high = 2, size = n)
     11 uplift = 2 * rng.random(size = n) - 1
---> 13 uplift_by_percentile(
     14     y_true = y_true,
     15     uplift = uplift,
     16     treatment = treatment,
     17     strategy = 'overall',
     18     bins = 10,
     19     std = False,
     20     total = True,
     21     string_percentiles = True
     22 )

File C:\Users\NikolajKlemashev\Anaconda3\envs\sklift_bug\lib\site-packages\sklift\metrics\metrics.py:716, in uplift_by_percentile(y_true, uplift, treatment, strategy, bins, std, total, string_percentiles)
    710     response_rate_trmnt_total, variance_trmnt_total, n_trmnt_total = response_rate_by_percentile(
    711         y_true, uplift, treatment, strategy=strategy, group='treatment', bins=1)
    713     response_rate_ctrl_total, variance_ctrl_total, n_ctrl_total = response_rate_by_percentile(
    714         y_true, uplift, treatment, strategy=strategy, group='control', bins=1)
--> 716     df.loc[-1, :] = ['total', n_trmnt_total, n_ctrl_total, response_rate_trmnt_total,
    717                      response_rate_ctrl_total, response_rate_trmnt_total - response_rate_ctrl_total]
    719 if std:
    720     std_treatment = np.sqrt(variance_trmnt)

File C:\Users\NikolajKlemashev\Anaconda3\envs\sklift_bug\lib\site-packages\pandas\core\indexing.py:818, in _LocationIndexer.__setitem__(self, key, value)
    815 self._has_valid_setitem_indexer(key)
    817 iloc = self if self.name == "iloc" else self.obj.iloc
--> 818 iloc._setitem_with_indexer(indexer, value, self.name)

File C:\Users\NikolajKlemashev\Anaconda3\envs\sklift_bug\lib\site-packages\pandas\core\indexing.py:1795, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
   1792 # align and set the values
   1793 if take_split_path:
   1794     # We have to operate column-wise
-> 1795     self._setitem_with_indexer_split_path(indexer, value, name)
   1796 else:
   1797     self._setitem_single_block(indexer, value, name)

File C:\Users\NikolajKlemashev\Anaconda3\envs\sklift_bug\lib\site-packages\pandas\core\indexing.py:1833, in _iLocIndexer._setitem_with_indexer_split_path(self, indexer, value, name)
   1830 if isinstance(value, ABCDataFrame):
   1831     self._setitem_with_indexer_frame_value(indexer, value, name)
-> 1833 elif np.ndim(value) == 2:
   1834     self._setitem_with_indexer_2d_value(indexer, value)
   1836 elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
   1837     # We are setting multiple rows in a single column.

File <__array_function__ internals>:180, in ndim(*args, **kwargs)

File C:\Users\NikolajKlemashev\Anaconda3\envs\sklift_bug\lib\site-packages\numpy\core\fromnumeric.py:3156, in ndim(a)
   3154     return a.ndim
   3155 except AttributeError:
-> 3156     return asarray(a).ndim

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.

After the correction, the same code runs without any errors.

Release Notes

Fix deprecation warning in metrics.uplift_by_percentile.

Additional info
