
apriori.py line 224: ValueError: negative dimensions are not allowed #613

Open

fixablecar opened this issue Oct 30, 2019 · 6 comments
fixablecar commented Oct 30, 2019

_bools = X[:, combin[:, 0]] == all_ones

Processing 24785850 combinations | Sampling itemset size 6
Traceback (most recent call last):
File "***.py", line 116, in
frequent_itemsets = apriori(df, min_support=0.8, use_colnames=True, verbose=1)

File "C:\ProgramData\Anaconda3\lib\site-packages\mlxtend\frequent_patterns\apriori.py", line 219, in apriori
_bools = X[:, combin[:, 0]] == all_ones

File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse_index.py", line 53, in getitem
return self._get_sliceXarray(row, col)

File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\csc.py", line 222, in _get_sliceXarray
return self._major_index_fancy(col)._minor_slice(row)

File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\compressed.py", line 693, in _major_index_fancy
res_indices = np.empty(nnz, dtype=idx_dtype)

ValueError: negative dimensions are not allowed

In my apriori.py run, the variable "combin" is a (4130975, 6) array of indices (dtype=int32).

In scipy's compressed.py, numpy's cumsum takes its dtype from the indices of "combin".

Negative values appear once the cumulative sum exceeds the int32 maximum.

Not sure whether this is an issue with numpy's cumsum or with mlxtend's apriori.
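A minimal sketch of the overflow (a hypothetical reproduction, not the exact apriori code; note that on platforms where the default integer is 32-bit, such as Windows, cumsum of an int32 array also accumulates in int32 by default):

import numpy as np

# Each summand is 2**30; the running total passes the int32 maximum
# (2**31 - 1) on the second step and wraps around to a negative value.
a = np.full(3, 2**30, dtype=np.int32)
print(np.cumsum(a, dtype=np.int32))
# [ 1073741824 -2147483648 -1073741824]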

@fixablecar (Author)

Below are the package versions, on Python 3.7.4:

mlxtend.__version__
'0.17.0'

numpy.__version__
'1.16.5'

rasbt (Owner) commented Oct 30, 2019

Hm, not sure what's going on here. It could also be related to the compression and a scipy bug. But doesn't cumsum always return an int64 array to make sure these issues don't happen? E.g.,

In [9]: a = np.array(range(1000000), dtype=np.int32)                            

In [10]: a.dtype                                                                
Out[10]: dtype('int32')

In [11]: np.cumsum(a).dtype                                                     
Out[11]: dtype('int64')

So, I am wondering how this happens with cumsum ...

Could you run the example above with your NumPy version? If cumsum returns int32, you could check whether updating to NumPy v1.17.2 helps.

@fixablecar (Author)

a.dtype
Out[5]: dtype('int32')

np.cumsum(a).dtype
Out[6]: dtype('int32')

Yes, that should be the problem. I will update numpy.

rasbt reopened this Oct 31, 2019

rasbt (Owner) commented Oct 31, 2019

Thanks for confirming. In this case, we should probably add a warning to the apriori package. I am reopening this issue to address this at some point. I.e., we could simply add something like:

import warnings
import numpy
from distutils.version import LooseVersion as Version


if Version(numpy.__version__) < Version("1.17"):
    warnings.warn('SOME TEXT to explain the issue')

@jakeplace

Just wanted to chime in and say that I am also experiencing this issue (at line 302 instead of line 224), but I confirmed that I have the latest numpy:

 ~/miniconda3/lib/python3.7/site-packages/mlxtend/frequent_patterns/apriori.py in apriori(df, min_support, use_colnames, max_len, verbose, low_memory)
     300 
     301             if is_sparse:
 --> 302                 _bools = X[:, combin[:, 0]] == all_ones
     303                 for n in range(1, combin.shape[1]):
     304                     _bools = _bools & (X[:, combin[:, n]] == all_ones)
 
 ~/miniconda3/lib/python3.7/site-packages/scipy/sparse/_index.py in __getitem__(self, key)
      51                 return self._get_sliceXslice(row, col)
      52             elif col.ndim == 1:
 ---> 53                 return self._get_sliceXarray(row, col)
      54             raise IndexError('index results in >2 dimensions')
      55         elif row.ndim == 1:
 
 ~/miniconda3/lib/python3.7/site-packages/scipy/sparse/csc.py in _get_sliceXarray(self, row, col)
     220 
     221     def _get_sliceXarray(self, row, col):
 --> 222         return self._major_index_fancy(col)._minor_slice(row)
     223 
     224     def _get_arrayXint(self, row, col):
 
 ~/miniconda3/lib/python3.7/site-packages/scipy/sparse/compressed.py in _major_index_fancy(self, idx)
     691 
     692         nnz = res_indptr[-1]
 --> 693         res_indices = np.empty(nnz, dtype=idx_dtype)
     694         res_data = np.empty(nnz, dtype=self.dtype)
     695         csr_row_index(M, indices, self.indptr, self.indices, self.data,
 
 ValueError: negative dimensions are not allowed

I am using the pandas sparse dtype instead of SparseDataFrame; when using SparseDataFrame with apriori, it kills my Jupyter kernel.
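For reference, a minimal sketch of that setup (hypothetical data and column names; only the astype call is the relevant part):

import pandas as pd

# Convert a dense boolean one-hot DataFrame to the pandas sparse dtype
# (fill value False) instead of using the deprecated SparseDataFrame.
dense = pd.DataFrame([[True, False], [False, True]],
                     columns=['item_a', 'item_b'])
df = dense.astype(pd.SparseDtype(bool, False))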

DataFrame (bools):

DF density: 0.1837714070794341
DF shape: (60603, 1694)

Versions:

Pandas: 0.25.3
Numpy: 1.18.1
mlxtend: 0.17.1

dbarbier (Contributor) commented Feb 5, 2020

I do not have a clear understanding of this issue, but it looks like some indices are too large; you may have to call apriori with low_memory=True in your case. Anyway, this should be fixed by #646.
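A sketch of the suggested workaround, reusing the call from the original report (df is the one-hot encoded DataFrame):

from mlxtend.frequent_patterns import apriori

# low_memory=True processes candidate itemsets iteratively instead of
# materializing one large index array, trading speed for memory.
frequent_itemsets = apriori(df, min_support=0.8, use_colnames=True,
                            verbose=1, low_memory=True)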

rasbt removed the stat479 label Jul 30, 2020