"negative dimensions are not allowed" when integrating large dataset #147

aterceros · 2023-06-01T18:48:54Z

Hello,

I've been trying to integrate 8 datasets (quite large) using the raw values, and also the normalized values, but I keep getting the following error: ValueError: negative dimensions are not allowed. When I subset to 1000 cells per sample, Scanorama runs without issues, and it also runs when I use only the highly variable genes. Any hints on why this issue happens would be greatly appreciated.
Thank you.

LEMTideman · 2024-03-20T17:43:00Z

I had the same problem: the Scanorama transform function raises "ValueError: negative dimensions not allowed." The problem is related to the scipy compressed sparse row (CSR) matrix format. It originates from the self._major_index_fancy(row) call in scipy's sparse indexing code. Inside the _major_index_fancy function, we have nnz = res_indptr[-1], followed by res_indices = np.empty(nnz, dtype=idx_dtype), which is what throws the ValueError.

I believe that it may actually be an integer overflow error, as discussed on Stack Overflow (in 2013): "That overflow causes the variable nnz to become negative. Then the code at the last arrow creates an empty array of size nnz, resulting in a ValueError due to a negative dimension." Scipy has evolved a lot since 2013 and now supports 64-bit indexing, so I am not sure why this is still a problem.
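To illustrate the suspected mechanism, here is a minimal sketch, assuming the row counts are accumulated in a 32-bit index dtype. It is not scipy's actual code: selected_nnz is a hypothetical helper and the row counts are made-up numbers, chosen only so that their sum exceeds the int32 maximum of 2**31 - 1.

```python
import numpy as np


def selected_nnz(row_nnz, idx_dtype):
    """Sum per-row nonzero counts using idx_dtype as the accumulator.

    Hypothetical helper for illustration only; it mimics building an
    indptr-style cumulative sum and reading its last entry as nnz.
    """
    counts = np.asarray(row_nnz).astype(idx_dtype)
    # NumPy array arithmetic on fixed-width integers wraps silently on overflow.
    return int(np.cumsum(counts, dtype=idx_dtype)[-1])


# Made-up example: three selected rows with 1e9 stored values each.
row_nnz = [1_000_000_000] * 3

nnz32 = selected_nnz(row_nnz, np.int32)  # total wraps around to a negative number
nnz64 = selected_nnz(row_nnz, np.int64)  # total is represented correctly

print("int32 nnz:", nnz32)
print("int64 nnz:", nnz64)

try:
    # Allocating an array with a negative length reproduces the reported error.
    np.empty(nnz32, dtype=np.int32)
except ValueError as err:
    print("ValueError:", err)
```

If this is what happens inside Scanorama, the total number of stored values in the indexed result exceeds what the index dtype can hold, which would be consistent with the error disappearing for smaller subsets or smaller knn values.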

I worked around the ValueError by reducing the knn parameter (default: knn = 20). I am now using a batch size of 1000 and 10 nearest neighbors. Given the large size of my dataset, I would like to use Scanorama with a larger number of nearest neighbors (knn = 100). I would appreciate any tips on how to solve this problem properly. By properly, I mean by modifying the code rather than playing around with combinations of hyperparameters. Thanks :)
