"negative dimensions are not allowed" when integrating large dataset #147

Open
aterceros opened this issue Jun 1, 2023 · 1 comment

Comments

@aterceros

Hello,

I've been trying to integrate 8 datasets (quite large) using the raw values, and also the normalized values, but I keep getting the following error: ValueError: negative dimensions are not allowed. When I subset to 1000 cells per sample, Scanorama runs without issues, and likewise when I run it on only the highly variable genes. Any hints on why this happens would be greatly appreciated.
Thank you.

@LEMTideman

LEMTideman commented Mar 20, 2024

I had the same problem: the Scanorama transform function raises "ValueError: negative dimensions are not allowed." The problem is related to the scipy compressed sparse row (CSR) matrix format. It originates from the self._major_index_fancy(row) line here. The _major_index_fancy function is defined here. We have nnz = res_indptr[-1], followed by res_indices = np.empty(nnz, dtype=idx_dtype), which is what throws the ValueError.

I believe that it may actually be an integer overflow, as discussed on Stack Overflow (in 2013): "That overflow causes the variable nnz to become negative. Then the code at the last arrow creates an empty array of size nnz, resulting in a ValueError due to a negative dimension." Scipy has evolved a lot since 2013, and it now supports 64-bit indexing, so I am not sure why this is still a problem.
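To make the suspected mechanism concrete, here is a minimal NumPy-only sketch of how accumulating per-row nonzero counts in 32-bit integers can wrap to a negative total. It does not call scipy or Scanorama; `total_nnz` and the row counts are hypothetical and only mimic the `nnz = res_indptr[-1]` followed by `np.empty(nnz, ...)` pattern quoted above.

```python
import numpy as np

def total_nnz(row_nnz, idx_dtype):
    """Accumulate per-row nonzero counts in idx_dtype, the way an indptr array would."""
    counts = np.asarray(row_nnz, dtype=idx_dtype)
    indptr = np.zeros(len(counts) + 1, dtype=idx_dtype)
    # Forcing the accumulator dtype makes int32 wrap around silently.
    indptr[1:] = np.cumsum(counts, dtype=idx_dtype)
    return indptr[-1]

# Hypothetical counts: each fits in int32, but their sum (3e9) does not.
row_nnz = [1_500_000_000, 1_500_000_000]

nnz32 = total_nnz(row_nnz, np.int32)  # wraps to a negative number
nnz64 = total_nnz(row_nnz, np.int64)  # stays positive

try:
    np.empty(nnz32, dtype=np.int32)   # same call shape as in the traceback
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(nnz32, nnz64, error_message)
```

If this is indeed what happens, it would also explain why lowering knn or subsetting cells avoids the error: both shrink the total number of nonzeros that has to be counted.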

I worked around the ValueError by reducing the knn parameter (default: knn = 20). I am using a batch size of 1000 and knn = 10. Given the large size of my dataset, I would like to use Scanorama with a larger number of nearest neighbors (knn=100). I would appreciate any tips on how to solve this problem properly. By properly, I mean by modifying the code rather than playing around with combinations of hyperparameters. Thanks :)
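One direction that might be worth trying, offered as an untested assumption rather than a confirmed fix: check in 64-bit arithmetic whether a row selection would exceed the int32 range, and if so promote the CSR index arrays to int64 before indexing. This assumes scipy derives the index dtype of the result from the matrix's existing indptr/indices arrays. `rows_nnz_total` and `with_int64_indices` are hypothetical helper names, not part of scipy or Scanorama, and the matrix below is a small toy example.

```python
import numpy as np
import scipy.sparse as sp

INT32_MAX = np.iinfo(np.int32).max

def rows_nnz_total(matrix, rows):
    """Count the nonzeros a row selection would produce, using 64-bit arithmetic."""
    indptr = matrix.indptr.astype(np.int64)
    rows = np.asarray(rows, dtype=np.int64)
    return int((indptr[rows + 1] - indptr[rows]).sum())

def with_int64_indices(matrix):
    """Return a CSR copy whose index arrays are int64 (hypothetical helper)."""
    out = sp.csr_matrix(matrix, copy=True)
    out.indptr = out.indptr.astype(np.int64)
    out.indices = out.indices.astype(np.int64)
    return out

# Toy data standing in for a large expression matrix.
matrix = sp.random(50, 20, density=0.2, format="csr", random_state=0)
rows = np.array([3, 3, 10, 49])  # repeated rows are counted each time

needs_int64 = rows_nnz_total(matrix, rows) > INT32_MAX
promoted = with_int64_indices(matrix)
subset = promoted[rows]
```

On the toy matrix `needs_int64` is False and the promoted copy returns the same rows as the original. Whether promoting the indices is enough inside Scanorama's transform, and where the conversion would have to happen, would need to be checked against the actual code path.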
