Error that occurs when the number of cells is large #246

Closed
kokitsuyuzaki opened this issue Sep 21, 2023 · 7 comments

@kokitsuyuzaki

Hi,

I tried to use kana's explore mode with exactly the same data that was used in a previous issue (#200 (comment)), but it no longer works with the latest version of kana.

The error message looks the same as the one in the previous issue (#200 (comment)).

[Three screenshots attached, taken 2023-09-21 at 17:51:59, 17:52:37, and 17:53:03]

My data matrix is 20359 genes × 82552 cells. When I reduce the number of cells to 40000, kana works, so I'm fairly sure this error is related to data size.
Has there been any specification change since previous versions of kana, such as a limit on the data size?
If possible, I would like to be able to use data of this size in kana, as I could before.

@LTLA
Collaborator

LTLA commented Sep 21, 2023

Hmm. The only relevant changes for memory usage are from the yet-to-be-merged #239, and I don't recall any other major changes to the internals between now and your comments in #200.

Is this literally the same RDS file as in #200?

Are you in explore mode?

@LTLA
Collaborator

LTLA commented Sep 21, 2023

Another thought: it would help if you could give us an "anonymized" dataset that has the same properties as your actual data and reproduces the error. For example, replace the row names with "Gene_XYZ", strip out other identifying information in the row/column data, remove metadata, scramble the rows, etc. This should make it a bit easier for us to debug without compromising privacy.
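A minimal sketch of such an anonymization step in R, assuming the data is a SingleCellExperiment. `anonymize_sce` is a hypothetical helper, not part of kana or SingleCellExperiment, and it leaves alternative experiments and reduced dimensions untouched, so those would need the same treatment before sharing.

```r
library(SingleCellExperiment)

# Hypothetical helper (an assumption for illustration, not a kana function).
anonymize_sce <- function(sce, seed = 1) {
    set.seed(seed)

    # Scramble the row order so genes cannot be identified by position.
    sce <- sce[sample(nrow(sce)), ]

    # Replace gene and cell names with generic labels.
    rownames(sce) <- paste0("Gene_", seq_len(nrow(sce)))
    colnames(sce) <- paste0("Cell_", seq_len(ncol(sce)))

    # Drop all per-gene and per-cell annotation columns and the metadata.
    rowData(sce) <- rowData(sce)[, 0, drop = FALSE]
    colData(sce) <- colData(sce)[, 0, drop = FALSE]
    metadata(sce) <- list()

    sce
}

# Toy input standing in for the real dataset.
toy <- SingleCellExperiment(
    assays = list(counts = matrix(1:20, nrow = 5,
        dimnames = list(paste0("ENSG", 1:5), paste0("barcode", 1:4)))),
    colData = DataFrame(donor = c("a", "b", "c", "d"))
)
anon <- anonymize_sce(toy)
```

The result could then be written out with `saveRDS(anon, "anonymized.rds")` (the file name is only an example) and checked in kana before sharing.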

@kokitsuyuzaki
Author

Thanks, @LTLA.
I cannot provide our data here, so I'll send it directly to the e-mail address associated with your GitHub account.

@LTLA
Collaborator

LTLA commented Sep 22, 2023

Thanks @kokitsuyuzaki. After some investigation, it seems that the recent failure is caused by kana being more aggressive at finding data for other modalities (e.g., ADTs, CRISPR). In this case, we assume that the alternative experiments of the SCE contain data for other modalities, and try to load the corresponding assay.

However, the only alternative experiment here is the integrated experiment, which does not seem to contain any relevant multi-modal data. In fact, not only does it lack multi-modal data, but its assay data is also not actually sparse and uses more memory (1.7 GB) than the main experiment's assay (~1 GB), despite containing 10-fold fewer features.
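For anyone wanting to check their own object, the comparison above can be sketched in R as follows. `summarize_experiments` is a hypothetical helper, the toy "integrated" experiment only stands in for the real data, and the sketch assumes every experiment has at least one assay.

```r
library(SingleCellExperiment)
library(Matrix)

# Hypothetical helper: report size and sparsity of the first assay of the
# main experiment and of each alternative experiment.
summarize_experiments <- function(sce) {
    experiments <- c(list(main = sce), as.list(altExps(sce)))
    rows <- lapply(names(experiments), function(n) {
        mat <- assay(experiments[[n]], 1)
        data.frame(
            experiment = n,
            features = nrow(mat),
            sparse = is(mat, "sparseMatrix"),
            size_mb = as.numeric(object.size(mat)) / 1024^2
        )
    })
    do.call(rbind, rows)
}

# Toy data: a sparse main assay and a dense alternative experiment.
toy <- SingleCellExperiment(
    assays = list(counts = rsparsematrix(100, 10, density = 0.05))
)
altExp(toy, "integrated") <- SummarizedExperiment(
    assays = list(counts = matrix(rnorm(100), 10, 10))
)
report <- summarize_experiments(toy)
print(report)
```

A dense alternative experiment with a large `size_mb` relative to the main experiment is the pattern described above.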

I suspect that previous versions of kana were less aggressive about using the alternative experiments, and thus this issue was never encountered. The new behavior is generally useful for people with ADT/CRISPR data in their SCE alternative experiments, which is a fairly common use case, so that's why we made the change.

In theory, the solution would be for you to just deselect the integrated assay before clicking "explore". Unfortunately, the UI change doesn't actually seem to have any effect at the moment, so @jkanche or I will have a closer look tomorrow. We will also think about being more careful about the default selection of "ADT"-ish or "CRISPR"-ish names.

In the meantime, I think you could try removing the alternative experiment from the SCE and try again, just to check.

@kokitsuyuzaki
Author

Thanks for your advice.
kana is working now!

integrated is a slot that is automatically created in the Seurat object when the following integration workflow is run in Seurat:
https://satijalab.org/seurat/articles/integration_introduction.html
This feature is intended to integrate multiple batches of scRNA-seq data, rather than multi-modal data.

I have been converting the Seurat object above to a SingleCellExperiment and using it for kana as follows:

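library("Seurat")
library("SingleCellExperiment")

# Loads the Seurat object `seurat.obj` saved in the RData file
load("seurat.obj.RData")
sce <- as.SingleCellExperiment(seurat.obj)
# Drop the raw counts from the main experiment
counts(sce) <- NULL
# Remove an alternative experiment
altExp(sce) <- NULL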

This time I ran the last line, altExp(sce) <- NULL, twice, and kana now works.
I confirmed that the first altExp(sce) <- NULL removed the counts assay, and the second altExp(sce) <- NULL removed the logcounts assay.
Even after the logcounts in altExp was removed, logcounts(sce) still worked.
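A small sketch of the removal behavior on toy data, for reference. As far as I understand the SingleCellExperiment accessors, `altExp(sce) <- NULL` without a name targets only the first alternative experiment, so each call removes one, while `removeAltExps()` drops all of them at once. The names "RNA" and "integrated" and the helper `make_se` are assumptions for illustration, not taken from the actual object.

```r
library(SingleCellExperiment)

# Hypothetical toy experiment with `n` features and 4 cells.
make_se <- function(n) {
    SummarizedExperiment(assays = list(counts = matrix(0, n, 4)))
}

sce <- SingleCellExperiment(assays = list(counts = matrix(0, 6, 4)))
altExp(sce, "RNA") <- make_se(3)
altExp(sce, "integrated") <- make_se(2)
before <- altExpNames(sce)

# One call removes only the first alternative experiment.
altExp(sce) <- NULL
after_one <- altExpNames(sce)

# Remove whatever alternative experiments are left in a single step.
sce <- removeAltExps(sce)
after_all <- altExpNames(sce)

# The main experiment's assay is untouched throughout.
main_assays <- assayNames(sce)
```

Checking `altExpNames(sce)` before loading the object into kana shows whether any alternative experiments remain.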

@kokitsuyuzaki
Author

kokitsuyuzaki commented Sep 22, 2023

In my case, the above approach is enough.
I don't know whether kana should address all the various ways of analyzing data in Seurat.

I know that Seurat is implemented independently of Bioconductor, as you mentioned before (#156 (comment)), and that kana might be affected by sudden specification changes in Seurat.

@jkanche
Collaborator

jkanche commented Jan 30, 2024

Hi @kokitsuyuzaki, let us know if you continue to notice any errors related to this issue.

@jkanche closed this as completed Jan 30, 2024