AnnData Backed Mode Support #131

Open
kennypavan opened this issue Aug 27, 2024 · 2 comments

Comments

@kennypavan

Hello,

I'm attempting to train a large model from an AnnData object; however, memory issues persist when opening the file on our HPC with 512 GB of RAM. Naturally, I've attempted to open a stream using the AnnData "backed" parameter and received the error:

> train.py:Line 341 
> flag = indata.sum(axis = 0) == 0
> AttributeError: 'Dataset' object has no attribute 'sum'

This error seems reasonable, as many of the aggregating functions wouldn't have access to the entire AnnData object. Increasing memory beyond 512 GB for this task is not feasible given our resource limits. Before attempting to mitigate this by extending the train function to support backed mode, I'm wondering if there's a solution for processing large-scale, atlas-level datasets with >4 million cells?
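
For illustration, the check that fails at line 341 (`indata.sum(axis = 0) == 0`) could in principle be computed block by block, since a backed matrix can still be sliced by rows even though it has no `sum` method. The following is only a minimal sketch under that assumption, not CellTypist code: `zero_sum_columns` and `chunk_size` are hypothetical names, and a small NumPy array stands in for the on-disk matrix.

```python
import numpy as np


def zero_sum_columns(X, chunk_size=100_000):
    """Return a boolean mask of columns whose sum over all rows is 0.

    X only needs a .shape and row slicing (X[start:stop]) that yields a
    NumPy array or a SciPy sparse matrix. An h5py Dataset satisfies this.
    """
    n_rows, n_cols = X.shape
    totals = np.zeros(n_cols, dtype=np.float64)
    for start in range(0, n_rows, chunk_size):
        # Only this block of rows is held in memory at a time.
        chunk = X[start:start + chunk_size]
        # np.asarray(...).ravel() also flattens the np.matrix that
        # SciPy sparse matrices return from .sum(axis=0).
        totals += np.asarray(chunk.sum(axis=0), dtype=np.float64).ravel()
    return totals == 0


# Small in-memory stand-in for a backed matrix (3 cells x 3 genes).
demo = np.array([[0, 1, 0],
                 [0, 2, 0],
                 [0, 0, 3]])
flag = zero_sum_columns(demo, chunk_size=2)
```

This only covers the one aggregation named in the traceback; the rest of the training function may make other assumptions about in-memory data.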

Thank you,

@ChuanXu1
Collaborator

ChuanXu1 commented Sep 4, 2024

@kennypavan, CellTypist does not support backed mode for the time being. You could, for example, load your raw count data, normalize and log1p-transform it, subset it to HVGs, write it out as a new AnnData file, and load that file for training. Note that you need to use `check_expression = False` and `feature_selection = False` for this data during training. In addition, you can subset cells.
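
The steps above could be sketched roughly as follows, assuming scanpy is used for the preprocessing (the comment does not name a tool). The function names, the file paths, the `n_top_genes=3000` default, and the `"cell_type"` label column are all hypothetical; `target_sum=1e4` reflects the log1p-normalized, 10,000-counts-per-cell input that CellTypist expects by default.

```python
def prepare_hvg_subset(raw_path, out_path, n_top_genes=3000, target_sum=1e4):
    """Normalize, log1p-transform, keep HVGs, and write a smaller .h5ad file."""
    import scanpy as sc  # local import: only needed when the function runs

    adata = sc.read_h5ad(raw_path)  # assumes raw counts are stored in adata.X
    sc.pp.normalize_total(adata, target_sum=target_sum)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes)
    # Keep only the genes flagged as highly variable.
    adata = adata[:, adata.var["highly_variable"]].copy()
    adata.write_h5ad(out_path)
    return out_path


def train_on_subset(subset_path, label_key):
    """Train a CellTypist model on the already-preprocessed HVG subset."""
    import scanpy as sc
    import celltypist

    adata = sc.read_h5ad(subset_path)
    # After subsetting to HVGs, per-cell totals no longer match the expected
    # normalization, so the expression check is turned off; genes are already
    # selected, so CellTypist's own feature selection is turned off too.
    return celltypist.train(
        adata,
        labels=label_key,  # name of a column in adata.obs
        check_expression=False,
        feature_selection=False,
    )


# Example usage (hypothetical paths and label column):
# prepare_hvg_subset("atlas_raw.h5ad", "atlas_hvg.h5ad")
# model = train_on_subset("atlas_hvg.h5ad", label_key="cell_type")
```

Note that the first step still reads the full raw matrix once, so it may need to run on a subset of cells if even that does not fit in memory.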

@kennypavan
Author

@ChuanXu1 Thank you for the suggestions. I'll explore whether preprocessing and removing non-HVGs will work for our use case. Much appreciated!
