I'm attempting to train a large model from an AnnData object; however, memory issues persist when opening the file on our HPC with 512 GB of RAM. Naturally, I've attempted to open a stream using the AnnData "backed" parameter and received the error:
> train.py:Line 341
> flag = indata.sum(axis = 0) == 0
> AttributeError: 'Dataset' object has no attribute 'sum'
This error seems reasonable, as many of the aggregating functions wouldn't have access to the entire AnnData object. Increasing memory beyond 512 GB is not feasible for this task, so this is a critical resource limitation. Before attempting to mitigate this by extending the train function to support backed mode, I'm wondering if there's a solution for processing large-scale, atlas-level datasets with >4 million cells?
Thank you,
@kennypavan, CellTypist does not support backed mode for the time being. You could, for example, load your raw count data, normalize and log1p-transform it, subset it to highly variable genes (HVGs), write it out as a new AnnData file, and load that file for training. Note that you need to set check_expression = False and feature_selection = False for this data during training. In addition, you can also subset cells.
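For anyone landing here later, the workflow suggested above could be sketched roughly as follows. This is only an illustration, not an official CellTypist recipe: the function names, the `cell_type` label column, the file paths, and the `n_top_genes` / `max_per_type` values are all assumptions made for the example, and `subsample_indices` is a hypothetical helper for the "subset cells" step. Note that the first load still requires the raw matrix to fit in memory once.

```python
import random
from collections import defaultdict


def subsample_indices(labels, max_per_type, seed=0):
    """Hypothetical helper: keep at most `max_per_type` cell indices per label."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for i, label in enumerate(labels):
        by_type[label].append(i)
    keep = []
    for idx in by_type.values():
        # Small cell types are kept whole; large ones are randomly down-sampled.
        keep.extend(idx if len(idx) <= max_per_type else rng.sample(idx, max_per_type))
    return sorted(keep)


def prepare_training_file(raw_path, out_path, label_key="cell_type",
                          n_top_genes=3000, max_per_type=5000):
    """Subset cells, normalize + log1p, subset to HVGs, write a smaller file."""
    import scanpy as sc  # imported lazily so the helper above works without it

    adata = sc.read_h5ad(raw_path)  # raw counts, loaded fully (not backed)
    keep = subsample_indices(adata.obs[label_key].tolist(), max_per_type)
    adata = adata[keep].copy()

    sc.pp.normalize_total(adata, target_sum=1e4)  # 10,000 counts per cell
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes)
    adata = adata[:, adata.var["highly_variable"]].copy()
    adata.write_h5ad(out_path)


def train_on_prepared_file(path, label_key="cell_type"):
    """Train on the reduced file with the two flags mentioned in the reply."""
    import scanpy as sc
    import celltypist

    adata = sc.read_h5ad(path)
    return celltypist.train(
        adata,
        labels=label_key,
        check_expression=False,   # data is already HVG-subset, skip the check
        feature_selection=False,  # genes were already selected above
    )
```

Usage would look something like `prepare_training_file("atlas_raw.h5ad", "atlas_hvg.h5ad")` followed by `model = train_on_prepared_file("atlas_hvg.h5ad")`. Down-sampling cells before selecting HVGs is a choice made here to cut memory early; selecting HVGs on the full data first is equally plausible if it fits.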