Error in Scanpy Doublet Analysis with Scrublet on .h5ad Data Using backed="r+" #3370

simang5c · 2024-11-15T10:53:47Z

Please make sure these conditions are met

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of scanpy.
(optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

I'm encountering an issue while running Scrublet for doublet analysis on an .h5ad file loaded with backed="r+" in Scanpy. The call raises a ValueError (see the error output below). Judging by the traceback, sc.pp.scrublet calls adata.copy() internally, and copying an AnnData object in backed mode is not allowed without a filename.

Has anyone faced this issue before? If so, do you know of any workarounds or alternative approaches to run Scrublet on such data without having to fully load it into memory? Any suggestions would be greatly appreciated!

Minimal code sample

import scanpy as sc

# Path to the file
output_file = '/home/test_folder/project1_matrix.h5ad'

# Read the .h5ad file, which contains around 1 million cells.
# backed="r+" keeps the data matrix (adata.X) on disk instead of loading it into memory.
adata = sc.read_h5ad(output_file, backed="r+")

sc.pp.scrublet(adata)

Error output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/test_env/lib/python3.12/site-packages/legacy_api_wrap/__init__.py", line 80, in fn_compatible
    return fn(*args_all, **kw)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/test_env/lib/python3.12/site-packages/scanpy/preprocessing/_scrublet/__init__.py", line 180, in scrublet
    adata = adata.copy()
            ^^^^^^^^^^^^
  File "/home/test_env/python3.12/site-packages/anndata/_core/anndata.py", line 1447, in copy
    raise ValueError(
ValueError: To copy an AnnData object in backed mode, pass a filename: `.copy(filename='myfilename.h5ad')`. To load the object into memory, use `.to_memory()`.

Versions

sc.logging.print_versions() 
-----
anndata     0.11.1
scanpy      1.10.4
-----
PIL                 11.0.0
absl                NA
attr                24.2.0
cffi                1.17.1
chex                0.1.87
cycler              0.12.1
cython_runtime      NA
dateutil            2.9.0.post0
distutils           3.12.6
docrep              0.3.2
doubletdetection    4.2
etils               1.10.0
filelock            3.16.1
flax                0.10.1
fsspec              2024.10.0
h5py                3.12.1
igraph              0.11.8
jaraco              NA
jax                 0.4.35
jaxlib              0.4.35
joblib              1.4.2
kiwisolver          1.4.7
lazy_loader         0.4
legacy_api_wrap     NA
leidenalg           0.10.2
lightning           2.4.0
lightning_utilities 0.11.8
llvmlite            0.43.0
louvain             0.8.2
matplotlib          3.9.2
ml_collections      1.0.0
ml_dtypes           0.5.0
more_itertools      10.5.0
mpl_toolkits        NA
mpmath              1.3.0
msgpack             1.1.0
mudata              0.3.1
multipledispatch    0.6.0
natsort             8.4.0
numba               0.60.0
numexpr             2.10.1
numpy               1.26.4
numpyro             0.15.3
nvidia              NA
opt_einsum          3.4.0
optax               0.2.4
packaging           24.2
pandas              2.2.3
phenograph          1.5.7
pkg_resources       NA
platformdirs        4.3.6
psutil              6.1.0
pycparser           2.22
pygments            2.18.0
pynndescent         0.5.13
pyparsing           3.2.0
pyro                1.9.1
pytz                2024.2
rich                NA
scipy               1.14.1
scvi                1.2.0
session_info        1.0.0
setuptools          74.1.2
six                 1.16.0
skimage             0.24.0
sklearn             1.5.2
sparse              0.15.4
sympy               1.13.1
tables              3.10.1
texttable           1.7.0
threadpoolctl       3.5.0
toolz               1.0.0
torch               2.5.1+cu124
torchgen            NA
torchmetrics        1.6.0
tqdm                4.67.0
triton              3.1.0
typing_extensions   NA
wcwidth             0.2.13
xarray              2024.10.0
yaml                6.0.2
zstandard           0.23.0
-----
Python 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:16:49) [GCC 13.3.0]
Linux-6.8.0-48-generic-x86_64-with-glibc2.39
-----
Session information updated at 2024-11-15 11:48


ilan-gold · 2024-11-26T14:09:27Z

Hello! We do not support backed mode for scrublet! However, if you wish to contribute this, we would be more than happy. Alternatively, a probably more sustainable solution would be to add dask support (which you can begin to use via https://anndata.readthedocs.io/en/stable/generated/anndata.experimental.read_elem_as_dask.html). I'm going to close this because we already have an issue for it: #2578.

simang5c added the Bug 🐛 and Triage 🩺 (This issue needs to be triaged by a maintainer) labels on Nov 15, 2024

ilan-gold closed this as completed Nov 26, 2024

ilan-gold removed the Triage 🩺 (This issue needs to be triaged by a maintainer) label on Nov 26, 2024
