Error in Scanpy Doublet Analysis with Scrublet on .h5ad Data Using backed="r+" #3370

Closed
2 of 3 tasks
simang5c opened this issue Nov 15, 2024 · 1 comment

@simang5c

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

I'm encountering an issue while running Scrublet for doublet analysis on an .h5ad file loaded with backed="r+" in Scanpy. The operation throws an error, likely due to the limitations of Scrublet working with backed mode, which restricts in-memory data manipulation.

Has anyone faced this issue before? If so, do you know of any workarounds or alternative approaches to run Scrublet on such data without having to fully load it into memory? Any suggestions would be greatly appreciated!

Minimal code sample

import scanpy as sc

# Path to the file
output_file = '/home/test_folder/project1_matrix.h5ad'

# Read the .h5ad file, which contains around 1 million cells.
# backed="r+" does not allow the adata.obs data to be modified.
adata = sc.read_h5ad(output_file, backed="r+")

sc.pp.scrublet(adata)

Error output

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/test_env/lib/python3.12/site-packages/legacy_api_wrap/__init__.py", line 80, in fn_compatible
    return fn(*args_all, **kw)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/test_env/lib/python3.12/site-packages/scanpy/preprocessing/_scrublet/__init__.py", line 180, in scrublet
    adata = adata.copy()
            ^^^^^^^^^^^^
  File "/home/test_env/python3.12/site-packages/anndata/_core/anndata.py", line 1447, in copy
    raise ValueError(
ValueError: To copy an AnnData object in backed mode, pass a filename: `.copy(filename='myfilename.h5ad')`. To load the object into memory, use `.to_memory()`.

Versions

sc.logging.print_versions() 
-----
anndata     0.11.1
scanpy      1.10.4
-----
PIL                 11.0.0
absl                NA
attr                24.2.0
cffi                1.17.1
chex                0.1.87
cycler              0.12.1
cython_runtime      NA
dateutil            2.9.0.post0
distutils           3.12.6
docrep              0.3.2
doubletdetection    4.2
etils               1.10.0
filelock            3.16.1
flax                0.10.1
fsspec              2024.10.0
h5py                3.12.1
igraph              0.11.8
jaraco              NA
jax                 0.4.35
jaxlib              0.4.35
joblib              1.4.2
kiwisolver          1.4.7
lazy_loader         0.4
legacy_api_wrap     NA
leidenalg           0.10.2
lightning           2.4.0
lightning_utilities 0.11.8
llvmlite            0.43.0
louvain             0.8.2
matplotlib          3.9.2
ml_collections      1.0.0
ml_dtypes           0.5.0
more_itertools      10.5.0
mpl_toolkits        NA
mpmath              1.3.0
msgpack             1.1.0
mudata              0.3.1
multipledispatch    0.6.0
natsort             8.4.0
numba               0.60.0
numexpr             2.10.1
numpy               1.26.4
numpyro             0.15.3
nvidia              NA
opt_einsum          3.4.0
optax               0.2.4
packaging           24.2
pandas              2.2.3
phenograph          1.5.7
pkg_resources       NA
platformdirs        4.3.6
psutil              6.1.0
pycparser           2.22
pygments            2.18.0
pynndescent         0.5.13
pyparsing           3.2.0
pyro                1.9.1
pytz                2024.2
rich                NA
scipy               1.14.1
scvi                1.2.0
session_info        1.0.0
setuptools          74.1.2
six                 1.16.0
skimage             0.24.0
sklearn             1.5.2
sparse              0.15.4
sympy               1.13.1
tables              3.10.1
texttable           1.7.0
threadpoolctl       3.5.0
toolz               1.0.0
torch               2.5.1+cu124
torchgen            NA
torchmetrics        1.6.0
tqdm                4.67.0
triton              3.1.0
typing_extensions   NA
wcwidth             0.2.13
xarray              2024.10.0
yaml                6.0.2
zstandard           0.23.0
-----
Python 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:16:49) [GCC 13.3.0]
Linux-6.8.0-48-generic-x86_64-with-glibc2.39
-----
Session information updated at 2024-11-15 11:48
@simang5c simang5c added Bug 🐛 Triage 🩺 This issue needs to be triaged by a maintainer labels Nov 15, 2024
@ilan-gold
Contributor

Hello! We do not support backed mode for scrublet. However, if you wish to contribute this, we would be more than happy to have it. Alternatively, and probably more sustainably, dask support could be added (you can begin to use dask via https://anndata.readthedocs.io/en/stable/generated/anndata.experimental.read_elem_as_dask.html). I'm going to close this because we already have an issue for it: #2578.

@ilan-gold ilan-gold removed the Triage 🩺 This issue needs to be triaged by a maintainer label Nov 26, 2024