10x H5 loading Error #174

Closed
llumdi opened this issue Dec 15, 2022 · 9 comments

@llumdi

llumdi commented Dec 15, 2022

I cannot load the cellbender output (v0.2.2) with sc.read_10x_h5 (scanpy==1.9.1). Cellbender was completed successfully.

adata_cellbender = sc.read_10x_h5('./cellbender.h5')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [13], line 1
----> 1 temp = sc.read_10x_h5('/projects/site/pred/cit_tdag/projects/other/CRO_scouting_ONCdiscovery/raw_data/cellbender_output_v0.2.0/Nuvisan_Tumor_Brain/cellbender.h5')

File ~/scratch/conda/envs/besca25_LAS/lib/python3.8/site-packages/scanpy/readwrite.py:183, in read_10x_h5(filename, genome, gex_only, backup_url)
    181     v3 = '/matrix' in f
    182 if v3:
--> 183     adata = _read_v3_10x_h5(filename, start=start)
    184     if genome:
    185         if genome not in adata.var['genome'].values:

File ~/scratch/conda/envs/besca25_LAS/lib/python3.8/site-packages/scanpy/readwrite.py:268, in _read_v3_10x_h5(filename, start)
    266 try:
    267     dsets = {}
--> 268     _collect_datasets(dsets, f["matrix"])
    270     from scipy.sparse import csr_matrix
    272     M, N = dsets['shape']

File ~/scratch/conda/envs/besca25_LAS/lib/python3.8/site-packages/scanpy/readwrite.py:256, in _collect_datasets(dsets, group)
    254 for k, v in group.items():
    255     if isinstance(v, h5py.Dataset):
--> 256         dsets[k] = v[:]
    257     else:
    258         _collect_datasets(dsets, v)

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File ~/scratch/conda/envs/besca25_LAS/lib/python3.8/site-packages/h5py/_hl/dataset.py:800, in Dataset.__getitem__(self, args, new_dtype)
    798 if self.shape == ():
    799     fspace = self.id.get_space()
--> 800     selection = sel2.select_read(fspace, args)
    801     if selection.mshape is None:
    802         arr = numpy.zeros((), dtype=new_dtype)

File ~/scratch/conda/envs/besca25_LAS/lib/python3.8/site-packages/h5py/_hl/selections2.py:101, in select_read(fspace, args)
     96 """ Top-level dispatch function for reading.
     97 
     98 At the moment, only supports reading from scalar datasets.
     99 """
    100 if fspace.shape == ():
--> 101     return ScalarReadSelection(fspace, args)
    103 raise NotImplementedError()

File ~/scratch/conda/envs/besca25_LAS/lib/python3.8/site-packages/h5py/_hl/selections2.py:86, in ScalarReadSelection.__init__(self, fspace, args)
     84     self.mshape = ()
     85 else:
---> 86     raise ValueError("Illegal slicing argument for scalar dataspace")
     88 self.mspace = h5s.create(h5s.SCALAR)
     89 self.fspace = fspace

ValueError: Illegal slicing argument for scalar dataspace

I would really appreciate a fix. Thanks a lot!

@sjfleming
Member

Hi @llumdi ! This is related to #128 and should have been fixed by this recent change in scanpy: scverse/scanpy#2344

I don't know exactly which scanpy versions contain this fix. But I have heard some other people say that the scanpy loader is still giving them this error.

If you want a workaround (and the newest scanpy version still gives this error), then I would recommend loading the cellbender output into anndata format using this custom code:

#128 (comment)
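For context on why that workaround helps: `v[:]` is an illegal slicing argument for 0-d (scalar) HDF5 datasets, while `v[()]` reads both scalars and arrays. Below is a minimal, scalar-safe variant of the recursive collector from the traceback above (a sketch for illustration, not scanpy's actual fixed code):

```python
import h5py


def collect_datasets(dsets: dict, group: h5py.Group):
    """Recursively gather all datasets under an HDF5 group.

    Scalar (0-d) datasets must be read with v[()]; slicing them with
    v[:] raises "ValueError: Illegal slicing argument for scalar
    dataspace", which is the error shown in this issue.
    """
    for k, v in group.items():
        if isinstance(v, h5py.Dataset):
            dsets[k] = v[()] if v.shape == () else v[:]
        else:
            collect_datasets(dsets, v)
```

With this change, files that mix scalar metadata (as CellBender output does) and array datasets load without error.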

@sjfleming
Member

Let me know if you run into a problem trying that out! Happy to discuss more

@llumdi
Author

llumdi commented Jan 17, 2023

Hi @sjfleming,
Thank you so much for the workaround. It worked!

@carmensandoval

I am able to load cellbender output using the function anndata_from_h5 referenced above (thank you!).
However, I can't write these objects to file. Is this expected? Is there a workaround to be able to write?

from anndata_from_h5 import *
org_1 = anndata_from_h5("../cellbender/SAM24425933_cellbender_out_filtered.h5")

Assuming we are loading a "filtered" file that contains only cells.

org_1

AnnData object with n_obs × n_vars = 4341 × 58302
    obs: 'latent_RT_efficiency', 'latent_cell_probability', 'latent_scale'
    var: 'ambient_expression', 'feature_type', 'genome', 'gene_id'
    uns: 'contamination_fraction_params', 'fraction_data_used_for_testing', 'lambda_multiplier', 'overdispersion_mean_and_scale', 'target_false_positive_rate', 'test_elbo', 'test_epoch', 'training_elbo_per_epoch'
    obsm: 'latent_gene_encoding'
sc.write('org_1.h5ad', org_1)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/mambaforge/envs/pegasus/lib/python3.9/site-packages/anndata/_io/utils.py:214, in report_write_key_on_error.<locals>.func_wrapper(elem, key, val, *args, **kwargs)
    213 try:
--> 214     return func(elem, key, val, *args, **kwargs)
    215 except Exception as e:

File ~/mambaforge/envs/pegasus/lib/python3.9/site-packages/anndata/_io/specs/registry.py:175, in write_elem(f, k, elem, modifiers, *args, **kwargs)
    174 else:
--> 175     _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)

File ~/mambaforge/envs/pegasus/lib/python3.9/site-packages/anndata/_io/specs/registry.py:24, in write_spec.<locals>.decorator.<locals>.wrapper(g, k, *args, **kwargs)
     22 @wraps(func)
     23 def wrapper(g, k, *args, **kwargs):
---> 24     result = func(g, k, *args, **kwargs)
     25     g[k].attrs.setdefault("encoding-type", spec.encoding_type)

File ~/mambaforge/envs/pegasus/lib/python3.9/site-packages/anndata/_io/specs/methods.py:307, in write_basic(f, k, elem, dataset_kwargs)
    306 """Write methods which underlying library handles nativley."""
--> 307 f.create_dataset(k, data=elem, **dataset_kwargs)

File ~/mambaforge/envs/pegasus/lib/python3.9/site-packages/h5py/_hl/group.py:161, in Group.create_dataset(self, name, shape, dtype, data, **kwds)
    159         group = self.require_group(parent_path)
--> 161 dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    162 dset = dataset.Dataset(dsid)
...
    224     ) from e

TypeError: Scalar datasets don't support chunk/filter options

Above error raised while writing key 'fraction_data_used_for_testing' of <class 'h5py._hl.group.Group'> to /

@sjfleming
Member

@carmensandoval , I definitely want writing to work! I am betting that the issue lies in

uns: 'fraction_data_used_for_testing', 'lambda_multiplier', 'target_false_positive_rate'

since I think these values are scalars, and scalars are exactly what we are having trouble reading and writing.

So one option would be to just delete the fields and forget about them:

for key in ['fraction_data_used_for_testing', 'lambda_multiplier', 'target_false_positive_rate']:
    del org_1.uns[key]

and then try

org_1.write('org_1.h5ad')

Or, if you want to keep those fields, you could try

import numpy as np

for key in ['fraction_data_used_for_testing', 'lambda_multiplier', 'target_false_positive_rate']:
    org_1.uns[key] = np.array(org_1.uns[key])  # wrap it in an array

and then try

org_1.write('org_1.h5ad')
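The underlying failure mode is easy to reproduce without anndata at all: HDF5 filters require chunked storage, and scalar (0-d) datasets cannot be chunked, so h5py raises the TypeError seen above when compression options are passed. A standalone reproduction (file path created here is temporary and illustrative):

```python
import tempfile

import h5py
import numpy as np

path = tempfile.mktemp(suffix=".h5")
with h5py.File(path, "w") as f:
    try:
        # Scalar dataset + filter options -> TypeError:
        # "Scalar datasets don't support chunk/filter options"
        f.create_dataset("scalar", data=0.25, compression="gzip")
    except TypeError as e:
        print(e)
    # A length-1 1-d array avoids the scalar code path entirely:
    f.create_dataset("arr", data=np.array([0.25]), compression="gzip")
```

Note that `np.array(value)` on a Python scalar still yields a 0-d array; if the wrapping above does not help, storing a length-1 array (`np.array([value])`) is another option, at the cost of changing the stored shape.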

@vitkl

vitkl commented Feb 24, 2023

I have the same issue with scanpy 1.9.1, and scanpy 1.9.2 also fails, with the error below:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /envs/lightning19_scvi10_cuda113_torch113/lib/python3.9/site-packages/scanpy/readwrite.py:288, in _read_v3_10x_h5(filename, start)
    277 matrix = csr_matrix(
    278     (data, dsets['indices'], dsets['indptr']),
    279     shape=(N, M),
    280 )
    281 adata = AnnData(
    282     matrix,
    283     obs=dict(obs_names=dsets['barcodes'].astype(str)),
    284     var=dict(
    285         var_names=dsets['name'].astype(str),
    286         gene_ids=dsets['id'].astype(str),
    287         feature_types=dsets['feature_type'].astype(str),
--> 288         genome=dsets['genome'].astype(str),
    289     ),
    290 )
    291 logg.info('', time=start)

KeyError: 'genome'

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 adata_cb, ad_cb_list = read_cellbender_output(
      2     sc_data_folder, 
      3     smp_list=sample_list['folder'],
      4     metadata=sample_list, type='filtered'
      5 )
      6 adata_cb.obs['total_counts'] = np.array(adata_cb.X.sum(1)).flatten()
      7 # adata = adata[adata.obs['total_counts'] > 1000, :].copy()
      8 
      9 # export aggregated and pre-processed data

Cell In[9], line 34, in read_cellbender_output(file_path, smp_list, metadata, type, umi_filter)
     32     ad[sample_name].obs_names = ad[sample_name].obs['barcode']+"_"+ad[sample_name].obs['sample_id']
     33 except Exception as e:
---> 34     raise e
     35     print(f'{sample_name} {e}')
     36     smp_list = smp_list[smp_list != sample_name]

Cell In[9], line 14, in read_cellbender_output(file_path, smp_list, metadata, type, umi_filter)
     12     if type in i and 'h5' in i:
     13         file = i
---> 14 ad[sample_name] = sc.read_10x_h5(file_path + sample_name +'/'+file)
     15 ad[sample_name].var.rename(columns = {'gene_ids':'ENSEMBL'}, inplace = True)
     16 ad[sample_name].var['SYMBOL'] = ad[sample_name].var.index

File /envs/lightning19_scvi10_cuda113_torch113/lib/python3.9/site-packages/scanpy/readwrite.py:183, in read_10x_h5(filename, genome, gex_only, backup_url)
    181     v3 = '/matrix' in f
    182 if v3:
--> 183     adata = _read_v3_10x_h5(filename, start=start)
    184     if genome:
    185         if genome not in adata.var['genome'].values:

File /envs/lightning19_scvi10_cuda113_torch113/lib/python3.9/site-packages/scanpy/readwrite.py:294, in _read_v3_10x_h5(filename, start)
    292     return adata
    293 except KeyError:
--> 294     raise Exception('File is missing one or more required datasets.')

Exception: File is missing one or more required datasets.

@sjfleming
Member

@vitkl , that is the first time I'm seeing that one. I'm curious: if you run sc.read_10x_h5() on your raw input file, does that work? If the input file has a genome field, then the output file should also have a genome field.
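One quick way to check this, without going through scanpy, is to open the file with h5py and look for the feature-level datasets the v3 loader reads in the traceback above (`id`, `name`, `feature_type`, `genome`). A sketch (helper name is illustrative):

```python
import h5py


def missing_feature_datasets(path: str) -> list:
    """Report which feature datasets used by scanpy's v3 10x loader are
    absent from an h5 file (sketch based on the traceback above)."""
    required = ["id", "name", "feature_type", "genome"]
    with h5py.File(path, "r") as f:
        feats = f["matrix/features"]
        return [k for k in required if k not in feats]
```

For the file in the error above, this would presumably report `['genome']`.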

@iaaka

iaaka commented Feb 24, 2023

I'll intervene since I did the analysis @vitkl is talking about. I used input in MTX format (it was multiome data, so I needed to subset the GEX part, and it appears much easier to save the subset in MTX format than in h5).
The input MTX can be read with scanpy.read_10x_mtx.

@sjfleming
Member

@iaaka It seems like you have found a real problem here, so I appreciate you bringing it to my attention. This is what happened:

  1. The MTX loader did not capture any genome information (I don't think genome is ever included in the MTX format? : https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices)
  2. The h5 output writer did not write any genome information, since none was loaded.
  3. Scanpy expects a genome field in its data loader. While this is kind of annoying from the standpoint of flexibility, it is also reasonable to expect genome information.

I think what I'll do to fix this moving forward is to include a genome field in the output that just says "NA" (for "not available") or something like that. Or maybe an empty string "". That way scanpy's data loader will work.
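A sketch of that fix, as a standalone patch for an existing output file (helper name is hypothetical; it writes a placeholder `genome` dataset sized to match the feature names so loaders that require the field can proceed):

```python
import h5py
import numpy as np


def add_placeholder_genome(path: str, placeholder: bytes = b"NA"):
    """Add a placeholder 'genome' dataset under matrix/features if one is
    missing. (Sketch of the fix described above; not CellBender's code.)"""
    with h5py.File(path, "a") as f:
        feats = f["matrix/features"]
        if "genome" not in feats:
            n = feats["name"].shape[0]
            feats.create_dataset("genome", data=np.array([placeholder] * n))
```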

For now, I would say that your best option is to use my data loader function that currently lives here:

def anndata_from_h5(file: str,

You could also use one of the other data loading functions, if you wanted, such as this

def load_anndata_from_input_and_output(input_file: str,

which loads the input raw file and the cellbender output file jointly, so that you can have everything in one object.
