10x H5 loading Error #174
Comments
Hi @llumdi! This is related to #128 and should have been fixed by this recent change in scanpy: scverse/scanpy#2344. I don't know exactly which scanpy versions contain this fix, but I have heard other people say that the scanpy loader is still giving them this error. If you want a workaround (and the newest scanpy version still gives this error), then I would recommend loading the cellbender output into
Let me know if you run into a problem trying that out! Happy to discuss more.
Hi @sjfleming,
I am able to load cellbender output using the function
@carmensandoval, I definitely want writing to work! I am betting that the issue lies in
since I think these values are scalars. I think the issue is that we are having trouble reading and writing scalars. So one option would be to just delete the fields and forget about them:

    for key in ['fraction_data_used_for_testing', 'lambda_multiplier', 'target_false_positive_rate']:
        del org_1.uns[key]

and then try org_1.write('org_1.h5ad'). Or, if you want to keep those fields, you could try

    import numpy as np
    for key in ['fraction_data_used_for_testing', 'lambda_multiplier', 'target_false_positive_rate']:
        org_1.uns[key] = np.array(org_1.uns[key])  # wrap it in an array

and then try org_1.write('org_1.h5ad')
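The same wrapping idea can be sketched as a small helper that finds numeric scalars by itself instead of listing the three keys by hand. This is only an illustration under assumptions: `wrap_scalar_entries` is a hypothetical name, not part of cellbender, scanpy, or anndata, and whether 0-d arrays actually let `write` succeed still has to be tried, as suggested above.

```python
import numpy as np


def wrap_scalar_entries(uns):
    """Return a shallow copy of `uns` with numeric scalars wrapped in arrays.

    Hypothetical helper: strings, arrays, and nested containers are left
    untouched; only Python and NumPy numeric scalars are wrapped.
    """
    wrapped = dict(uns)
    for key, value in uns.items():
        if isinstance(value, (int, float, np.number)):
            wrapped[key] = np.array(value)  # same wrapping as in the comment above
    return wrapped


# Toy dictionary standing in for org_1.uns (values are made up for illustration).
example = {"lambda_multiplier": 1.5, "note": "text", "counts": np.array([1, 2])}
result = wrap_scalar_entries(example)
```

Assuming `org_1` is the AnnData object from the comment above, one could try `org_1.uns = wrap_scalar_entries(org_1.uns)` followed by `org_1.write('org_1.h5ad')`.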
I have the same issue with scanpy 1.9.1, and scanpy 1.9.2 also fails with the error below:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File /envs/lightning19_scvi10_cuda113_torch113/lib/python3.9/site-packages/scanpy/readwrite.py:288, in _read_v3_10x_h5(filename, start)
277 matrix = csr_matrix(
278 (data, dsets['indices'], dsets['indptr']),
279 shape=(N, M),
280 )
281 adata = AnnData(
282 matrix,
283 obs=dict(obs_names=dsets['barcodes'].astype(str)),
284 var=dict(
285 var_names=dsets['name'].astype(str),
286 gene_ids=dsets['id'].astype(str),
287 feature_types=dsets['feature_type'].astype(str),
--> 288 genome=dsets['genome'].astype(str),
289 ),
290 )
291 logg.info('', time=start)
KeyError: 'genome'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
Cell In[11], line 1
----> 1 adata_cb, ad_cb_list = read_cellbender_output(
2 sc_data_folder,
3 smp_list=sample_list['folder'],
4 metadata=sample_list, type='filtered'
5 )
6 adata_cb.obs['total_counts'] = np.array(adata_cb.X.sum(1)).flatten()
7 # adata = adata[adata.obs['total_counts'] > 1000, :].copy()
8
9 # export aggregated and pre-processed data
Cell In[9], line 34, in read_cellbender_output(file_path, smp_list, metadata, type, umi_filter)
32 ad[sample_name].obs_names = ad[sample_name].obs['barcode']+"_"+ad[sample_name].obs['sample_id']
33 except Exception as e:
---> 34 raise e
35 print(f'{sample_name} {e}')
36 smp_list = smp_list[smp_list != sample_name]
Cell In[9], line 14, in read_cellbender_output(file_path, smp_list, metadata, type, umi_filter)
12 if type in i and 'h5' in i:
13 file = i
---> 14 ad[sample_name] = sc.read_10x_h5(file_path + sample_name +'/'+file)
15 ad[sample_name].var.rename(columns = {'gene_ids':'ENSEMBL'}, inplace = True)
16 ad[sample_name].var['SYMBOL'] = ad[sample_name].var.index
File /envs/lightning19_scvi10_cuda113_torch113/lib/python3.9/site-packages/scanpy/readwrite.py:183, in read_10x_h5(filename, genome, gex_only, backup_url)
181 v3 = '/matrix' in f
182 if v3:
--> 183 adata = _read_v3_10x_h5(filename, start=start)
184 if genome:
185 if genome not in adata.var['genome'].values:
File /envs/lightning19_scvi10_cuda113_torch113/lib/python3.9/site-packages/scanpy/readwrite.py:294, in _read_v3_10x_h5(filename, start)
292 return adata
293 except KeyError:
--> 294 raise Exception('File is missing one or more required datasets.')
Exception: File is missing one or more required datasets.
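To see which dataset triggers this generic exception, one can list the dataset names stored under `/matrix` and compare them with the keys that appear in the traceback. The sketch below is a minimal illustration, not scanpy's own code: `REQUIRED`, `missing_datasets`, and `list_dataset_names` are hypothetical names, the required list is taken only from the keys visible in the traceback above, and the file path in the usage note is the one from the original report.

```python
# Keys looked up via dsets[...] in the traceback above (not an exhaustive list).
REQUIRED = ("indices", "indptr", "barcodes", "name", "id", "feature_type", "genome")


def missing_datasets(present_names, required=REQUIRED):
    """Return the required dataset names that are absent, in order."""
    present = set(present_names)
    return [name for name in required if name not in present]


def list_dataset_names(h5_path):
    """Collect the leaf names of all datasets under the '/matrix' group."""
    import h5py  # imported lazily so missing_datasets works without h5py

    names = []

    def visit(path, obj):
        # visititems passes paths relative to the group; keep the last part.
        if isinstance(obj, h5py.Dataset):
            names.append(path.rsplit("/", 1)[-1])

    with h5py.File(h5_path, "r") as f:
        f["matrix"].visititems(visit)
    return names
```

For example, `missing_datasets(list_dataset_names('./cellbender.h5'))` would report `genome` if that dataset is absent, which is what the `KeyError: 'genome'` above suggests.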
@vitkl, that is the first time I'm seeing that one. I'm curious: if you run
I'll jump in, since I did the analysis @vitkl is talking about. I used input in mtx format (the data was multiome and I needed to subset the GEX-only part, and it appears to be much easier to save data in mtx format than in h5).
@iaaka It seems like you have found a real problem here, so I appreciate you bringing it to my attention. This is what happened:
I think what I'll do to fix this moving forward is to include a
For now, I would say that your best option is to use my data loader function that currently lives here:
You could also use one of the other data loading functions, if you wanted, such as this
which loads the input raw file and the cellbender output file jointly, so that you can have everything in one object.
I cannot load the cellbender output (v0.2.2) with sc.read_10x_h5 (scanpy==1.9.1). Cellbender completed successfully.

    adata_cellbender = sc.read_10x_h5('./cellbender.h5')

I would really appreciate a fix. Thanks a lot