Categorical keys are not imported when converting from H5ad to Zarr #116

rpadmanabhan · 2024-06-18T17:13:08Z

Describe the bug

When converting an h5ad file to Zarr format with Scarf, categorical keys under obs are ignored.

To Reproduce

>>> reader = scarf.H5adReader("/media/rpadmanabhan/raghav_data/luca_modified.h5ad")
>>> [e[0] for e in reader.get_cell_columns()]
Reading attributes from group obs: 100%|██████████| 52/52 [00:00]
Reading attributes from group obsm: 100%|██████████| 3/3 [00:00]
['age', 'is_primary_data', 'n_genes_by_counts', 'observation_joinid', 'pct_counts_mt', 'total_counts', 'total_counts_mito', 'X_scANVI1', 'X_scANVI2', 'X_scANVI3', 'X_scANVI4', 'X_scANVI5', 'X_scANVI6', 'X_scANVI7', 'X_scANVI8', 'X_scANVI9', 'X_scANVI10', 'X_scVI1', 'X_scVI2', 'X_scVI3', 'X_scVI4', 'X_scVI5', 'X_scVI6', 'X_scVI7', 'X_scVI8', 'X_scVI9', 'X_scVI10', 'X_umap1', 'X_umap2']

>>> adata = anndata.read_h5ad("/media/rpadmanabhan/raghav_data/luca_modified.h5ad", backed="r")
>>> adata.obs.keys()
Index(['sample', 'uicc_stage', 'ever_smoker', 'age', 'donor_id', 'origin',
       'dataset', 'ann_fine', 'Cell_Type_Experimental', 'doublet_status',
       'leiden', 'n_genes_by_counts', 'total_counts', 'total_counts_mito',
       'pct_counts_mt', 'ann_coarse', 'cell_type_tumor', 'tumor_stage',
       'EGFR_mutation', 'TP53_mutation', 'ALK_mutation', 'BRAF_mutation',
       'ERBB2_mutation', 'KRAS_mutation', 'ROS_mutation', 'origin_fine',
       'study', 'platform', 'Sample_Tag', 'cell_type_neutro',
       'cell_type_neutro_coarse', 'suspension_type', 'assay_ontology_term_id',
       'cell_type_ontology_term_id', 'development_stage_ontology_term_id',
       'disease_ontology_term_id', 'self_reported_ethnicity_ontology_term_id',
       'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id',
       'tissue_ontology_term_id', 'tissue_type', 'cell_type',
       'Sample_Tag_Name', 'disease', 'organism', 'sex', 'tissue',
       'self_reported_ethnicity', 'development_stage', 'observation_joinid'],
      dtype='object')
>>> adata.obs['age']
001C_AAACCTGCATCGGGTC-0    22.0
001C_AAACCTGTCAACACCA-0    22.0
001C_AAACGGGAGACTAAGT-0    22.0
001C_AAACGGGAGGCTCATT-0    22.0
001C_AAACGGGAGGGAACGG-0    22.0
                           ... 
TTTGTCACATCTATGG-1-38-8    64.0
TTTGTCACATGTTGAC-1-38-8    64.0
TTTGTCAGTGTTGGGA-1-38-8    64.0
TTTGTCATCAGTTTGG-1-38-8    64.0
TTTGTCATCTCGGACG-1-38-8    64.0
Name: age, Length: 1283972, dtype: float64
>>> adata.obs['cell_type']
001C_AAACCTGCATCGGGTC-0                  non-classical monocyte
001C_AAACCTGTCAACACCA-0                     alveolar macrophage
001C_AAACGGGAGACTAAGT-0    endothelial cell of lymphatic vessel
001C_AAACGGGAGGCTCATT-0                              macrophage
001C_AAACGGGAGGGAACGG-0                      classical monocyte
                                           ...                 
TTTGTCACATCTATGG-1-38-8                              macrophage
TTTGTCACATGTTGAC-1-38-8                      classical monocyte
TTTGTCAGTGTTGGGA-1-38-8                              macrophage
TTTGTCATCAGTTTGG-1-38-8                       regulatory T cell
TTTGTCATCTCGGACG-1-38-8                      classical monocyte
Name: cell_type, Length: 1283972, dtype: category
Categories (33, object): ['mesothelial cell', 'epithelial cell of lung', 'mast cell', 'club cell', ...,
                          'fibroblast of lung', 'multi-ciliated epithelial cell',
                          'pulmonary artery endothelial cell', 'bronchus fibroblast of lung']

Expected behavior
All fields under obs should be imported, including the categorical ones.

Scarf and Python version
Scarf: 0.28.9 and Python 3.10.12

to-be-so-lonely · 2024-09-25T07:54:23Z

I'm not one of the original developers, but I worked around this issue as follows. For example, to import the "leiden" column:

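import scanpy as sc
import scarf

# Convert the h5ad file to Zarr (the categorical obs columns are skipped at this step)
reader = scarf.H5adReader("/path/to/original/h5ad/file", cell_ids_key="index", feature_ids_key="gene_ids", feature_name_key="gene_names")
writer = scarf.H5adToZarr(reader, zarr_loc="path/to/zarr/file", assay_name="RNA")
writer.dump()

nthreads = 4  # placeholder value; set this to the number of threads you want to use
ds = scarf.DataStore("path/to/zarr/file", nthreads=nthreads)

# Read the missing column from the original file and insert it into the cell metadata
adata = sc.read_h5ad("/path/to/original/h5ad/file")
leiden = adata.obs["leiden"]

ds.cells.insert(column_name="leiden", values=leiden)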

You can wrap the insert call in a for loop if you want to import all the columns under obs.
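A minimal sketch of such a loop follows. The helper name copy_missing_obs_columns is made up for illustration and is not part of Scarf. It assumes that ds.cells exposes a columns attribute listing the columns already present, and that the rows of adata.obs are in the same order as the cells in the DataStore. Please check both against your Scarf version before relying on it.

```python
def copy_missing_obs_columns(obs, cells):
    """Insert every column of `obs` that `cells` does not already have.

    obs   : mapping of column name -> values (for example adata.obs)
    cells : object with a `columns` attribute and an
            `insert(column_name=..., values=...)` method (for example ds.cells;
            the `columns` attribute is an assumption, see the note above)
    Returns the list of column names that were inserted.
    """
    existing = set(cells.columns)
    inserted = []
    for name in obs.keys():
        if name in existing:
            # Numeric columns were already imported during the conversion
            continue
        # Materialise the values as a plain list; for a pandas categorical
        # column this yields the category labels rather than integer codes
        values = list(obs[name])
        cells.insert(column_name=name, values=values)
        inserted.append(name)
    return inserted


# Hypothetical usage with the objects from the snippet above:
# added = copy_missing_obs_columns(adata.obs, ds.cells)
# print(added)
```

Skipping the columns that already exist avoids overwriting the numeric fields that the conversion imported correctly.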
