Cannot compute joint embedding on RNA and ATAC object #350

thecatsmilk · 2024-10-09T15:55:34Z

I am following this tutorial to jointly embed my single-nucelus multiome capture from 10X. I have already filtered both RNA and ATAC objects to the same cells, and made sure that the barcodes are identical, then I process with the embedding:

embedding = snap.tl.multi_spectral(
    adatas = [rna, atac],
    features = 'selected'
)[1]

Once running the above cell I am met with this error:

2024-10-09 15:44:29 - INFO - Compute normalized views...
thread '<unnamed>' panicked at src/embedding.rs:411:13:
called `Result::unwrap()` on an `Err` value: Cannot convert Array(F32) to f64 CsrMatrix

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[35], line 1
----> 1 embedding = snap.tl.multi_spectral(
      2     adatas = [rna, atac],
      3     features = 'selected'
      4 )[1]

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/snapatac2/tools/_embedding.py:533, in multi_spectral(adatas, n_comps, features, weights, random_state, weighted_by_sd)
    530 if weights is None:
    531     weights = [1.0 for _ in adatas]
--> 533 evals, evecs = internal.multi_spectral_embedding(adatas, features, weights, n_comps, random_state)
    535 if weighted_by_sd:
    536     idx = [i for i in range(evals.shape[0]) if evals[i] > 0]

PanicException: called `Result::unwrap()` on an `Err` value: Cannot convert Array(F32) to f64 CsrMatrix

It seems like the issue is that atac.X is a PyArray Element. Keep in mind atac is an AnnDataSet object that I processed with SnapATAC2 and then converted to an AnnData object in order to co-embed these ATAC data with the RNA data from the same capture.

I tried many ways to convert the PyArray to a Numpy array in order to have it be compatible, but I was unsuccessful. It seems as if I could do this if I went into Rust to modulate this PyArray, but I have no experience in Rust. Any help with this would be greatly appreciate it. Thank you!

scanpy==1.10.3
snapatac2==2.7.0
polars==1.6.0
numpy==1.26.4
pandas==2.1.1
h5py==3.11.0
anndata==0.10.8
matplotlib==3.9.2
seaborn==0.13.2
scipy==1.14.1

The text was updated successfully, but these errors were encountered:

kaizhang · 2024-10-10T02:02:38Z

You need to convert numpy array to scipy sparse matrix. You can do that by adata.X = csr_matrix(adata.X[:]).

thecatsmilk · 2024-10-10T15:49:23Z

@kaizhang

Thanks for the reply! I did discover that you can extract the matrix how you show above (admittedly by accident). I just tried this:

# convert to scipy sparse matrix
atac.X = sp.csr_matrix(atac.X[:])

# perform joint embedding
embedding = snap.tl.multi_spectral(
    adatas=[rna, atac],
    features=None
)[1]

Unfortunately I am still met with the same error:

2024-10-10 15:36:24 - INFO - Compute normalized views...
thread '<unnamed>' panicked at src/embedding.rs:411:13:
called `Result::unwrap()` on an `Err` value: Cannot convert Array(F64) to f64 CsrMatrix

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[32], line 1
----> 1 embedding = snap.tl.multi_spectral(
      2     adatas=[rna, atac],
      3     features=None
      4 )[1]

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/snapatac2/tools/_embedding.py:533, in multi_spectral(adatas, n_comps, features, weights, random_state, weighted_by_sd)
    530 if weights is None:
    531     weights = [1.0 for _ in adatas]
--> 533 evals, evecs = internal.multi_spectral_embedding(adatas, features, weights, n_comps, random_state)
    535 if weighted_by_sd:
    536     idx = [i for i in range(evals.shape[0]) if evals[i] > 0]

PanicException: called `Result::unwrap()` on an `Err` value: Cannot convert Array(F64) to f64 CsrMatrix

What I did next was basically build a new AnnData object after extracting all the elements, like so:

# rebuild AnnData
obs_df = pd.DataFrame({
    'sample': atac.obs['sample'],
    'leiden_0.05': atac.obs['leiden_0.05'],
    'leiden_0.1': atac.obs['leiden_0.1'],
    'leiden_0.2': atac.obs['leiden_0.2'],
    'leiden_0.35': atac.obs['leiden_0.35'],
    'leiden_0.4': atac.obs['leiden_0.4'],
    'leiden_0.6': atac.obs['leiden_0.6'],
    'leiden_0.8': atac.obs['leiden_0.8'],
    'leiden_1.0': atac.obs['leiden_1.0'],
    'leiden_1.2': atac.obs['leiden_1.2'],
    'leiden_1.4': atac.obs['leiden_1.4'],
    'ATAC': atac.obs['ATAC'],
    'cellType': atac.obs['cellType'],
    'barcodes_clean': atac.obs['barcodes_clean'],
    'barcodes_orig': atac.obs['barcodes_orig'],
    'RNA': atac.obs['RNA']
})
obs_df.index = atac.obs_names

var_df = pd.DataFrame({
    'count': atac.var['count'],
    'selected': atac.var['selected']
})
var_df.index = atac.var_names

# build AnnData
new_atac = ad.AnnData(X=X_f64, obs=obs_df, var=var_df)
new_atac

Where X_f64 was created like this:

from scipy import sparse as sp
X = atac.X[:]
X = sp.csr_matrix(X)
X_f64 = X.astype('float64')

With new_atac I tried to co-embedding again:

embedding = snap.tl.multi_spectral(
    adatas = [rna, new_atac],
    features = rna.var['selected']
)[1]

It worked! I move to re-running UMAP:

new_atac.obsm['X_joint'] = embedding
snap.tl.umap(new_atac, use_rep='X_joint')

Unfortunately I am met with this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[136], line 2
      1 new_atac.obsm['X_joint'] = embedding
----> 2 snap.tl.umap(new_atac, use_rep='X_joint')

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/snapatac2/tools/_embedding.py:113, in umap(adata, n_comps, use_dims, use_rep, key_added, random_state, inplace, **kwargs)
    108 if use_dims is not None:
    109     data = data[:, :use_dims] if isinstance(use_dims, int) else data[:, use_dims]
    111 umap = UMAP(random_state=random_state,
    112             n_components=n_comps,
--> 113             **kwargs).fit_transform(data)
    114 if inplace:
    115     adata.obsm["X_" + key_added] = umap

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/umap/umap_.py:2891, in UMAP.fit_transform(self, X, y, force_all_finite)
   2855 def fit_transform(self, X, y=None, force_all_finite=True):
   2856     """Fit X into an embedded space and return that transformed
   2857     output.
   2858 
   (...)
   2889         Local radii of data points in the embedding (log-transformed).
   2890     """
-> 2891     self.fit(X, y, force_all_finite)
   2892     if self.transform_mode == "embedding":
   2893         if self.output_dens:

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/umap/umap_.py:2358, in UMAP.fit(self, X, y, force_all_finite)
   2356     X = check_array(X, dtype=np.uint8, order="C", force_all_finite=force_all_finite)
   2357 else:
-> 2358     X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C", force_all_finite=force_all_finite)
   2359 self._raw_data = X
   2361 # Handle all the optional arguments, setting default

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/sklearn/utils/validation.py:1096, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_writeable, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
   1094     n_features = array.shape[1]
   1095     if n_features < ensure_min_features:
-> 1096         raise ValueError(
   1097             "Found array with %d feature(s) (shape=%s) while"
   1098             " a minimum of %d is required%s."
   1099             % (n_features, array.shape, ensure_min_features, context)
   1100         )
   1102 if force_writeable:
   1103     # By default, array.copy() creates a C-ordered copy. We set order=K to
   1104     # preserve the order of the array.
   1105     copy_params = {"order": "K"} if not sp.issparse(array) else {}

ValueError: Found array with 0 feature(s) (shape=(6333, 0)) while a minimum of 1 is required.

Then—under closer inspection—I realized embedding has no features:

embedding
array([], shape=(6333, 0), dtype=float64)

Any ideas on how to proceed from here?

kaizhang · 2024-10-15T08:36:47Z

I think it is the 'rna' matrix that causes the problem.

skpeterson · 2024-11-12T14:10:07Z

When analyzing multi-modal data, it seems like the RNA expression matrix does not also transfer, only the UMAP embeddings? Is this true? How do we connect the expression to the differentially accesible regions?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cannot compute joint embedding on RNA and ATAC object #350

Cannot compute joint embedding on RNA and ATAC object #350

thecatsmilk commented Oct 9, 2024

kaizhang commented Oct 10, 2024

thecatsmilk commented Oct 10, 2024 •

edited

Loading

kaizhang commented Oct 15, 2024

skpeterson commented Nov 12, 2024

Cannot compute joint embedding on RNA and ATAC object #350

Cannot compute joint embedding on RNA and ATAC object #350

Comments

thecatsmilk commented Oct 9, 2024

kaizhang commented Oct 10, 2024

thecatsmilk commented Oct 10, 2024 • edited Loading

kaizhang commented Oct 15, 2024

skpeterson commented Nov 12, 2024

thecatsmilk commented Oct 10, 2024 •

edited

Loading