Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot compute joint embedding on RNA and ATAC object #350

Open
thecatsmilk opened this issue Oct 9, 2024 · 4 comments
Open

Cannot compute joint embedding on RNA and ATAC object #350

thecatsmilk opened this issue Oct 9, 2024 · 4 comments

Comments

@thecatsmilk
Copy link

I am following this tutorial to jointly embed my single-nucelus multiome capture from 10X. I have already filtered both RNA and ATAC objects to the same cells, and made sure that the barcodes are identical, then I process with the embedding:

embedding = snap.tl.multi_spectral(
    adatas = [rna, atac],
    features = 'selected'
)[1]

Once running the above cell I am met with this error:

2024-10-09 15:44:29 - INFO - Compute normalized views...
thread '<unnamed>' panicked at src/embedding.rs:411:13:
called `Result::unwrap()` on an `Err` value: Cannot convert Array(F32) to f64 CsrMatrix

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[35], line 1
----> 1 embedding = snap.tl.multi_spectral(
      2     adatas = [rna, atac],
      3     features = 'selected'
      4 )[1]

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/snapatac2/tools/_embedding.py:533, in multi_spectral(adatas, n_comps, features, weights, random_state, weighted_by_sd)
    530 if weights is None:
    531     weights = [1.0 for _ in adatas]
--> 533 evals, evecs = internal.multi_spectral_embedding(adatas, features, weights, n_comps, random_state)
    535 if weighted_by_sd:
    536     idx = [i for i in range(evals.shape[0]) if evals[i] > 0]

PanicException: called `Result::unwrap()` on an `Err` value: Cannot convert Array(F32) to f64 CsrMatrix

It seems like the issue is that atac.X is a PyArray Element. Keep in mind atac is an AnnDataSet object that I processed with SnapATAC2 and then converted to an AnnData object in order to co-embed these ATAC data with the RNA data from the same capture.

I tried many ways to convert the PyArray to a Numpy array in order to have it be compatible, but I was unsuccessful. It seems as if I could do this if I went into Rust to modulate this PyArray, but I have no experience in Rust. Any help with this would be greatly appreciate it. Thank you!

scanpy==1.10.3
snapatac2==2.7.0
polars==1.6.0
numpy==1.26.4
pandas==2.1.1
h5py==3.11.0
anndata==0.10.8
matplotlib==3.9.2
seaborn==0.13.2
scipy==1.14.1
@kaizhang
Copy link
Owner

You need to convert numpy array to scipy sparse matrix. You can do that by adata.X = csr_matrix(adata.X[:]).

@thecatsmilk
Copy link
Author

thecatsmilk commented Oct 10, 2024

@kaizhang

Thanks for the reply! I did discover that you can extract the matrix how you show above (admittedly by accident). I just tried this:

# convert to scipy sparse matrix
atac.X = sp.csr_matrix(atac.X[:])

# perform joint embedding
embedding = snap.tl.multi_spectral(
    adatas=[rna, atac],
    features=None
)[1]

Unfortunately I am still met with the same error:

2024-10-10 15:36:24 - INFO - Compute normalized views...
thread '<unnamed>' panicked at src/embedding.rs:411:13:
called `Result::unwrap()` on an `Err` value: Cannot convert Array(F64) to f64 CsrMatrix

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[32], line 1
----> 1 embedding = snap.tl.multi_spectral(
      2     adatas=[rna, atac],
      3     features=None
      4 )[1]

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/snapatac2/tools/_embedding.py:533, in multi_spectral(adatas, n_comps, features, weights, random_state, weighted_by_sd)
    530 if weights is None:
    531     weights = [1.0 for _ in adatas]
--> 533 evals, evecs = internal.multi_spectral_embedding(adatas, features, weights, n_comps, random_state)
    535 if weighted_by_sd:
    536     idx = [i for i in range(evals.shape[0]) if evals[i] > 0]

PanicException: called `Result::unwrap()` on an `Err` value: Cannot convert Array(F64) to f64 CsrMatrix

What I did next was basically build a new AnnData object after extracting all the elements, like so:

# rebuild AnnData
obs_df = pd.DataFrame({
    'sample': atac.obs['sample'],
    'leiden_0.05': atac.obs['leiden_0.05'],
    'leiden_0.1': atac.obs['leiden_0.1'],
    'leiden_0.2': atac.obs['leiden_0.2'],
    'leiden_0.35': atac.obs['leiden_0.35'],
    'leiden_0.4': atac.obs['leiden_0.4'],
    'leiden_0.6': atac.obs['leiden_0.6'],
    'leiden_0.8': atac.obs['leiden_0.8'],
    'leiden_1.0': atac.obs['leiden_1.0'],
    'leiden_1.2': atac.obs['leiden_1.2'],
    'leiden_1.4': atac.obs['leiden_1.4'],
    'ATAC': atac.obs['ATAC'],
    'cellType': atac.obs['cellType'],
    'barcodes_clean': atac.obs['barcodes_clean'],
    'barcodes_orig': atac.obs['barcodes_orig'],
    'RNA': atac.obs['RNA']
})
obs_df.index = atac.obs_names

var_df = pd.DataFrame({
    'count': atac.var['count'],
    'selected': atac.var['selected']
})
var_df.index = atac.var_names

# build AnnData
new_atac = ad.AnnData(X=X_f64, obs=obs_df, var=var_df)
new_atac

Where X_f64 was created like this:

from scipy import sparse as sp
X = atac.X[:]
X = sp.csr_matrix(X)
X_f64 = X.astype('float64')

With new_atac I tried to co-embedding again:

embedding = snap.tl.multi_spectral(
    adatas = [rna, new_atac],
    features = rna.var['selected']
)[1]

It worked! I move to re-running UMAP:

new_atac.obsm['X_joint'] = embedding
snap.tl.umap(new_atac, use_rep='X_joint')

Unfortunately I am met with this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[136], line 2
      1 new_atac.obsm['X_joint'] = embedding
----> 2 snap.tl.umap(new_atac, use_rep='X_joint')

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/snapatac2/tools/_embedding.py:113, in umap(adata, n_comps, use_dims, use_rep, key_added, random_state, inplace, **kwargs)
    108 if use_dims is not None:
    109     data = data[:, :use_dims] if isinstance(use_dims, int) else data[:, use_dims]
    111 umap = UMAP(random_state=random_state,
    112             n_components=n_comps,
--> 113             **kwargs).fit_transform(data)
    114 if inplace:
    115     adata.obsm["X_" + key_added] = umap

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/umap/umap_.py:2891, in UMAP.fit_transform(self, X, y, force_all_finite)
   2855 def fit_transform(self, X, y=None, force_all_finite=True):
   2856     """Fit X into an embedded space and return that transformed
   2857     output.
   2858 
   (...)
   2889         Local radii of data points in the embedding (log-transformed).
   2890     """
-> 2891     self.fit(X, y, force_all_finite)
   2892     if self.transform_mode == "embedding":
   2893         if self.output_dens:

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/umap/umap_.py:2358, in UMAP.fit(self, X, y, force_all_finite)
   2356     X = check_array(X, dtype=np.uint8, order="C", force_all_finite=force_all_finite)
   2357 else:
-> 2358     X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C", force_all_finite=force_all_finite)
   2359 self._raw_data = X
   2361 # Handle all the optional arguments, setting default

File /home/.conda/envs/scSnapATAC2_v2024.08/lib/python3.12/site-packages/sklearn/utils/validation.py:1096, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_writeable, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
   1094     n_features = array.shape[1]
   1095     if n_features < ensure_min_features:
-> 1096         raise ValueError(
   1097             "Found array with %d feature(s) (shape=%s) while"
   1098             " a minimum of %d is required%s."
   1099             % (n_features, array.shape, ensure_min_features, context)
   1100         )
   1102 if force_writeable:
   1103     # By default, array.copy() creates a C-ordered copy. We set order=K to
   1104     # preserve the order of the array.
   1105     copy_params = {"order": "K"} if not sp.issparse(array) else {}

ValueError: Found array with 0 feature(s) (shape=(6333, 0)) while a minimum of 1 is required.

Then—under closer inspection—I realized embedding has no features:

embedding
array([], shape=(6333, 0), dtype=float64)

Any ideas on how to proceed from here?

@kaizhang
Copy link
Owner

I think it is the 'rna' matrix that causes the problem.

@skpeterson
Copy link

When analyzing multi-modal data, it seems like the RNA expression matrix does not also transfer, only the UMAP embeddings? Is this true? How do we connect the expression to the differentially accesible regions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants