Issue with aggregate_duplicate_var: np.max throws TypeError: 'coo_matrix' object is not subscriptable #137

Open
ryan2han opened this issue Dec 28, 2024 · 0 comments
Labels
bug Something isn't working

Description of the bug

Calling aggregate_duplicate_var on an AnnData object fails when the aggregation function (aggr_fun) is set to np.max.
With aggr_fun=np.mean the function works as expected, but aggr_fun=np.max raises the following error:

TypeError: 'coo_matrix' object is not subscriptable

import numpy as np
import scanpy as sc

adata = sc.read_10x_h5(files[0])
UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
  utils.warn_names_duplicates("var")

adata.X
<9132x33670 sparse matrix of type '<class 'numpy.int64'>'
  	with 9701929 stored elements in Compressed Sparse Row format>
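
A possible explanation, sketched on a toy matrix rather than the real data: when adata.X is a SciPy sparse matrix, np.mean and np.max hand the call over to the matrix's own .mean and .max methods, and those return different types. The small CSR matrix below is made up for illustration and only stands in for adata.X.

```python
import numpy as np
from scipy import sparse

# Toy stand-in for adata.X (values are invented for illustration).
X = sparse.csr_matrix(np.array([[1, 0, 3], [0, 2, 0]], dtype=np.int64))

# NumPy delegates both calls to the sparse matrix's own methods.
mean_result = np.mean(X, axis=1)  # dense np.matrix, so [:, np.newaxis] works
max_result = np.max(X, axis=1)    # sparse matrix in COO format

print(type(mean_result).__name__)
print(type(max_result).__name__)

# Converting the COO result to a dense array makes it indexable again.
max_dense = max_result.toarray()
print(max_dense.shape)
```

If this is what happens here, the failure comes from indexing the COO result with [:, np.newaxis], not from np.max itself.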

def aggregate_duplicate_var(adata, aggr_fun=np.mean):
    retain_var = ~adata.var_names.duplicated(keep="first")
    duplicated_var = adata.var_names[adata.var_names.duplicated()].unique()
    if len(duplicated_var):
        for var in duplicated_var:
            mask = adata.var_names == var
            var_aggr = aggr_fun(adata.X[:, mask], axis=1)[:, np.newaxis]
            adata.X[:, mask] = np.repeat(var_aggr, np.sum(mask), axis=1)

        adata_dedup = adata[:, retain_var].copy()
        return adata_dedup
    else:
        return adata

adata = aggregate_duplicate_var(adata,aggr_fun=np.max)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[24], line 1
----> 1 adata = aggregate_duplicate_var(adata,aggr_fun=np.max)

Cell In[23], line 7
      5 for var in duplicated_var:
      6     mask = adata.var_names == var
----> 7     var_aggr = aggr_fun(adata.X[:, mask], axis=1)[:, np.newaxis]
      8     adata.X[:, mask] = np.repeat(var_aggr, np.sum(mask), axis=1)
     10 adata_dedup = adata[:, retain_var].copy()

TypeError: 'coo_matrix' object is not subscriptable
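
One workaround that might be worth considering, as a minimal sketch only: densify just the duplicated columns before aggregating, so that aggr_fun always receives a plain ndarray. The helper name aggregate_dense is hypothetical and not part of the package, and the toy matrix and mask are invented for illustration.

```python
import numpy as np
from scipy import sparse

def aggregate_dense(block, aggr_fun=np.max):
    """Hypothetical helper: apply aggr_fun row-wise, return an (n_obs, 1) ndarray."""
    if sparse.issparse(block):
        # Only the duplicated columns are densified, not the whole matrix.
        block = block.toarray()
    return np.asarray(aggr_fun(block, axis=1)).reshape(-1, 1)

# Toy stand-in for adata.X and for one duplicated-variable mask.
X = sparse.csr_matrix(np.array([[1, 0, 3], [0, 2, 0]], dtype=np.int64))
mask = np.array([True, False, True])

var_aggr = aggregate_dense(X[:, mask], aggr_fun=np.max)
print(var_aggr)
```

In the function above, this would stand in for line 7, i.e. var_aggr = aggregate_dense(adata.X[:, mask], aggr_fun). I have not tested it on the full 9132x33670 matrix.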

Command used and terminal output

Relevant files

No response

System information

OS: Ubuntu 20.04
numpy: 1.23.5
scipy: 1.11.4
scanpy: 1.9.6

@ryan2han ryan2han added the bug Something isn't working label Dec 28, 2024