Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scanorama does not subset to HVG list #382

Open
mbuttner opened this issue Apr 27, 2023 · 1 comment
Open

scanorama does not subset to HVG list #382

mbuttner opened this issue Apr 27, 2023 · 1 comment

Comments

@mbuttner
Copy link
Contributor

Hi there,

if I incorporate an HVG list in scanorama, the list is ignored and scanorama is run on all genes.

I defined an HVG list as follows (pancreatic islet dataset as example, downloaded from figshare):

hvg_list = list(adata.var_names[adata.var['highly_variable']])
print(hvg_list[:10])

Output:

['A1CF',
 'A2M',
 'AADAC',
 'ABAT',
 'ABCA1',
 'ABCB1',
 'ABCC3',
 'ABCC8',
 'ABCG1',
 'ABCG2']
adata_scanorama = scib.integration.scanorama(adata = adata, batch = batch_key, hvg = hvg_list)

e.g.

/home/marenbuettner/miniconda3/envs/dataint_r/lib/python3.8/site-packages/scanorama/scanorama.py:378: DeprecationWarning: Please use `csr_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.csr` namespace is deprecated.
  elif issubclass(type(ds), scipy.sparse.csr.csr_matrix):
Found 19093 genes among all datasets

Here are my versions:

-----
anndata     0.8.0
scanpy      1.9.1
-----
PIL                         9.4.0
anndata2ri                  1.2.dev11
anyio                       NA
asttokens                   NA
attr                        22.2.0
babel                       2.11.0
backcall                    0.2.0
backports                   NA
brotli                      NA
certifi                     2022.12.07
cffi                        1.15.1
charset_normalizer          3.1.0
colorama                    0.4.6
comm                        0.1.2
cycler                      0.10.0
cython_runtime              NA
dateutil                    2.8.2
debugpy                     1.6.6
decorator                   5.1.1
defusedxml                  0.7.1
deprecated                  1.2.13
executing                   1.2.0
fastjsonschema              NA
google                      NA
h5py                        3.8.0
idna                        3.4
igraph                      0.10.4
importlib_resources         NA
ipykernel                   6.21.1
ipython_genutils            0.2.0
ipywidgets                  8.0.6
jedi                        0.18.2
jinja2                      3.1.2
joblib                      1.2.0
json5                       NA
jsonschema                  4.17.3
jupyter_events              0.6.3
jupyter_server              2.2.1
jupyterlab_server           2.19.0
kiwisolver                  1.4.4
leidenalg                   0.9.1
llvmlite                    0.39.1
louvain                     0.8.0
markupsafe                  2.1.2
matplotlib                  3.6.3
matplotlib_inline           0.1.6
mpl_toolkits                NA
natsort                     8.2.0
nbformat                    5.7.3
numba                       0.56.4
numexpr                     2.8.4
numpy                       1.23.5
packaging                   23.0
pandas                      1.5.3
parso                       0.8.3
pexpect                     4.8.0
pickleshare                 0.7.5
pkg_resources               NA
platformdirs                3.0.0
prometheus_client           NA
prompt_toolkit              3.0.36
psutil                      5.9.4
ptyprocess                  0.7.0
pure_eval                   0.2.2
pvectorc                    NA
pycparser                   2.21
pydev_ipython               NA
pydevconsole                NA
pydevd                      2.9.5
pydevd_file_utils           NA
pydevd_plugins              NA
pydevd_tracing              NA
pygments                    2.14.0
pynndescent                 0.5.8
pyparsing                   3.0.9
pyrsistent                  NA
pythonjsonlogger            NA
pytz                        2022.7.1
pytz_deprecation_shim       NA
requests                    2.29.0
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
rpy2                        3.5.11
scib                        1.1.3
scipy                       1.10.1
seaborn                     0.12.2
send2trash                  NA
session_info                1.0.0
six                         1.16.0
sklearn                     1.2.1
sniffio                     1.3.0
socks                       1.7.1
stack_data                  0.6.2
statsmodels                 0.13.5
texttable                   1.6.7
threadpoolctl               3.1.0
tornado                     6.2
tqdm                        4.64.1
traitlets                   5.9.0
typing_extensions           NA
tzlocal                     NA
umap                        0.5.3
upsetplot                   0.8.0
urllib3                     1.26.15
wcwidth                     0.2.6
websocket                   1.5.1
wrapt                       1.14.1
yaml                        6.0
zipp                        NA
zmq                         25.0.0
-----
IPython             8.9.0
jupyter_client      8.0.2
jupyter_core        5.2.0
jupyterlab          3.6.1
notebook            6.5.2
-----
Python 3.8.16 | packaged by conda-forge | (default, Feb  1 2023, 16:01:55) [GCC 11.3.0]
Linux-5.4.0-135-generic-x86_64-with-glibc2.10
-----
Session information updated at 2023-04-26 16:52
scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.23.5 scipy==1.10.1 pandas==1.5.3 scikit-learn==1.2.1 statsmodels==0.13.5 python-igraph==0.10.4 louvain==0.8.0 pynndescent==0.5.8
@luxeredias
Copy link

I was using the harmony integration method in my pipeline and realized passing HVGs was not doing anything. Looking into the code, it's clear that the hvg argument is only used within the check_sanity function here and further in the check_hvg function here, which ultimately does not subset the input adata to keep only the hvgs:

def check_hvg(hvg, adata_var):
    if type(hvg) is not list:
        raise TypeError("HVG list is not a list")
    else:
        if not all(i in adata_var.index for i in hvg):
            raise ValueError("Not all HVGs are in the adata object")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants