
Flight / skbio error. numpy.ndarray size changed, may indicate binary incompatibility #43

Closed
julianzaugg opened this issue May 9, 2022 · 16 comments


@julianzaugg
Contributor

julianzaugg commented May 9, 2022

The following error is thrown when running Rosella/Flight on a fresh install. I suspect it is related to a known HDBSCAN + numpy incompatibility.

Also see https://stackoverflow.com/questions/66666380/issue-with-hdbscan-valueerror-numpy-ndarray-size-changed-may-indicate-binary for a possible solution.

EDIT: also see https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp

[2022-05-01T07:30:14Z ERROR bird_tool_utils::command] Error when running flight process. Exitstatus was : ExitStatus(unix_wait_status(256))
The STDERR was:
05/01/2022 05:30:09 PM INFO: Time - 17:30:09 01-05-2022
Traceback (most recent call last):
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/bin/flight", line 10, in <module>
    sys.exit(main())
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/flight/flight.py", line 449, in main
    args.func(args)
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/flight/flight.py", line 569, in refine
    rosella = rosella_engine_constructor(args)
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/flight/flight.py", line 534, in rosella_engine_constructor
    from flight.rosella.rosella import Rosella
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/flight/rosella/rosella.py", line 48, in <module>
    from flight.rosella.validating import Validator
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/flight/rosella/validating.py", line 48, in <module>
    from flight.rosella.clustering import Clusterer, iterative_clustering_static, kmeans_cluster
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/flight/rosella/clustering.py", line 56, in <module>
    from flight.rosella.binning import Binner
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/flight/rosella/binning.py", line 43, in <module>
    import skbio.stats.composition
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/skbio/__init__.py", line 11, in <module>
    import skbio.io # noqa
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/skbio/io/__init__.py", line 243, in <module>
    import_module('skbio.io.format.clustal')
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/skbio/io/format/clustal.py", line 148, in <module>
    from skbio.alignment import TabularMSA
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/skbio/alignment/__init__.py", line 204, in <module>
    from ._pairwise import (
  File "/srv/home/user/.conda/envs/5456b9737e12158ee6834b6c943f944c/lib/python3.9/site-packages/skbio/alignment/_pairwise.py", line 15, in <module>
    from skbio.alignment._ssw_wrapper import StripedSmithWaterman
  File "skbio/alignment/_ssw_wrapper.pyx", line 1, in init skbio.alignment._ssw_wrapper
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
thread 'main' panicked at 'Failed to grab stdout from failed flight process', /home/conda/.cargo/registry/src/github.com-1ecc6299db9ec823/bird_tool_utils-0.3.0/src/command.rs:27:14
Error in rule checkm_rosella:
  jobid: 17
  output: data/rosella_bins/checkm.out
  conda-env: /srv/home/user/.conda/envs/b74c952d3cb03d84d232c6fd11bc410d
  shell:
    checkm lineage_wf -t 30 --pplacer_threads 30 -x fna data/rosella_bins/ data/rosella_bins//checkm --tab_table -f data/rosella_bins/checkm.out
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
An error occurred
Complete log: .snakemake/log/2022-05-01T150003.760975.snakemake.log
05/01/2022 05:30:22 PM CRITICAL: Command 'snakemake --snakefile /srv/home/user/temp/aviary/aviary/modules/Snakefile --directory /srv/projects/microbial_inducers/analysis/binning/20220428_aviary_recover/psin_15 --jobs 30 --rerun-incomplete --configfile '/srv/projects/microbial_inducers/analysis/binning/20220428_aviary_recover/psin_15/config.yaml' --nolock --conda-frontend mamba --use-conda --conda-prefix /srv/home/user/.conda/envs/ recover_mags' returned non-zero exit status 1.
@rhysnewell
Owner

I've tested this on a fresh install but haven't been able to replicate it, so this really might be a system-specific problem on top of weird numpy compilation issues. I'm keeping an eye on it, but the best bet seems to be to activate the conda environment b74c952d3cb03d84d232c6fd11bc410d and reinstall numpy.
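Before reinstalling anything, it can help to confirm which environment's Python is actually running, since the logs above show more than one hashed env under `.conda/envs/`. This is a minimal sketch. It assumes the standard conda behaviour that `sys.prefix` is the env directory when that env's own `python` is invoked. The helper `running_in_env` is hypothetical and not part of aviary:

```python
import sys
from pathlib import Path


def running_in_env(env_name: str, prefix: str = sys.prefix) -> bool:
    """Return True if `prefix` (default: this interpreter's prefix) is the conda env `env_name`.

    Hypothetical helper: it only compares the last path component, assuming
    conda envs live at <conda-prefix>/envs/<env_name>.
    """
    return Path(prefix).name == env_name


if __name__ == "__main__":
    # Env hash taken from the snakemake log in this thread.
    print("sys.prefix:", sys.prefix)
    print("in target env:", running_in_env("b74c952d3cb03d84d232c6fd11bc410d"))
```

Run it with the env's own interpreter (e.g. after `conda activate`) before reinstalling numpy. That way the reinstall goes into the env you expect.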

@julianzaugg
Contributor Author

Yep, I will try re-installing numpy.

@julianzaugg
Contributor Author

Looks like updating numpy with pip install --upgrade numpy does not solve the issue. Will keep exploring options.

@rhysnewell
Owner

rhysnewell commented May 10, 2022

Can you post the specs of the server you are trying to run on?
And could you also post the version info for numpy?
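A short script like the one below can gather that information from inside the failing environment. This is a sketch. The package list is an assumption based on the libraries named in this thread (`scikit-bio` is skbio's distribution name on PyPI). It relies only on the standard library's `platform` and `importlib.metadata` (Python 3.8+):

```python
import platform
from importlib import metadata


def collect_versions(packages=("numpy", "numba", "hdbscan", "scikit-bio")):
    """Return a dict of OS/Python info plus installed versions of `packages`."""
    info = {
        "platform": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment (or installed under another name).
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```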

@julianzaugg
Contributor Author

julianzaugg commented May 10, 2022

It is actually a student I am helping who is having the issue (I have not tested myself).

Server specs:
CentOS 7, 40 CPU threads, 512GB RAM

Numpy was upgraded from version 1.21.6 -> 1.22.3.

EDIT: pip also printed this warning/error:

Collecting numpy
 Using cached numpy-1.22.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
Installing collected packages: numpy
 Attempting uninstall: numpy
  Found existing installation: numpy 1.21.6
  Uninstalling numpy-1.21.6:
   Successfully uninstalled numpy-1.21.6
ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.55.0 requires numpy<1.22,>=1.18, but you have numpy 1.22.3 which is incompatible.
Successfully installed numpy-1.22.3
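The pip message shows the actual conflict: the upgrade moved numpy outside the range that numba 0.55.0 declares. Below is a minimal sketch of that range check, with the bounds copied from the pip output above. The simplified parser only handles plain `X.Y.Z` release strings (no `rc`/`dev` suffixes). For anything more general, `packaging.specifiers.SpecifierSet` is the usual tool:

```python
# Bounds copied from the pip message: "numba 0.55.0 requires numpy<1.22,>=1.18"
NUMBA_055_NUMPY_MIN = (1, 18, 0)            # inclusive
NUMBA_055_NUMPY_MAX_EXCLUSIVE = (1, 22, 0)  # exclusive


def parse_release(version: str) -> tuple:
    """Turn a plain release string like "1.22.3" into (1, 22, 3), padding to 3 parts."""
    parts = [int(piece) for piece in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def numba_055_accepts(numpy_version: str) -> bool:
    """True if `numpy_version` satisfies numba 0.55.0's numpy>=1.18,<1.22 pin."""
    release = parse_release(numpy_version)
    return NUMBA_055_NUMPY_MIN <= release < NUMBA_055_NUMPY_MAX_EXCLUSIVE


if __name__ == "__main__":
    for candidate in ("1.21.6", "1.22.3"):
        print(candidate, numba_055_accepts(candidate))
```

With the versions from this thread, 1.21.6 passes and 1.22.3 fails, matching pip's complaint.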

@rhysnewell
Owner

Right, so my understanding of this issue is that numpy 1.20 introduced some ABI-breaking changes, which made the pip version of hdbscan incompatible with newer versions of numpy.

This could potentially be solved by installing hdbscan from source via pip, e.g.:
pip install --upgrade git+https://github.com/scikit-learn-contrib/hdbscan.git#egg=hdbscan

Not sure if you need the --upgrade, but hopefully it forces pip to overwrite the previous install.
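The numbers in the final `ValueError` of the traceback show which way the ABI mismatch goes. Cython-compiled extensions (such as skbio's `_ssw_wrapper`) compare two sizes at import time:

- the size of `numpy.ndarray` recorded from the numpy C headers when the extension was built
- the size of that type in the numpy actually being imported

"Expected 96 from C header, got 88 from PyObject" means the extension was built against a numpy whose struct is larger. That usually indicates a newer numpy than the one found at runtime. Below is a small sketch that parses the message and reports that direction; the function name and wording are illustrative only:

```python
import re

# Matches the Cython import-time check message, e.g.
# "numpy.ndarray size changed, may indicate binary incompatibility.
#  Expected 96 from C header, got 88 from PyObject"
_SIZE_MISMATCH = re.compile(
    r"(?P<type>[\w.]+) size changed, may indicate binary incompatibility\. "
    r"Expected (?P<header>\d+) from C header, got (?P<runtime>\d+) from PyObject"
)


def diagnose_size_mismatch(message: str) -> str:
    """Explain which side of a numpy ABI size mismatch is newer (illustrative helper)."""
    match = _SIZE_MISMATCH.search(message)
    if match is None:
        return "not a size-mismatch error"
    name = match["type"]
    header = int(match["header"])    # size baked in when the extension was compiled
    runtime = int(match["runtime"])  # size of the type in the numpy being imported
    if header > runtime:
        return f"{name}: extension compiled against a newer numpy than the one imported"
    if header < runtime:
        return f"{name}: extension compiled against an older numpy than the one imported"
    return f"{name}: sizes agree"
```

Fed the last line of the traceback in this issue, it reports that the compiled module expects a newer numpy than the one it found.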

@julianzaugg
Contributor Author

Thanks. Will give this a try.

@rhysnewell
Owner

rhysnewell commented May 11, 2022

I've been able to reproduce this and am working on a fix now. Weirdly, it was skbio that threw the error for me, not hdbscan, but I believe the same underlying issue is at hand.
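The same failure can surface from different packages (skbio here, hdbscan in the linked reports). A small import probe makes it easy to test each compiled dependency after a reinstall. This sketch assumes the incompatibility shows up as a `ValueError` whose message contains "size changed", as in the traceback above. `check_import` is a hypothetical helper:

```python
import importlib


def check_import(module_name: str) -> str:
    """Try to import `module_name` and classify the outcome.

    Returns "ok", a "binary incompatibility: ..." string for numpy ABI size
    mismatches, or "not importable: ..." if the module cannot be found/loaded.
    Any other ValueError is re-raised so it is not silently hidden.
    """
    try:
        importlib.import_module(module_name)
    except ValueError as exc:
        if "size changed" in str(exc):
            return f"binary incompatibility: {exc}"
        raise
    except ImportError as exc:
        return f"not importable: {exc}"
    return "ok"
```

For example, `check_import("skbio.alignment")` exercises the same `_ssw_wrapper` import that failed in the traceback, and `check_import("hdbscan")` covers the other suspect.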

@julianzaugg
Contributor Author

Awesome. Look forward to testing out the fix.

@rhysnewell
Owner

This change to the rosella.yaml seemed to fix the issue on my end. Maybe try it out and see if it works for you: https://github.com/rhysnewell/aviary/blob/dev/aviary/modules/binning/envs/rosella.yaml

@julianzaugg
Contributor Author

Thanks, will try it out and let you know how it goes.

@rhysnewell
Owner

This might not happen to you, but new checkm-genome installs I've created using conda have not downloaded the CheckM database, which causes aviary to error out. So if you create this new rosella environment and it throws an error at a checkm step, just be aware of that.
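One way to catch the missing-database case before a long run is to check that the CheckM data directory is set and populated. This sketch assumes the install locates its data through the `CHECKM_DATA_PATH` environment variable, which recent CheckM releases document alongside `checkm data setRoot`. It only checks that the directory exists and is non-empty, not that the database is complete. The helper name is hypothetical:

```python
import os
from pathlib import Path


def checkm_data_ready(environ=None) -> bool:
    """Return True if CHECKM_DATA_PATH points at an existing, non-empty directory.

    `environ` defaults to os.environ; pass a dict to test other settings.
    Assumption: CheckM reads its reference data location from CHECKM_DATA_PATH.
    """
    environ = os.environ if environ is None else environ
    root = environ.get("CHECKM_DATA_PATH")
    if not root:
        return False
    path = Path(root)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    print("CheckM data ready:", checkm_data_ready())
```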

@rhysnewell
Owner

Wait, just realised this recipe won't work as intended. The flight install gets overwritten due to the ordering; let me test a different recipe.

@julianzaugg
Contributor Author

Already testing... seems to be working / has not crashed yet.

@rhysnewell
Owner

It won't crash; it will just use the wrong version of flight, i.e. the one that crashes when you use a different number of threads.

@rhysnewell
Owner

The new commit should have a correctly ordered yaml file.
