
A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.1 as it may crash #927

Open · 2424004764 opened this issue Nov 21, 2024 · 12 comments

@2424004764

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

How can I solve this error?

@virtualarchitectures

You should be able to work around the issue by downgrading NumPy in your Python environment by running pip install "numpy<2.0" in your terminal. Just make sure you are in the correct environment.
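
For example (run inside the affected environment; the second line just verifies the downgrade took effect):

pip install "numpy<2.0"
python -c "import numpy; print(numpy.__version__)"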

@jonathanfox5

There are some pull requests that fix this issue, but they haven't been merged into the main project.

Based upon them, I ended up creating this package to use as a dependency in my projects. Not the most elegant way of doing things, but it helped me escape Python dependency hell!

https://github.com/jonathanfox5/whisperx-numpy2-compatibility

It has the same name on PyPI, so you can run:

pip install whisperx-numpy2-compatibility

@virtualarchitectures

virtualarchitectures commented Nov 22, 2024

> You should be able to work around the issue by downgrading NumPy in your Python environment by running pip install "numpy<2.0" in your terminal. Just make sure you are in the correct environment.

After running my quick fix I have indeed found that I am still in "dependency hell". @jonathanfox5 This looks really promising, so thanks for sharing. I'll try giving the package a test when I get a chance, but I'm not yet clear how easily it will integrate with my own pipeline. Can I just substitute import whisperx_numpy2_compatibility for import whisperx, or should I expect there to be more to it?

@jonathanfox5

jonathanfox5 commented Nov 22, 2024

Yup, import whisperx_numpy2_compatibility as whisperx should do the job.

I haven't (yet) tried working with it directly embedded in a script, as I have just been calling it using subprocess (the reason I needed it to be compatible with NumPy 2 was so that I could include my whole application in a single Python package).

All that to say, for any use cases outside of the CLI interface, it’s currently untested!

The only known issue is that faster-whisper issued an update yesterday which breaks compatibility with whisperx. Therefore, you will need to pin faster-whisper to version 1.0.3 in your requirements.txt or pyproject.toml.
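
For example, in requirements.txt (the pin matching the version above):

faster-whisper==1.0.3
whisperx-numpy2-compatibility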

@jonathanfox5

The other potential solution is to look for dependency versions that don't rely on NumPy 2.

E.g. I was using spaCy to process some of the data. spaCy 3.8 needs NumPy 2, but spaCy 3.7 needs NumPy 1.
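
In requirements.txt terms, that means pinning along these lines (versions taken from the spaCy example above):

spacy<3.8
numpy<2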

@virtualarchitectures

@jonathanfox5 Is your build configured to run on CPU only?

@jonathanfox5

jonathanfox5 commented Nov 22, 2024

Either CPU or GPU; it depends on which build of torch you have installed (see the sketch after the list below).

It has been tested on:

  • macOS (CPU) [M2 Pro]
  • Windows 11 (CPU) [Ryzen 5600X]
  • Windows 11 (GPU) [RTX 3070 Ti]
  • Ubuntu Server 24 LTS ARM (CPU) [Ampere]
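
If it's useful, the usual pattern for picking the device at runtime is plain torch, nothing specific to this package:

import torch

# Use the GPU when the installed torch build has CUDA support, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"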

@jonathanfox5

Just to confirm, I have now tested some scripts using import whisperx_numpy2_compatibility as whisperx and it works as intended.

I've published version 0.1.1 on PyPI, with the only change being that it forces faster-whisper 1.0.3 to be used instead of yesterday's release.
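
For reference, a minimal sketch of embedded use. It assumes the package mirrors whisperx's usual load_model / load_audio / transcribe API; the model size, device and file name below are placeholders:

import whisperx_numpy2_compatibility as whisperx

# Placeholders: pick your own model size, device and audio file
model = whisperx.load_model("small", device="cpu", compute_type="int8")
audio = whisperx.load_audio("audio.wav")
result = model.transcribe(audio)
print(result["segments"])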

@virtualarchitectures

@jonathanfox5 Thanks for this. I typically use Conda, but I managed to work out Pyenv and Poetry. The project builds fine, but when I execute my code I get the error AssertionError: Torch not compiled with CUDA enabled. I think it may be that I'd previously installed my dependencies using Conda:

conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia

I thought I might be missing the last arguments pytorch-cuda=11.7 -c pytorch -c nvidia, so I tried the following in Poetry: poetry run conda install pytorch-cuda=11.7 -c pytorch -c nvidia. This didn't resolve it.

Looking at the stack trace, it suggests I have a 'broken build':

File d:\Github\Python\DS-Audio-Transcription\.venv\lib\site-packages\torch\cuda\__init__.py:305, in _lazy_init()
    300     raise RuntimeError(
    301         "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
    302         "multiprocessing, you must use the 'spawn' start method"
    303     )
    304 if not hasattr(torch._C, "_cuda_getDeviceCount"):
--> 305     raise AssertionError("Torch not compiled with CUDA enabled")
    306 if _cudart is None:
    307     raise AssertionError(
    308         "libcudart functions unavailable. It looks like you have a broken build?"
    309     )

AssertionError: Torch not compiled with CUDA enabled

If you have any ideas they would be welcome. No problems if not.

If it's relevant I've slightly adapted my pyproject.toml as follows:

[tool.poetry]
name = "ds-audio-transcription"
version = "0.1.0"
description = "A repository of Jupyter notebooks for audio transcription with Python."
authors = ["Virtual Architectures"]
license = "MIT"
readme = "README.md"

[tool.poetry.dependencies]
python = ">=3.10,<3.13"
torch = ">=2"
torchaudio = ">=2"
faster-whisper = "^1.0.3"
transformers = "*"
pandas = "*"
nltk = "*"
pyannote-audio = "^3.3.2"
llvmlite = "^0.43.0"
torchmetrics = "^1.5.2"
numba = "^0.60.0"
whisperx-numpy2-compatibility = "^0.1.1"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

@jonathanfox5

jonathanfox5 commented Nov 22, 2024

I'm not familiar with conda since I just use plain pip or uv to install tools. Looking at your pyproject.toml, it looks like you aren't installing the CUDA version of torch, just the CPU version. Knowing nothing about conda, I suspect that the conda install is happening in a different virtual environment from your main tool (take that with a pinch of salt, I could be very wrong!).
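
A quick way to confirm which torch build actually ended up in an environment (plain torch API, run inside that environment):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

If that prints something like 2.5.1+cpu False, you have the CPU-only wheel.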

Maybe you can run the following to check that there isn't an issue with my build on your system? These lines are based upon the code in the Windows installer for my tool, which is designed to work on any system, even if Python isn't installed.

uv tool install whisperx-numpy2-compatibility --python 3.12
uv tool install whisperx-numpy2-compatibility --python 3.12 --with torch==2.5.1+cu124 --index https://download.pytorch.org/whl/cu124
uv tool install whisperx-numpy2-compatibility --python 3.12 --with torchaudio==2.5.1+cu124 --index https://download.pytorch.org/whl/cu124
uv tool update-shell

(it's possible that you might need quotes around some of those parameters to get it working in your shell)

You can then run whisperx from the command line to check that it executes correctly (it takes a while to load up the first time you launch it). Obviously this won't integrate it with your project, but it will at least eliminate some variables for you as you troubleshoot!

uv can be installed using the following command if you don't already have it:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

You may also need Nvidia's CUDA toolkit installed, but the required runtimes should be covered by your drivers: https://developer.nvidia.com/cuda-toolkit
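
Two quick checks on the Nvidia side (standard CLI tools; nvcc only exists if the full toolkit is installed):

nvidia-smi      # driver-level check; should list your GPU and driver version
nvcc --version  # reports the installed toolkit version, if any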

Hope this helps! Fingers crossed that it's the Nvidia toolkit that's missing; that's the easiest one to fix!

@virtualarchitectures

Hi @jonathanfox5, thanks very much for the help. For @2424004764, I got up and running with the following steps:

  1. I installed uv for package management, which I'd recommend because it's simple to use and will save you messing about with pipx and pyenv.
  2. uv has some useful documentation on integrating with PyTorch: https://docs.astral.sh/uv/guides/integration/pytorch/
  3. I added the following pyproject.toml to the root of my project:
[project]
name = "YOUR PROJECT NAME"
version = "0.1.0"
description = "PROJECT DESCRIPTION"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
  "torch==2.5.1",
  "torchaudio==2.5.1",
  "whisperx_numpy2_compatibility",
]

[tool.uv.sources]
torch = [
    { index = "pytorch-cu124" },
]
torchvision = [
    { index = "pytorch-cu124" },
]

[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
explicit = true
  4. From a terminal session in the root of the project you can then just run uv sync to download all of the project dependencies and initialise a virtual environment, or uv run <name of script>.py, which will sync your environment to the toml file and also run your program (see the example below). I hope that helps.
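
For example (uv's documented commands; the script name is a placeholder):

uv sync               # create/refresh the virtual environment from pyproject.toml
uv run transcribe.py  # sync, then run the script inside that environment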

@jonathanfox5 One thing I noted is that I'm now getting new spam in the console about INFO:speechbrain.utils.fetching, which I wasn't getting before. The transcription seemed to run fine, but I didn't expect to see this based on prior experience.

@jonathanfox5

Glad you got it running! I also get speechbrain info statements, but mine are slightly different. I've not noticed any issues with transcription accuracy or performance.

Info messages reproduced below for reference:

INFO:speechbrain.utils.quirks:Applied quirks (see `speechbrain.utils.quirks`): [disable_jit_profiling, allow_tf32]
INFO:speechbrain.utils.quirks:Excluded quirks specified by the `SB_DISABLE_QUIRKS` environment (comma-separated list): []
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.4.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.local/.cache/torch/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.3.2. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.5.1. Bad things might happen unless you revert torch to 1.x.
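
If anyone wants to quiet these, speechbrain logs through Python's standard logging module, so something like this near the top of a script should work (a sketch; the Lightning and pyannote lines come from other loggers and aren't covered by it):

import logging

# Suppress speechbrain's INFO-level messages
logging.getLogger("speechbrain").setLevel(logging.WARNING)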
