
sentence-transformers 2.2.2 pulling in nvidia packages #2637

Open
gyezheng opened this issue May 10, 2024 · 23 comments


@gyezheng

I am using sentence-transformers-2.2.2.tar.gz, and it pulls in the following NVIDIA packages:

nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl
nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl
nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl
nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl
nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl
nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl
nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl
nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl
nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl
nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl
nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl

When I search for them online, they are listed under the license "NVIDIA Proprietary Software".
Can I freely use sentence-transformers-2.2.2.tar.gz?

Thanks!

@tomaarsen
Collaborator

Hello!

Yes, these are requirements of the torch Python package that are needed for you to use CUDA, i.e. a GPU. You can freely use them.

Note that if you don't have a GPU, you may want to install torch without CUDA support and then install sentence-transformers. You can use this widget and select "CPU" if that's the case; it'll save you some disk space.
But if you have a GPU, be sure to install with CUDA support like you've been doing.

  • Tom Aarsen
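(Not from the thread, but for illustration: a quick way to tell which torch build ended up installed, using only the standard library. CPU-only wheels from the PyTorch index carry a "+cpu" local version suffix, so the suffix is a reasonable tell; this is a sketch under that assumption, and the function name is ours.)

```python
import importlib.metadata


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent.

    CPU-only wheels from download.pytorch.org carry a "+cpu" local version
    suffix, e.g. "1.13.1+cpu", so checking for that suffix is a quick way
    to confirm a CPU-only install.
    """
    try:
        return importlib.metadata.version("torch")
    except importlib.metadata.PackageNotFoundError:
        return None


version = installed_torch_version()
print(version, "(CPU-only build)" if version and version.endswith("+cpu") else "")
```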

@gyezheng
Author

Thank you for your reply!
We are in the CPU-only case.
I understand that from a technical perspective we can freely use those NVIDIA packages. But do you have any idea about the commercial perspective: can we ship them within our own commercial product? Is there any difference between the GPU and CPU cases from a commercial perspective? Thanks!

@tomaarsen
Collaborator

If you're using the CPU only, then you won't need those CUDA packages. You can install it with:

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install sentence-transformers

(assuming that you're on Linux).
And yes, torch and sentence-transformers have commercially permissive licenses, i.e. you can use these products within (paid) commercial products.

  • Tom Aarsen

@KyeMaloy97

KyeMaloy97 commented May 13, 2024

So at the moment I have been running two pip commands: the first installs a load of dependencies from a requirements.txt, and the second installs torch with the CPU index URL as you mentioned above.

pip install --no-deps -r requirements.txt
pip install --no-deps -r torch_requirements.txt

Maybe the order of installing sentence-transformers in the first requirements.txt and then installing torch was pulling in the 2.3.0 (with NVIDIA) version of torch as well?

@KyeMaloy97

If I do pip show torch I see:

Name: torch
Version: 1.13.1+cpu
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /usr/local/lib64/python3.9/site-packages
Requires: typing-extensions
Required-by: sentence-transformers, accelerate

So I'm not sure why/how we are getting the NVIDIA packages in our scans.

@tomaarsen
Collaborator

If I do pip show torch I see:

...

That is rather odd. Perhaps you can run pip show on the CUDA packages to see what they are required by? Because the CPU build of torch should not require CUDA.

  • Tom Aarsen

@KyeMaloy97

KyeMaloy97 commented May 14, 2024

If I run pip show nvidia_cublas... or pip show cuda I get "no packages found". I'm not convinced we are downloading the files our scanner thinks we're getting, as I cannot locate them on disk at all, and in my site-packages folder I don't see anything about nvidia or any .whl files matching what our scanner is finding.

I also think that if I were pulling those CUDA files, the Docker image would be a lot larger (it's only ~2.5 GB total; with the CUDA files I think it would be 8 GB+).

pip list gives me:

certifi               2024.2.2
charset-normalizer    3.3.2
click                 8.1.7
contourpy             1.2.1
cycler                0.12.1
eland                 8.12.1
elastic-transport     8.13.0
elasticsearch         8.13.0
filelock              3.14.0
fonttools             4.51.0
fsspec                2024.3.1
huggingface-hub       0.23.0
idna                  3.7
importlib_resources   6.4.0
joblib                1.4.2
kiwisolver            1.4.5
matplotlib            3.8.4
nltk                  3.8.1
numpy                 1.26.4
packaging             24.0
pandas                1.5.3
pillow                10.3.0
pip                   21.2.3
psutil                5.9.8
pyparsing             3.1.2
python-dateutil       2.9.0.post0
pytz                  2024.1
PyYAML                6.0.1
regex                 2024.4.28
requests              2.31.0
safetensors           0.4.3
scikit-learn          1.4.2
scipy                 1.13.0
sentence-transformers 2.2.2
setuptools            53.0.0
six                   1.16.0
tdqm                  0.0.1
threadpoolctl         3.5.0
tokenizers            0.14.1
torch                 1.13.1+cpu
torchvision           0.14.1+cpu
tqdm                  4.66.3
transformers          4.38.0
typing_extensions     4.9.0
urllib3               2.2.1
zipp                  3.18.1

@KyeMaloy97

KyeMaloy97 commented May 14, 2024

For extra info I also installed pipdeptree and this was the output...

accelerate==0.29.3
├── huggingface-hub [required: Any, installed: 0.23.0]
│   ├── filelock [required: Any, installed: 3.14.0]
│   ├── fsspec [required: >=2023.5.0, installed: 2024.3.1]
│   ├── packaging [required: >=20.9, installed: 24.0]
│   ├── PyYAML [required: >=5.1, installed: 6.0.1]
│   ├── requests [required: Any, installed: 2.31.0]
│   │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│   │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│   │   ├── idna [required: >=2.5,<4, installed: 3.7]
│   │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│   ├── tqdm [required: >=4.42.1, installed: 4.66.3]
│   └── typing_extensions [required: >=3.7.4.3, installed: 4.9.0]
├── numpy [required: >=1.17, installed: 1.26.4]
├── packaging [required: >=20.0, installed: 24.0]
├── psutil [required: Any, installed: 5.9.8]
├── PyYAML [required: Any, installed: 6.0.1]
├── safetensors [required: >=0.3.1, installed: 0.4.3]
└── torch [required: >=1.10.0, installed: 1.13.1+cpu]
    └── typing_extensions [required: Any, installed: 4.9.0]
eland==8.12.1
├── elasticsearch [required: >=8.3,<9, installed: 8.13.0]
│   └── elastic-transport [required: >=8.13,<9, installed: 8.13.0]
│       ├── certifi [required: Any, installed: 2024.2.2]
│       └── urllib3 [required: >=1.26.2,<3, installed: 2.2.1]
├── matplotlib [required: >=3.6, installed: 3.8.4]
│   ├── contourpy [required: >=1.0.1, installed: 1.2.1]
│   │   └── numpy [required: >=1.20, installed: 1.26.4]
│   ├── cycler [required: >=0.10, installed: 0.12.1]
│   ├── fonttools [required: >=4.22.0, installed: 4.51.0]
│   ├── importlib_resources [required: >=3.2.0, installed: 6.4.0]
│   │   └── zipp [required: >=3.1.0, installed: 3.18.1]
│   ├── kiwisolver [required: >=1.3.1, installed: 1.4.5]
│   ├── numpy [required: >=1.21, installed: 1.26.4]
│   ├── packaging [required: >=20.0, installed: 24.0]
│   ├── pillow [required: >=8, installed: 10.3.0]
│   ├── pyparsing [required: >=2.3.1, installed: 3.1.2]
│   └── python-dateutil [required: >=2.7, installed: 2.9.0.post0]
│       └── six [required: >=1.5, installed: 1.16.0]
├── numpy [required: >=1.2.0,<2, installed: 1.26.4]
├── packaging [required: Any, installed: 24.0]
└── pandas [required: >=1.5,<2, installed: 1.5.3]
    ├── numpy [required: >=1.20.3, installed: 1.26.4]
    ├── python-dateutil [required: >=2.8.1, installed: 2.9.0.post0]
    │   └── six [required: >=1.5, installed: 1.16.0]
    └── pytz [required: >=2020.1, installed: 2024.1]
pipdeptree==2.20.0
├── packaging [required: >=23.1, installed: 24.0]
└── pip [required: >=23.1.2, installed: 24.0]
sentence-transformers==2.2.2
├── huggingface-hub [required: >=0.4.0, installed: 0.23.0]
│   ├── filelock [required: Any, installed: 3.14.0]
│   ├── fsspec [required: >=2023.5.0, installed: 2024.3.1]
│   ├── packaging [required: >=20.9, installed: 24.0]
│   ├── PyYAML [required: >=5.1, installed: 6.0.1]
│   ├── requests [required: Any, installed: 2.31.0]
│   │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│   │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│   │   ├── idna [required: >=2.5,<4, installed: 3.7]
│   │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│   ├── tqdm [required: >=4.42.1, installed: 4.66.3]
│   └── typing_extensions [required: >=3.7.4.3, installed: 4.9.0]
├── nltk [required: Any, installed: 3.8.1]
│   ├── click [required: Any, installed: 8.1.7]
│   ├── joblib [required: Any, installed: 1.4.2]
│   ├── regex [required: >=2021.8.3, installed: 2024.4.28]
│   └── tqdm [required: Any, installed: 4.66.3]
├── numpy [required: Any, installed: 1.26.4]
├── scikit-learn [required: Any, installed: 1.4.2]
│   ├── joblib [required: >=1.2.0, installed: 1.4.2]
│   ├── numpy [required: >=1.19.5, installed: 1.26.4]
│   ├── scipy [required: >=1.6.0, installed: 1.13.0]
│   │   └── numpy [required: >=1.22.4,<2.3, installed: 1.26.4]
│   └── threadpoolctl [required: >=2.0.0, installed: 3.5.0]
├── scipy [required: Any, installed: 1.13.0]
│   └── numpy [required: >=1.22.4,<2.3, installed: 1.26.4]
├── sentencepiece [required: Any, installed: ?]
├── torch [required: >=1.6.0, installed: 1.13.1+cpu]
│   └── typing_extensions [required: Any, installed: 4.9.0]
├── torchvision [required: Any, installed: 0.14.1+cpu]
│   ├── numpy [required: Any, installed: 1.26.4]
│   ├── pillow [required: >=5.3.0,!=8.3.*, installed: 10.3.0]
│   ├── requests [required: Any, installed: 2.31.0]
│   │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│   │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│   │   ├── idna [required: >=2.5,<4, installed: 3.7]
│   │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│   ├── torch [required: ==1.13.1, installed: 1.13.1+cpu]
│   │   └── typing_extensions [required: Any, installed: 4.9.0]
│   └── typing_extensions [required: Any, installed: 4.9.0]
├── tqdm [required: Any, installed: 4.66.3]
└── transformers [required: >=4.6.0,<5.0.0, installed: 4.38.0]
    ├── filelock [required: Any, installed: 3.14.0]
    ├── huggingface-hub [required: >=0.19.3,<1.0, installed: 0.23.0]
    │   ├── filelock [required: Any, installed: 3.14.0]
    │   ├── fsspec [required: >=2023.5.0, installed: 2024.3.1]
    │   ├── packaging [required: >=20.9, installed: 24.0]
    │   ├── PyYAML [required: >=5.1, installed: 6.0.1]
    │   ├── requests [required: Any, installed: 2.31.0]
    │   │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
    │   │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
    │   │   ├── idna [required: >=2.5,<4, installed: 3.7]
    │   │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
    │   ├── tqdm [required: >=4.42.1, installed: 4.66.3]
    │   └── typing_extensions [required: >=3.7.4.3, installed: 4.9.0]
    ├── numpy [required: >=1.17, installed: 1.26.4]
    ├── packaging [required: >=20.0, installed: 24.0]
    ├── PyYAML [required: >=5.1, installed: 6.0.1]
    ├── regex [required: !=2019.12.17, installed: 2024.4.28]
    ├── requests [required: Any, installed: 2.31.0]
    │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
    │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
    │   ├── idna [required: >=2.5,<4, installed: 3.7]
    │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
    ├── safetensors [required: >=0.4.1, installed: 0.4.3]
    ├── tokenizers [required: >=0.14,<0.19, installed: 0.14.1]
    │   └── huggingface-hub [required: >=0.16.4,<0.18, installed: 0.23.0]
    │       ├── filelock [required: Any, installed: 3.14.0]
    │       ├── fsspec [required: >=2023.5.0, installed: 2024.3.1]
    │       ├── packaging [required: >=20.9, installed: 24.0]
    │       ├── PyYAML [required: >=5.1, installed: 6.0.1]
    │       ├── requests [required: Any, installed: 2.31.0]
    │       │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
    │       │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
    │       │   ├── idna [required: >=2.5,<4, installed: 3.7]
    │       │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
    │       ├── tqdm [required: >=4.42.1, installed: 4.66.3]
    │       └── typing_extensions [required: >=3.7.4.3, installed: 4.9.0]
    └── tqdm [required: >=4.27, installed: 4.66.3]
setuptools==53.0.0
tdqm==0.0.1
└── tqdm [required: Any, installed: 4.66.3]

@tomaarsen
Collaborator

I think that looks fine, then! In fact, if you increase from sentence_transformers==2.2.2 to a more recent version, then you'll actually lose the NLTK and sentencepiece dependencies. Although they're not particularly big, so I wouldn't worry about it too much.

  • Tom Aarsen

@KyeMaloy97

KyeMaloy97 commented May 14, 2024

Do you happen to know if there's a check I can make to know definitively whether those nvidia*.whl files got installed? I had a look in /usr/bin and /usr/lib/python3.9/site-packages and didn't find anything; also, running find / -iname '*.whl' and find / -iname '*nvidia*' returns nothing.

@tomaarsen
Collaborator

Searching for cud might also help, but other than that I'm not sure
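(Not from the thread, but a stdlib-only sketch of a programmatic check: list the installed distributions whose names start with "nvidia", which is how the CUDA wheels in question are named. The function name here is ours.)

```python
import importlib.metadata


def nvidia_distributions():
    """List installed distribution names that start with 'nvidia'.

    The CUDA wheels pulled in by GPU builds of torch are named
    nvidia-<component>-cu11/cu12, so an empty list suggests a
    CPU-only environment.
    """
    return sorted(
        dist.metadata["Name"]
        for dist in importlib.metadata.distributions()
        if (dist.metadata["Name"] or "").lower().startswith("nvidia")
    )


print(nvidia_distributions())  # expect [] on a CPU-only install
```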

@KyeMaloy97

KyeMaloy97 commented May 14, 2024

I had a look and it found a load of related files from torch, torchgen, and transformers... most of the files are like:
/usr/local/lib64/python3.9/site-packages/torch/include/ATen/cuda/CUDATensorMethods.cuh and associated header files, or like /usr/local/lib/python3.9/site-packages/transformers/kernels/mra/cuda_kernel.cu

I think these are just source code files from these packages though, not the NVIDIA proprietary software.

@champaanand

I'm also facing the same issue: the nvidia* packages are not getting downloaded and are not being used in our product (our application runs on Windows, where the inventory report shows the wheel packages).
Please let us know if there is any update.

@KyeMaloy97

KyeMaloy97 commented Jun 18, 2024

Are you using an OSS scanning tool such as Mend? Our issue was that Mend, under the covers, does a pip download and ignores the fact that we were using --no-deps when installing the package, so the full pip download was fetching dependencies we were not actually getting.

@champaanand

champaanand commented Jun 19, 2024

Yes, we are using Mend, integrated with our GitHub repo, and the Mend inventory shows these nvidia* packages.
Our open-source approval team says not to use nvidia* even though we are not using it in our product.
Kindly let me know how to proceed further.

@tomaarsen
Collaborator

I'm a bit confused

where nvidia* packages are not getting downloaded, not being used also in our product.

Mend inventory shows these nvidia* packages. [...] even though we are not using it in our product.

So the packages are not being downloaded, but would you like to download them or not?

In short, to use Sentence Transformers, you will have to use torch. You can install torch with GPU/CUDA support, or without it. To get GPU support, you will have to install torch with CUDA support, which means that you'll require NVIDIA CUDA-specific packages, e.g.:

pip install torch --index-url https://download.pytorch.org/whl/cu121

If you only want to run Sentence Transformers on CPU, then you don't need to install torch with CUDA, e.g.:

pip install torch --index-url https://download.pytorch.org/whl/cpu

The latter should not install NVIDIA's CUDA packages, I believe.

  • Tom Aarsen

@champaanand

We do not use nvidia* packages; our application doesn't need them. The problem is with the Mend inventory, as it shows nvidia* packages.
Our open-source team says don't use nvidia*.

@KyeMaloy97

If your use case is like ours: we were installing the CPU version, which doesn't pull in the GPU-related packages, but Mend looks at the packages to be installed and seems to ignore the options (CPU-specific index, no dependencies, etc.). It downloads everything and then concludes "ah, Sentence Transformers requires NVIDIA packages", which would be right if we weren't using the CPU-specific variant.

It's an issue with Mend.io rather than with this library, though. It's how they do their checking that causes the NVIDIA packages to be detected when they aren't actually present. We are using the library in a Docker image, and you can tell we don't get them: we looked through the filesystem and can't find them, and the image is small; if we were pulling them, the image would be hundreds of MBs larger than it is.

@lyonsy

lyonsy commented Oct 23, 2024

I ran into this issue, and what worked for me was to install the torch CPU version outside of, and before, installing the requirements file. Example with Docker:

ENV UV_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cpu
RUN uv pip install --index-strategy unsafe-best-match --prerelease=allow torch==2.4.0+cpu

Then I would install sentence-transformers from my requirements file.

@AleefBilal

Below are the versions that are working fine for me

pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cpu
pip install sentence-transformers==3.0.1

@AdityaSoni19031997

AdityaSoni19031997 commented Dec 16, 2024

What would be the appropriate way to control the deps when one uses Poetry? There we cannot quite say "install torch from the CPU index first and then install sentence-transformers".

Wouldn't it be cleaner if we had something like "sentence-transformers+cpu"?

I'm on a Darwin M1 chip, and despite using torch+cpu on the Poetry side, sentence-transformers is causing it to pull everything in. I don't think it's a Poetry-only issue: when the same lock file is used on Jenkins (a Linux/Ubuntu environment) it doesn't bring in CUDA, but on my local M1 it does.

PS: I'm trying to double-check things on my side.

Thanks
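(Not from the thread, but for what it's worth: Poetry supports explicit package sources, so a CPU-only torch can be pinned in pyproject.toml along these lines. This is an untested sketch; the source name pytorch-cpu and the version pins are arbitrary, and note that macOS wheels on that index are, to our understanding, not suffixed +cpu, so cross-platform projects may need platform markers.)

```toml
# Declare the PyTorch CPU index as an explicit source,
# used only for packages that opt into it.
[[tool.poetry.source]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "explicit"

[tool.poetry.dependencies]
python = "^3.9"
# Pin torch to the CPU build from that source;
# sentence-transformers then reuses the already-resolved torch.
torch = { version = "2.4.1+cpu", source = "pytorch-cpu" }
sentence-transformers = "^3.0"
```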

@AleefBilal

@AdityaSoni19031997
I don't quite know about Mac & Poetry, but on Linux with pip you first have to install CPU torch and then sentence-transformers, as the latter installs against whatever torch is already present.
If there is no torch, it will pull in the default torch, which includes CUDA.

@tomaarsen
Collaborator

@AleefBilal is exactly right. At least on Linux devices, installing torch defaults to CUDA; on Windows, it defaults to CPU.
My recommendation is to install torch for CPU first, and then Sentence Transformers.
