Dependency issue while finetuning: lingvo 0.12.7 requires tensorflow~=2.9.2 #156

Open
jmanous opened this issue Oct 9, 2024 · 3 comments


@jmanous

jmanous commented Oct 9, 2024

Hi all,

I installed timesfm through Poetry on Ubuntu 20.04.
My environment: NVIDIA H100 GPUs, NVIDIA driver 550, CUDA 12.4 installed.
Python is 3.10.12.

After running poetry install --only pax and poetry shell:

a) The timesfm module still can't be found on the Python path (I don't know why; I had to follow up with pip install timesfm).
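
One way to narrow down point (a) is to ask the interpreter inside the Poetry shell which Python it is and whether it can locate the package. The following is a minimal sketch using only the standard library; the helper name `describe_module` is made up for illustration and is not part of timesfm or Poetry.

```python
import importlib.util
import sys


def describe_module(name: str) -> str:
    """Report where `name` would be imported from, or that it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    return f"{name}: found at {spec.origin}"


# Run inside the Poetry shell to confirm which interpreter is active.
print("interpreter:", sys.executable)
print(describe_module("timesfm"))
```

If the interpreter path does not point into the Poetry virtualenv, the shell is probably using a different Python than the one Poetry installed into.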

b) After the timesfm package was installed, I found that libcudart.so.11.0 was missing, which is expected since I have CUDA 12.4. libcudart.so.11.0 is requested because lingvo 0.12.7 requires TF 2.9.2, which depends directly on CUDA 11.
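
To see which CUDA runtime libraries the dynamic loader can actually open, a small probe with `ctypes` may help. This is a sketch, not a definitive diagnostic: `can_load` is a hypothetical helper, and `libcudart.so.12` is an assumed name for the CUDA 12 runtime library. CUDA libraries installed as pip wheels may live outside the loader's default search path, so "not loadable" here does not by itself mean that JAX cannot find them.

```python
import ctypes


def can_load(library: str) -> bool:
    """Return True if the dynamic loader can open `library`."""
    try:
        ctypes.CDLL(library)
    except OSError:
        return False
    return True


# libcudart.so.11.0 is the name TensorFlow 2.9 asks for;
# libcudart.so.12 is an assumption about the CUDA 12 runtime's file name.
for name in ("libcudart.so.11.0", "libcudart.so.12"):
    print(name, "loadable" if can_load(name) else "not loadable")
```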

Then I realized: isn't that a direct conflict? TimesFM depends on jax 0.4.26 or later, which requires CUDA 12.1:
jax = { version = ">=0.4.26", extras = ["cuda12"], python = ">=3.10,<3.11" }

How is that supposed to work? As it stands, following the installation guide, do I need both CUDA 12 and CUDA 11?
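
The declared requirements behind this question can be read from the installed package metadata instead of inferred from error messages. Below is a minimal sketch using the standard library's `importlib.metadata`; the helper `declared_requirements` is hypothetical, and the output depends entirely on what is installed in the active environment.

```python
from importlib import metadata


def declared_requirements(dist: str, needle: str) -> list[str]:
    """Return the requirement strings of `dist` that mention `needle`."""
    try:
        requirements = metadata.requires(dist) or []
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    return [r for r in requirements if needle.lower() in r.lower()]


print("lingvo ->", declared_requirements("lingvo", "tensorflow"))
print("timesfm ->", declared_requirements("timesfm", "jax"))
```

An empty list means either that the distribution is not installed or that it declares no matching requirement.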

Any help would be greatly appreciated.

@rajatsen91
Collaborator

Hi, it seems to work in my environment. I have Python 3.10.15 and CUDA 12.4.

pip freeze shows:

tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.9.3
tensorflow-datasets==4.8.3
tensorflow-estimator==2.9.0
tensorflow-hub==0.16.1
tensorflow-io-gcs-filesystem==0.37.1
tensorflow-metadata==1.12.0
tensorflow-text==2.9.0
tensorstore==0.1.55

and

jax==0.4.26
jax-bitempered-loss==0.0.2
jax-cuda12-pjrt==0.4.26
jax-cuda12-plugin==0.4.26
jaxlib==0.4.26
lingvo==0.12.7

@jmanous
Author

jmanous commented Oct 10, 2024

Thanks for your reply, here is my env:

tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.9.3
tensorflow-datasets==4.8.3
tensorflow-estimator==2.9.0
tensorflow-hub==0.16.1
tensorflow-io-gcs-filesystem==0.37.1
tensorflow-metadata==1.12.0
tensorflow-text==2.9.0
tensorstore==0.1.55

jax==0.4.26
jax-bitempered-loss==0.0.2
jax-cuda12-pjrt==0.4.26
jax-cuda12-plugin==0.4.26
jaxlib==0.4.26
jaxtyping==0.2.28
lingvo==0.12.7
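
The two pip freeze excerpts in this thread can be compared mechanically rather than by eye. The sketch below uses hypothetical helpers (`parse_freeze`, `diff_freeze`) and only a handful of pins copied from the excerpts above. Since the excerpts may themselves be partial, a package missing on one side is not proof that it is absent from that environment.

```python
def parse_freeze(text: str) -> dict[str, str]:
    """Map lower-cased package names to pinned versions from `pip freeze` text."""
    pins = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:  # skip blank lines and anything that is not a `name==version` pin
            pins[name.lower()] = version
    return pins


def diff_freeze(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (version_in_a, version_in_b)} for every pin that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# A few pins copied from the two excerpts in this thread.
collaborator = parse_freeze(
    "jax==0.4.26\njaxlib==0.4.26\nlingvo==0.12.7\ntensorflow==2.9.3"
)
author = parse_freeze(
    "jax==0.4.26\njaxlib==0.4.26\njaxtyping==0.2.28\nlingvo==0.12.7\ntensorflow==2.9.3"
)
print(diff_freeze(collaborator, author))  # {'jaxtyping': (None, '0.2.28')}
```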

As far as I can see, we have exactly the same environment (as we should), but the jax version still has issues (see below).

According to the official documentation (https://www.tensorflow.org/install/source#gpu), TensorFlow 2.9 is built against CUDA 11.
As a result, running tests/test_timesfm.py produces:

(timesfm-py3.10) test@zms3a:~/tsforecasting/timesfm/tests$ python test_timesfm.py
TimesFM v1.2.0. See https://github.com/google-research/timesfm/blob/master/README.md for updated APIs.
Loaded Jax TimesFM.
2024-10-10 20:33:13.559005: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

Could you please help? Perhaps you have a leftover libcudart.so.11.0 in your system that just works?
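
One way to compare the two machines is to check whether TensorFlow itself reports any GPU. The sketch below assumes only that `tf.config.list_physical_devices` is available, as it is in TensorFlow 2.x; the helper name `tensorflow_gpu_summary` is made up for illustration.

```python
def tensorflow_gpu_summary() -> str:
    """Summarize how many GPUs TensorFlow can see, if TensorFlow is installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow is not installed in this environment"
    gpus = tf.config.list_physical_devices("GPU")
    return f"tensorflow {tf.__version__} sees {len(gpus)} GPU(s)"


print(tensorflow_gpu_summary())
```

If both machines report zero GPUs for TensorFlow, a leftover libcudart.so.11.0 is unlikely to explain the difference.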

@rajatsen91
Collaborator

Hi @jmanous,

Did you try forecasting after that message? I also get that message, but everything works beyond that point.
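
Because the log shows "Loaded Jax TimesFM." before the TensorFlow warning, a quick way to test this is to ask JAX directly which backend it selected. This is a minimal sketch with a hypothetical helper name (`jax_backend_summary`); it relies only on `jax.devices()` and `jax.default_backend()`.

```python
def jax_backend_summary() -> str:
    """Describe the backend JAX selected, if JAX is installed."""
    try:
        import jax
    except ImportError:
        return "jax is not installed in this environment"
    devices = ", ".join(str(d) for d in jax.devices())
    return f"backend={jax.default_backend()} devices=[{devices}]"


print(jax_backend_summary())
```

A GPU backend here would suggest that the libcudart.so.11.0 warning affects only TensorFlow's own GPU support, not the JAX side.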

[Image: Screenshot 2024-10-11 at 9 58 23 AM]

2 participants