This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

Problem in building Alpa-modified Jaxlib. #956

Open
Fonsifa opened this issue Sep 23, 2023 · 5 comments

Comments


Fonsifa commented Sep 23, 2023

Please describe the bug

Please describe the expected behavior

System information and environment

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04, docker):
  • Python version: 3.9
  • CUDA version: 11.3
  • NCCL version: 8.2.0.53
  • cupy version: cupy-cuda11x 12.2.0
  • GPU: GeForce RTX 3090
  • Alpa version: 0.2.3
  • JAX version: 0.3.22

To Reproduce
Steps to reproduce the behavior:
When I try to install Alpa from source and run
python3 build/build.py --enable_cuda --dev_install --bazel_options=--override_repository=org_tensorflow=$(pwd)/../third_party/tensorflow-alpa
the build prints some warnings.
I don't know whether they are related to the error shown in the second screenshot.
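Editor's sketch: the `$(pwd)/../third_party/tensorflow-alpa` part of the command above depends on the directory the build is started from. The snippet below is a minimal pre-flight check, assuming only what the command itself shows; `build_command` is a hypothetical helper name, not part of Alpa or JAX, and nothing here is claimed to be the cause of the reported error.

```python
import shlex
from pathlib import Path


def build_command(start_dir):
    """Return (argv, tf_dir) for the build command quoted in the report.

    `start_dir` plays the role of $(pwd); the override path is resolved
    relative to it exactly as the shell would expand it.
    """
    tf_dir = (Path(start_dir) / ".." / "third_party" / "tensorflow-alpa").resolve()
    argv = [
        "python3",
        "build/build.py",
        "--enable_cuda",
        "--dev_install",
        f"--bazel_options=--override_repository=org_tensorflow={tf_dir}",
    ]
    return argv, tf_dir


if __name__ == "__main__":
    argv, tf_dir = build_command(Path.cwd())
    if not tf_dir.is_dir():
        # The override would point at a directory that does not exist.
        print(f"warning: {tf_dir} is not a directory; check where you run the build from")
    # Print the fully expanded command instead of running it.
    print(shlex.join(argv))
```

Running it from the directory where `build/build.py` lives prints the expanded command and warns if the override path is missing.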

Screenshots
If applicable, add screenshots to help explain your problem.
[Screenshot: image]
[Screenshot: troubleshoot]

Code snippet to reproduce the problem

Additional information
Add any other context about the problem here or include any logs that would be helpful to diagnose the problem.

Fonsifa changed the title from "Problem in buuilding Alpa-modified Jaxlib." to "Problem in building Alpa-modified Jaxlib." on Sep 23, 2023

Lssyes commented Oct 14, 2023

This bug is caused by the wrong version of libnccl.
I solved it by reinstalling the right version of libnccl and creating a new Python environment based on it.

Fonsifa (Author) commented Oct 19, 2023

> This bug is caused by the wrong version of libnccl. I solved it by reinstalling the right version of libnccl and creating a new Python environment based on it.

May I ask which exact versions of Python and libnccl you used? Thanks.


Lssyes commented Oct 19, 2023

yeah
python == 3.8.13
gcc == 7.5.0
nccl == libnccl.so.2.8.4
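
Editor's sketch: to compare a local setup against the versions listed above, the script below asks the libnccl that the dynamic loader finds for its version, using NCCL's `ncclGetVersion` C function through `ctypes`. The library name `libnccl.so.2` is an assumption based on the file name `libnccl.so.2.8.4` in this comment, the helper names are hypothetical, and the decoding follows my understanding of the version macro in `nccl.h` (releases up to 2.8.x use major*1000, later ones major*10000).

```python
import ctypes


def decode_nccl_version(code):
    """Split NCCL's integer version code into (major, minor, patch)."""
    if code < 10000:
        # Older scheme (up to 2.8.x): major*1000 + minor*100 + patch
        return code // 1000, (code % 1000) // 100, code % 100
    # Newer scheme (2.9 and later): major*10000 + minor*100 + patch
    return code // 10000, (code % 10000) // 100, code % 100


def query_nccl_version(lib_name="libnccl.so.2"):
    """Return the version of the libnccl the loader finds, or None."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        # Library not installed or not on the loader path.
        return None
    version = ctypes.c_int(0)
    status = lib.ncclGetVersion(ctypes.byref(version))
    if status != 0:  # ncclSuccess is 0
        return None
    return decode_nccl_version(version.value)


if __name__ == "__main__":
    found = query_nccl_version()
    if found is None:
        print("libnccl could not be loaded")
    else:
        print("libnccl %d.%d.%d" % found)
```

Run it inside the same Python environment used for the build, so the loader path matches what the build and runtime would see.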


ertza commented Nov 22, 2023

Hi, I am running into the same issue when building from source. I don't understand how the libnccl version affects the file-not-found error. Is there any other solution to this?

Fonsifa (Author) commented Nov 22, 2023

> Hi, I am running into the same issue when building from source. I don't understand how the libnccl version affects the file-not-found error. Is there any other solution to this?

The mirror URL is written in one of the workspace files. The file-not-found problem does not seem to be the cause of the error; the incorrect libnccl version is the main cause.
