MultiprocessRGB(D)Publisher cannot instantiate Zed2i camera #125

Open
Victorlouisdg opened this issue Feb 9, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@Victorlouisdg
Contributor

Describe the bug
The publisher process hangs when initializing a Zed2i camera.

To Reproduce
Run one of the multiprocessing scripts.

Expected behavior
Yesterday multiprocessing still worked on my setup; I even recorded some short videos with the MultiprocessVideoRecorder.

Environment:
Gorilla desktop. Tested with two Zed2i cameras. Occasionally got the error code NO GPU DETECTED. Rebooted the desktop multiple times.

Victorlouisdg added the bug label on Feb 9, 2024
Victorlouisdg changed the title from "MultiprocessRGB(D)Publisher cannot insatiated Zed2i camera" to "MultiprocessRGB(D)Publisher cannot instantiate Zed2i camera" on Feb 9, 2024
@Victorlouisdg
Contributor Author

Creating a Zed2i camera in a regular Python script / parent process still works fine. So this has something to do with the fact that a subprocess started from a Python script behaves differently than its parent process.

@m-decoster
Contributor

May be related to #116

@Victorlouisdg
Contributor Author

Victorlouisdg commented Feb 12, 2024

The problem seems to be GPU availability in the Publisher process. Trying to close the camera results in:

CUDA error at Camera.cpp:163 code=304(cudaErrorOperatingSystem) "void sl::Camera::close()" 

I also get:

IndexError: Could not open Zed2i camera, error = NO GPU DETECTED

PyTorch also does not find any GPUs in the created process: torch.cuda.is_available() returns False and prints this warning:

/home/victor/anaconda3/envs/cloth-competition/lib/python3.10/site-packages/torch/cuda/__init__.py:138: 
UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS (Triggered internally at /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1699535260532/work/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
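To narrow this down, the same availability check can be run in a child created with each start method the platform offers. This is a minimal sketch, assuming PyTorch is installed (the check returns None otherwise). The helper names `cuda_available_here` and `cuda_available_in_child` are hypothetical and not part of this repository.

```python
import importlib.util
import multiprocessing


def cuda_available_here():
    """Return torch.cuda.is_available(), or None when PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily, so only the process running the check imports torch

    return torch.cuda.is_available()


def cuda_available_in_child(start_method, timeout=120):
    """Run the check in a single worker process created with the given start method."""
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(processes=1) as pool:
        # apply_async + get(timeout) avoids hanging forever if the worker cannot start.
        return pool.apply_async(cuda_available_here).get(timeout=timeout)


if __name__ == "__main__":
    # Typically "fork", "spawn" and "forkserver" on Linux.
    for method in multiprocessing.get_all_start_methods():
        print(f"{method}: CUDA available in child = {cuda_available_in_child(method)}")
```

If the result differs between start methods, that points at the way the child is created rather than at the driver or the camera.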

@Victorlouisdg
Contributor Author

Victorlouisdg commented Feb 12, 2024

Adding multiprocessing.set_start_method("spawn") fixes the CUDA issue and allows opening ZED cameras. However, with this start method the ZED SDK always starts optimizing the neural depth mode. I don't really understand why; maybe it doesn't find /usr/local/zed/resources due to a missing environment variable?

@Victorlouisdg
Contributor Author

Never mind, I was confused about the neural_depth optimization. It started because I changed my CUDA version to 12.3 and is not related to multiprocessing.

@Victorlouisdg
Contributor Author

This fixes the issue, but requires you to do this in every script with a MultiprocessPublisher:

multiprocessing.set_start_method("spawn") 

A more elegant solution is inheriting from:

multiprocessing.context.SpawnProcess

But that does not work and I don't understand why.
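Another option that avoids the global call is to create the process through a context object from `multiprocessing.get_context("spawn")`, which leaves the script's default start method untouched. The sketch below is a generic illustration, not the publisher code from this repository. `publish` and `run_publisher_with_spawn` are hypothetical names, and the queue is created from the same context as the process.

```python
import multiprocessing
import os


def publish(ready_queue):
    # Placeholder for the publisher body; a real one would open the camera here.
    ready_queue.put(os.getpid())


def run_publisher_with_spawn():
    """Start `publish` in a spawned child without changing the global start method."""
    ctx = multiprocessing.get_context("spawn")
    ready_queue = ctx.Queue()  # created from the same context as the process
    process = ctx.Process(target=publish, args=(ready_queue,), daemon=True)
    process.start()
    child_pid = ready_queue.get(timeout=30)
    process.join(timeout=10)
    return child_pid


if __name__ == "__main__":
    # The guard matters with "spawn": the child re-imports this module on startup.
    print("parent:", os.getpid(), "child:", run_publisher_with_spawn())
```

Because the context is local to the function, other code in the same script keeps using whatever start method it already relied on.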

@Victorlouisdg
Contributor Author

Another downside of "spawn" compared to the default "fork" method is that it can be much slower (several seconds vs. several ms), as it starts a fresh Python interpreter and re-imports the modules the child needs instead of inheriting the parent's memory. (However, in this case that fresh start is probably what prevents the CUDA errors.)
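To put numbers on the startup cost on a given machine, a rough timing sketch like the following can help. It only measures starting and joining a child that does nothing, so real publishers with heavy imports will look different. `noop` and `time_start_method` are hypothetical helper names.

```python
import multiprocessing
import time


def noop():
    """Child body that does nothing, so only process startup is measured."""


def time_start_method(method, repeats=3):
    """Return the fastest start+join time in seconds for the given start method."""
    ctx = multiprocessing.get_context(method)
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        process = ctx.Process(target=noop)
        process.start()
        process.join()
        durations.append(time.perf_counter() - start)
    return min(durations)


if __name__ == "__main__":
    available = multiprocessing.get_all_start_methods()
    for method in ("fork", "spawn"):
        if method in available:  # "fork" does not exist on Windows
            print(f"{method}: {time_start_method(method) * 1e3:.1f} ms")
```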

@m-decoster
Contributor

> A more elegant solution is inheriting from:
>
> multiprocessing.context.SpawnProcess
>
> But that does not work and I don't understand why.

My experience is that for many libraries, including the Zed SDK, PyTorch, and open3d, you indeed need to use the spawn method, probably because they all depend on CUDA (e.g., using open3d's CPU functionality does work with fork). However, for me, simply inheriting from multiprocessing.context.SpawnProcess does work, even without calling multiprocessing.set_start_method("spawn").
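For reference, a minimal sketch of what such a subclass can look like. `SpawnedPublisher` is a hypothetical stand-in, not the publisher class from this repository. Creating the Event and Queue from the spawn context (instead of the module-level `multiprocessing.Event()` and `multiprocessing.Queue()`) is a precaution assumed here, since sharing primitives created under a different start method with a spawned process can fail on some Python versions.

```python
import multiprocessing
import os
from multiprocessing.context import SpawnProcess


class SpawnedPublisher(SpawnProcess):
    """Hypothetical stand-in for a camera publisher that always uses "spawn"."""

    def __init__(self, shutdown_event, ready_queue):
        super().__init__(daemon=True)
        self._shutdown_event = shutdown_event
        self._ready_queue = ready_queue

    def run(self):
        # Runs in the child; a real publisher would open the camera here.
        self._ready_queue.put(os.getpid())
        self._shutdown_event.wait(timeout=10)


def main():
    ctx = multiprocessing.get_context("spawn")
    shutdown_event = ctx.Event()
    ready_queue = ctx.Queue()
    publisher = SpawnedPublisher(shutdown_event, ready_queue)
    publisher.start()
    child_pid = ready_queue.get(timeout=30)
    shutdown_event.set()
    publisher.join(timeout=10)
    print(f"parent {os.getpid()} started publisher {child_pid}")


if __name__ == "__main__":
    main()
```

If a subclass like this fails while the global `set_start_method("spawn")` works, the arguments passed to the process (events, queues, locks) are one place to look.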

I'll be doing a lot of multiprocessing work in the near future, and I'll report here if I find any issues when not using multiprocessing.set_start_method("spawn").

2 participants