Can it run on a Linux system with A100 GPUs? #18

RZFan525 · 2024-07-25T07:49:32Z

No description provided.

xuanlinli17 · 2024-07-25T13:17:48Z

Yes. Please follow the instructions in the README and its troubleshooting section. Note, though, that rendering for the drawer tasks will be slow because it uses ray tracing.

RZFan525 · 2024-07-26T02:49:16Z

Thank you for your reply. However, I encountered the same error as #7. Also, when I try to install vulkan-utils with sudo apt-get install vulkan-utils, I get this error:
The package vulkan-utils could not be located
I don't have any computers with RTX GPUs. How can I run it?

RZFan525 · 2024-07-26T03:27:41Z

I followed https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html#vulkan and added the three JSON files, but it still does not work.

xuanlinli17 · 2024-07-27T20:19:30Z

Did you run sudo apt update, and is vulkan-utils still not found? That may be because you are on Ubuntu 22.04.

Try sudo apt install vulkan-tools

RZFan525 · 2024-07-28T03:59:50Z

Thank you for your reply. I tried it, and vulkan-tools installs successfully. However, the same error still appears.

I also found that vulkaninfo works when /usr/share/vulkan/icd.d/nvidia_icd.json, /usr/share/glvnd/egl_vendor.d/10_nvidia.json, and /etc/vulkan/implicit_layer.d/nvidia_layers.json are absent. But when I follow https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html#vulkan and manually add these three files, vulkaninfo fails with the error ERROR_OUT_OF_HOST_MEMORY.

In any case, the following error always appears, whether or not vulkaninfo works.

[2024-07-28 11:58:15.019] [svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[2024-07-28 11:58:15.019] [svulkan2] [warning] Continue without GLFW.
Traceback (most recent call last):
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv-OpenVLA/test.py", line 4, in <module>
    env = simpler_env.make('google_robot_pick_coke_can')
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv-OpenVLA/simpler_env/__init__.py", line 78, in make
    env = gym.make(env_name, obs_mode="rgbd", **kwargs)
  File "/cpfs01/user/liupengfei/rzfan/miniconda3/envs/simpler_env/lib/python3.10/site-packages/gymnasium/envs/registration.py", line 802, in make
    env = env_creator(**env_spec_kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/utils/registration.py", line 92, in make
    env = env_spec.make(**kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/utils/registration.py", line 34, in make
    return self.cls(**_kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/envs/custom_scenes/grasp_single_in_scene.py", line 630, in __init__
    super().__init__(**kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/envs/custom_scenes/grasp_single_in_scene.py", line 540, in __init__
    super().__init__(**kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/envs/custom_scenes/grasp_single_in_scene.py", line 64, in __init__
    super().__init__(**kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/envs/custom_scenes/base_env.py", line 134, in __init__
    super().__init__(**kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/envs/sapien_env.py", line 107, in __init__
    self._renderer = sapien.SapienRenderer(**renderer_kwargs)
RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed
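
The two symptoms in the log above (no DISPLAY variable, then a Vulkan initialization failure) can be inspected before an environment is created. Below is a minimal diagnostic sketch. It assumes the three file paths quoted earlier in this thread from the ManiSkill installation guide. The name `vulkan_preflight` is made up for illustration and is not part of SimplerEnv or SAPIEN.

```python
import os

# Paths quoted earlier in this thread from the ManiSkill installation guide.
NVIDIA_VULKAN_FILES = [
    "/usr/share/vulkan/icd.d/nvidia_icd.json",
    "/usr/share/glvnd/egl_vendor.d/10_nvidia.json",
    "/etc/vulkan/implicit_layer.d/nvidia_layers.json",
]


def vulkan_preflight(paths=NVIDIA_VULKAN_FILES, environ=None):
    """Report which config files exist and whether DISPLAY is set.

    `vulkan_preflight` is a hypothetical helper written for this thread.
    """
    environ = os.environ if environ is None else environ
    return {
        "display": environ.get("DISPLAY"),  # None means no X display is configured
        "present": [p for p in paths if os.path.isfile(p)],
        "missing": [p for p in paths if not os.path.isfile(p)],
    }


if __name__ == "__main__":
    report = vulkan_preflight()
    print("DISPLAY:", report["display"])
    for path in report["missing"]:
        print("missing:", path)
```

A missing file is not necessarily the cause: as noted above, vulkaninfo worked on this machine without the three files, so the report only narrows down what differs between servers.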

RZFan525 · 2024-07-28T04:03:37Z

I don't know how to run it :(

I have tried three different servers with A100 GPUs, and all of them hit the same error. :(

xuanlinli17 · 2024-07-29T17:09:02Z

Are you setting the CUDA devices properly? Also make sure the nvidia-driver version is at least 535. Older NVIDIA drivers might not work.
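
The driver requirement can be checked programmatically. Below is a minimal sketch, assuming nvidia-smi is on PATH. The helper names are made up for illustration, and the 535 threshold is simply the number quoted in this thread, not a value taken from the SAPIEN documentation.

```python
import subprocess


def driver_major(version):
    """Return the major component of a driver string such as '535.104.05'."""
    return int(version.strip().split(".")[0])


def driver_is_recent_enough(version, minimum_major=535):
    """Compare the major version against the threshold from this thread."""
    return driver_major(version) >= minimum_major


def query_driver_versions():
    """Ask nvidia-smi for the driver version; it prints one line per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


# Intended use on the GPU server:
#   for version in query_driver_versions():
#       print(version, "ok" if driver_is_recent_enough(version) else "too old")
```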

You can create a fake display like this:

tmux new -s 1
sudo X :0 &
[detach from tmux: ctrl-b, then d]
export DISPLAY=:0

RZFan525 · 2024-07-30T03:22:19Z

Thank you for getting back to me.

The servers I used were running inside Docker containers. I switched to another server, and it works there.

However, I then hit another error, which seems to be caused by the lack of a display.

RuntimeError: Create window failed: context is not created with present support

Do you have any suggestions for how I can observe the environment and the actions as they are executed?

xuanlinli17 · 2024-07-30T03:50:48Z

Inside Docker, you might want to forward the (fake) display (e.g., one started with sudo X :0 & in the host shell) into the Docker container.

However, the SIMPLER environments shouldn't create a window unless you are visualizing robots using the utility scripts.

RZFan525 · 2024-07-30T03:54:00Z

I'm new to robotics, so I want to visualize the simulation environment to understand it better. Maybe it would be better to output a video.

xuanlinli17 · 2024-07-30T05:28:08Z

The evaluation videos are automatically saved.

RZFan525 · 2024-07-30T08:12:18Z

Thank you!

I can run the script scripts/openvla_bridge.sh, but it suddenly reports an error after running for a while.

xuanlinli17 · 2024-07-30T16:19:17Z

If you consecutively create 2 environments in ipython, does it still report an error?
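
This check can be written as a small helper. The sketch below is an assumption about how one might run it. `create_consecutively` is a made-up name; only `simpler_env.make` and the task name `google_robot_pick_coke_can` come from the traceback earlier in this thread.

```python
def create_consecutively(make_fn, env_name, count=2):
    """Create `count` environments in a row and return them.

    Any exception raised by make_fn propagates unchanged, so the
    traceback shows whether the first or a later creation failed.
    """
    envs = []
    for index in range(count):
        print(f"creating environment {index + 1}/{count}: {env_name}")
        envs.append(make_fn(env_name))
    return envs


# Intended use in ipython (requires SimplerEnv to be installed):
#   import simpler_env
#   envs = create_consecutively(simpler_env.make, "google_robot_pick_coke_can")
```

Passing the factory in as `make_fn` keeps the helper independent of SimplerEnv, so it can be tried with a stub first.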

RZFan525 · 2024-07-31T03:21:09Z

When I create 2 environments, it works. But there is a warning:

[2024-07-31 03:20:06.870] [svulkan2] [warning] A second renderer will share the same internal context with the first one. Arguments passed to constructor will be ignored.

RZFan525 · 2024-07-31T03:36:46Z

I don't know why. But I also tried SimplerEnv-OpenVLA/scripts/openvla_drawer_variant_agg.sh, and it successfully outputs the average success.

RZFan525 · 2024-08-01T09:40:49Z

I found that the error appears when obj_episode_id is 11, in any script that sets obj-variation-mode to episode.
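
One generic way to narrow this down is to run each episode on its own and keep the full traceback of any that fail. The sketch below is not taken from SimplerEnv: `find_failing_episodes` and the `run_episode` callable are hypothetical, and `run_episode` would need to wrap whatever the evaluation script does for a single obj_episode_id.

```python
import traceback


def find_failing_episodes(run_episode, episode_ids):
    """Run each episode separately and collect tracebacks for failures.

    `run_episode` is a hypothetical callable taking one episode id.
    Returns a dict mapping each failing id to its formatted traceback.
    """
    failures = {}
    for episode_id in episode_ids:
        try:
            run_episode(episode_id)
        except Exception:
            # Keep the full traceback text so the failing call site is visible.
            failures[episode_id] = traceback.format_exc()
    return failures
```

Running only the ids around the failing one (for example 10 through 12) would show whether episode 11 fails in isolation or only after earlier episodes have run.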

xuanlinli17 · 2024-08-01T13:58:13Z

That's strange; episode 11 doesn't introduce new objects.

RZFan525 · 2024-08-02T05:21:02Z

Could you give me some instructions on how to debug? Thank you very much!!

xuanlinli17 · 2024-08-02T05:33:35Z

I actually don't know... and sorry that I don't have much bandwidth at the moment to look closely.

RZFan525 · 2024-08-02T05:35:26Z

Ok. Thank you for your reply.

xuanlinli17 · 2024-08-02T19:33:00Z

You might also try creating a fake display, either with sudo X :0 &; export DISPLAY=:0 or with xvfb-run -a {script}, to see if that works.
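
The xvfb-run variant can be scripted so that the wrapper is added only when no display is configured. Below is a minimal sketch, assuming xvfb-run is installed. `headless_command` is a made-up helper name. Whether a virtual display resolves the error depends on the machine.

```python
import os


def headless_command(command, environ=None):
    """Prefix `command` with `xvfb-run -a` when DISPLAY is not set.

    `command` is an argument list such as ["python", "test.py"].
    """
    environ = os.environ if environ is None else environ
    if environ.get("DISPLAY"):
        return list(command)  # a display already exists; run the command as-is
    return ["xvfb-run", "-a", *command]


# Intended use:
#   import subprocess
#   subprocess.run(headless_command(["python", "test.py"]), check=True)
```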

RZFan525 · 2024-08-03T03:04:45Z

Thank you. I tried this command, but it still does not work. The error is the same, and I don't know why.

COST-97 · 2024-09-18T07:47:43Z

Hello:

I don't know how to run it :(

I have tried three different servers with A100 GPUs which encounter the same error. :(

I get the same error on an A100 GPU.
"libGLX_nvidia.so.0" does not exist on the A100 machine.

Does anyone have an updated solution?
Thanks a lot!

xuanlinli17 · 2024-09-18T15:04:33Z

Could you try the troubleshooting section in the README?
