Update docker compose for GPU #73

Merged
merged 2 commits into from
Jan 8, 2025

MikeWrock
Collaborator

On my machine, the current instructions do not enable GPU support. After following them (uncommenting the docker-compose.yaml lines and building with moveit_pro), I cannot run nvidia-smi inside the container and no GPUs are found, even though I do see the stage that installs the NVIDIA drivers when the image is built. If I leave the docker-compose file as it is in this PR (with the lines uncommented), I don't have to build anything differently, and I can run nvidia-smi or nvtop and see that my GPU is in use.
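
For readers unfamiliar with the lines in question, a GPU reservation in a Compose file usually has the shape sketched below. This is an illustration, not the contents of this PR: the service name `drivers` is borrowed from the container names in the run log later in this conversation, the helper name `print_gpu_override` is made up, and `count: all` is an assumption. Only `driver: nvidia` and the `gpu` capability are implied by the daemon error quoted further down.

```shell
# Print a Compose fragment that reserves NVIDIA GPUs for one service.
# The service name and the GPU count are assumptions for illustration.
print_gpu_override() {
  cat <<'EOF'
services:
  drivers:                        # assumed service name, borrowed from the run log
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all          # assumption: expose every GPU on the host
              capabilities: [gpu]
EOF
}

# Show the fragment; redirect to a file to use it as a Compose override.
print_gpu_override
```

Compose can layer such a fragment on top of the main file with repeated `-f` flags. Whether the `moveit_pro` CLI passes extra files through is not something this thread establishes.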

@MikeWrock MikeWrock requested review from dsobek and nbbrooks January 1, 2025 23:06
@dsobek dsobek left a comment

Could you also revert the Dockerfile changes from #42?

@nbbrooks
Member

nbbrooks commented Jan 2, 2025

IGNORE THIS - I was trying to test 2 PRs at once and clobbered the uncommenting changes I needed in the docker-compose.yaml here

=====

I checked 7.0 rc2, this branch of example_ws (with the gpu relevant lines uncommented), and built a new image.

lab_sim appears (to my eyes) to be running at an FPS (when dragging the visualization around dramatically) that indicates GPU acceleration is happening.

However, I cannot run nvidia-smi in a pro shell (this appears to be expected behavior), and when I run it on my host I get:

$ nvidia-smi
Thu Jan  2 13:05:18 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660        Off | 00000000:1D:00.0  On |                  N/A |
|  0%   44C    P5              15W / 130W |   2575MiB /  6144MiB |     29%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2308      G   /usr/lib/xorg/Xorg                         1140MiB |
|    0   N/A  N/A      2672      G   /usr/bin/gnome-shell                        333MiB |
|    0   N/A  N/A    318278      G   ...irefox/5361/usr/lib/firefox/firefox      233MiB |
|    0   N/A  N/A   1325487      G   ...erProcess --variations-seed-version      133MiB |
|    0   N/A  N/A   1325556      G   ...seed-version=20241225-174432.450000      730MiB |
+---------------------------------------------------------------------------------------+

The last two processes are VSCode and Chrome (ps -p PID).

None of those processes correspond to what I expected - I'm used to seeing a ros2_control process (which is running mujoco) listed in my old workflow where I manually install the nvidia driver in the container.

Should I be seeing the ros2_control process show up?

@nbbrooks
Member

nbbrooks commented Jan 2, 2025

OK, I built again with 7.0 rc2 and this branch (with the GPU section uncommented). The run fails:

$ moveit_pro run -v
Running MoveIt Pro with Configuration: lab_sim
Starting services: ['drivers', 'agent_bridge', 'web_ui']
  - drivers
  - agent_bridge
  - web_ui
Waiting for MoveIt Pro to start.
[+] Running 4/3
 ✔ Container moveit_pro-web_ui-1                                     Created                                                                                                                                           0.4s 
 ✔ Container moveit_pro-agent_bridge-1                               Created                                                                                                                                           0.4s 
 ✔ Container moveit_pro-drivers-1                                    Created                                                                                                                                           0.4s 
 ! web_ui Published ports are discarded when using host network mode                                                                                                                                                   0.0s 
Attaching to agent_bridge-1, drivers-1, web_ui-1
web_ui-1        | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
web_ui-1        | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
web_ui-1        | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
web_ui-1        | 10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
web_ui-1        | 10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
web_ui-1        | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
web_ui-1        | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
web_ui-1        | /docker-entrypoint.sh: Configuration complete; ready for start up
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: using the "epoll" event method
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: nginx/1.23.1
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6) 
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: OS: Linux 6.8.0-49-generic
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker processes
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 30
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 31
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 32
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 33
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 34
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 35
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 36
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 37
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 38
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 39
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 40
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 41
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 42
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 43
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 44
web_ui-1        | 2025/01/02 20:15:27 [notice] 1#1: start worker process 45
Gracefully stopping... (press Ctrl+C again to force)
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]

MoveIt Pro shutdown initiated...

[+] Running 4/3
 ✔ Container moveit_pro-web_ui-1         Removed                                                                                                                                                                       1.8s 
 ✔ Container moveit_pro-agent_bridge-1   Removed                                                                                                                                                                       0.4s 
 ✔ Container moveit_pro-drivers-1        Removed                                                                                                                                                                       1.7s 
 ✔ Volume moveit_pro_ignition_resources  Removed             

@MikeWrock
Collaborator Author

@nbbrooks You might need the NVIDIA Container Toolkit installed:

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Test with:

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi   
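
The test command above passes `--runtime=nvidia`, which only works if Docker has a runtime registered under that name. A minimal sketch for checking that first, assuming the Docker CLI is available on the host; the helper name `has_nvidia_runtime` is made up for illustration:

```shell
# Succeeds if the runtime list on stdin mentions an "nvidia" runtime.
# Expects the JSON printed by: docker info --format '{{json .Runtimes}}'
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

# Only query Docker when the CLI is available on this host.
if command -v docker >/dev/null 2>&1; then
  if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
    echo "nvidia runtime is registered with Docker"
  else
    echo "nvidia runtime not found; the toolkit may be missing or unconfigured"
  fi
fi
```

If the runtime is missing even after the package is installed, NVIDIA's install guide registers it with `sudo nvidia-ctk runtime configure --runtime=docker`, followed by a Docker restart.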


ARG BASE
# hadolint ignore=DL3006
FROM base-${BASE} AS base
Collaborator Author

@dsobek this is probably why @nbbrooks had docker build issues

Yeah that seems likely.

@MikeWrock MikeWrock requested a review from dsobek January 6, 2025 23:38
@MikeWrock MikeWrock merged commit f65666c into main Jan 8, 2025
4 checks passed
3 participants