On jetson orin nano some nvidia stats are not visible #286

Open
bogdanr opened this issue Nov 14, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

bogdanr commented Nov 14, 2024

As can be seen in the attached screenshot, only the Orin (nvgpu) Usage chart is rendered. The other stats show an infinite loading state.

Screenshot 2024-11-14 at 22-27-50 Jetson _ Beszel

The debug log is this:

2024/11/14 22:28:01 DEBUG Getting stats
2024/11/14 22:28:01 DEBUG Skipping temperature collection
2024/11/14 22:28:01 DEBUG sysinfo data="{Hostname:jetson KernelVersion:5.15.148-tegra Cores:6 Threads:6 CpuModel:Cortex-A78AE Uptime:626630 Cpu:11.46 MemPct:39.99 DiskPct:48.23 Bandwidth:0.14 AgentVersion:0.8.0 Podman:false}"
2024/11/14 22:28:01 DEBUG System stats data="{Stats:{Cpu:11.46 MaxCpu:0 Mem:7.44 MemUsed:2.98 MemPct:39.99 MemBuffCache:4.33 MemZfsArc:0 Swap:3.72 SwapUsed:0.02 DiskTotal:232.24 DiskUsed:106.29 DiskPct:48.23 DiskReadPs:0 DiskWritePs:0.03 MaxDiskReadPs:0 MaxDiskWritePs:0 NetworkSent:0.01 NetworkRecv:0.13 MaxNetworkSent:0 MaxNetworkRecv:0 Temperatures:map[] ExtraFs:map[] GPUData:map[0:{Name:Orin (nvgpu) Temperature:0 MemoryUsed:0 MemoryTotal:0 Usage:0 Power:0 Count:16}]} Info:{Hostname:jetson KernelVersion:5.15.148-tegra Cores:6 Threads:6 CpuModel:Cortex-A78AE Uptime:626630 Cpu:11.46 MemPct:39.99 DiskPct:48.23 Bandwidth:0.14 AgentVersion:0.8.0 Podman:false} Containers:[]}"
2024/11/14 22:28:01 DEBUG Docker stats data="[0x4000284380 0x40002843f0 0x400041a070 0x40001b4a80 0x4000284620 0x400041a150 0x4000284700]"
2024/11/14 22:28:01 DEBUG Extra filesystems data=map[]

The output of nvidia-smi is this:

Thu Nov 14 22:30:22 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.4.0                Driver Version: 540.4.0      CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                  N/A  | N/A              N/A |                  N/A |
| N/A   N/A  N/A               N/A /  N/A | Not Supported        |     N/A          N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

And the output of nvidia-smi --query-gpu=utilization.gpu,temperature.gpu --format=csv is this:

utilization.gpu [%], temperature.gpu
[N/A], [N/A]
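
To make the failure mode concrete, here is a minimal sketch of what a collector sees when it parses that CSV output. The function name `parse_smi_csv` is made up for illustration and is not Beszel's actual code; the sample text is copied from the output above.

```python
import csv
import io


def parse_smi_csv(text):
    """Parse `nvidia-smi --format=csv` text; '[N/A]' fields become None."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    rows = []
    for values in reader:
        if not values:  # skip blank lines
            continue
        row = {}
        for name, value in zip(header, values):
            value = value.strip()
            row[name] = None if value in ("[N/A]", "N/A") else value
        rows.append(row)
    return rows


# Sample copied from the Jetson output in this issue.
sample = "utilization.gpu [%], temperature.gpu\n[N/A], [N/A]\n"
rows = parse_smi_csv(sample)
print(rows)
```

Every field comes back as None, so there is nothing numeric to chart from nvidia-smi on this device.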

I'd like to add that jtop can retrieve power and memory usage.

tegrastats also produces this output:

11-14-2024 22:54:33 RAM 3722/7620MB (lfb 2x2MB) SWAP 554/3810MB (cached 1MB) CPU [10%@729,9%@729,11%@729,13%@729,5%@729,11%@729] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[305] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] VDD_IN 5088mW/5088mW VDD_CPU_GPU_CV 640mW/640mW VDD_SOC 1522mW/1522mW
Owner

henrygd commented Nov 14, 2024

Thanks, I wasn't aware that nvidia-smi didn't work with Jetson devices. From a quick look, maybe tegrastats could work.

GPU usage - Seems to be GR3D_FREQ 0%

Power usage - VDD_IN 5088mW/5088mW VDD_CPU_GPU_CV 640mW/640mW VDD_SOC 1522mW/1522mW - I'm not entirely sure how to interpret this. Maybe VDD_IN is total power draw. VDD_CPU_GPU_CV seems too low to be the CPU / GPU share of power.

Does Jetson have dedicated memory for the GPU?
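
As a rough sketch of the idea above, the two fields could be pulled out of a tegrastats line with regular expressions. The function name and the returned keys are assumptions for illustration; the sample line is a trimmed copy of the idle output posted earlier, and reading VDD_IN as total power is still a guess at this point in the thread.

```python
import re

# GR3D_FREQ is followed by a percentage, e.g. "GR3D_FREQ 0%@[305]".
GR3D_RE = re.compile(r"GR3D_FREQ (\d+)%")
# VDD_IN is followed by two milliwatt values; only the first is taken here.
VDD_IN_RE = re.compile(r"VDD_IN (\d+)mW")


def parse_gpu_and_power(line):
    """Return GPU usage (%) and VDD_IN (mW) from one tegrastats line."""
    gpu = GR3D_RE.search(line)
    power = VDD_IN_RE.search(line)
    return {
        "gpu_usage_pct": int(gpu.group(1)) if gpu else None,
        "vdd_in_mw": int(power.group(1)) if power else None,
    }


# Trimmed copy of the idle sample from this issue.
idle = (
    "EMC_FREQ 0%@2133 GR3D_FREQ 0%@[305] NVDEC off "
    "VDD_IN 5088mW/5088mW VDD_CPU_GPU_CV 640mW/640mW VDD_SOC 1522mW/1522mW"
)
stats = parse_gpu_and_power(idle)
print(stats)
```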

henrygd added the enhancement label Nov 14, 2024
Author

bogdanr commented Nov 15, 2024

Jetson devices have shared memory.

Here are the stats while it's running inference:

11-15-2024 08:38:09 RAM 6185/7620MB (lfb 8x2MB) SWAP 851/3810MB (cached 1MB) CPU [15%@729,11%@729,14%@729,13%@729,11%@729,8%@729] EMC_FREQ 43%@2133 GR3D_FREQ 63%@[621] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] VDD_IN 12479mW/12479mW VDD_CPU_GPU_CV 4667mW/4667mW VDD_SOC 2817mW/2817mW

I am pretty sure VDD_IN is the total power consumption. The limit on this board is supposed to be 15W, and during inference we get close to that.
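
A quick arithmetic check of that claim, assuming VDD_IN really is total board power and taking the 15 W limit as quoted above (the helper name is made up for illustration):

```python
POWER_LIMIT_W = 15.0  # board limit as quoted in this thread


def budget_share(vdd_in_mw, limit_w=POWER_LIMIT_W):
    """Return VDD_IN as a percentage of the power limit."""
    return vdd_in_mw / 1000.0 / limit_w * 100.0


# VDD_IN from the inference sample above: 12479 mW.
share = budget_share(12479)
print(f"{share:.1f}% of the {POWER_LIMIT_W:.0f} W limit")
```

That works out to roughly 83% of the limit during inference, compared with about 34% for the idle sample (5088 mW).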

Author

bogdanr commented Nov 15, 2024

Another very interesting statistic is EMC_FREQ 43%@2133. It shows the memory bandwidth utilization, so you can tell whether the bottleneck is GPU speed or memory speed.
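
A small sketch of how that comparison could be automated. The 80% threshold and the function names are arbitrary assumptions, not something tegrastats defines; the first sample is trimmed from the inference output above and the second is synthetic.

```python
import re


def utilization(line, field):
    """Return the percentage that follows `field`, or None if absent."""
    match = re.search(rf"{field} (\d+)%", line)
    return int(match.group(1)) if match else None


def likely_bottleneck(line, threshold=80):
    """Guess the saturated resource; the threshold is an arbitrary assumption."""
    emc = utilization(line, "EMC_FREQ")
    gpu = utilization(line, "GR3D_FREQ")
    if emc is None or gpu is None:
        return "unknown"
    if emc >= threshold and emc > gpu:
        return "memory"
    if gpu >= threshold:
        return "gpu"
    return "none"


inference = "EMC_FREQ 43%@2133 GR3D_FREQ 63%@[621]"  # trimmed from this issue
synthetic = "EMC_FREQ 95%@2133 GR3D_FREQ 40%@[621]"  # made-up values
print(likely_bottleneck(inference), likely_bottleneck(synthetic))
```

With the inference sample neither value crosses the threshold, so the heuristic reports no clear bottleneck.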

Owner

henrygd commented Nov 15, 2024

Seems like the best thing to do is hide the VRAM chart, because it's not applicable, and also the GPU power chart, because tegrastats doesn't or can't separate it from other usage.

Then we add a chart for total system power consumption along with GPU utilization from tegrastats.

Can you confirm that running tegrastats --interval 3000 leaves the command running and prints new info every three seconds? And, if possible, can you tell whether the stats are for that specific moment or an average since the last log line? (You might want to increase the interval for the latter.)
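
For context, this is roughly how a collector would consume a command that keeps running and prints one line per interval. It is a sketch only: `stream_lines` is a hypothetical helper, and the demo uses a short Python one-liner as a stand-in so it runs on machines without tegrastats.

```python
import subprocess
import sys


def stream_lines(cmd):
    """Yield stdout lines from a long-running command as they arrive."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, bufsize=1)
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        # Stop the child if the consumer stops iterating early.
        proc.terminate()
        proc.wait()
        proc.stdout.close()


# On a Jetson the command would be ["tegrastats", "--interval", "3000"].
# Stand-in command so the sketch runs anywhere:
demo_cmd = [sys.executable, "-c", "print('line 1'); print('line 2')"]
demo = list(stream_lines(demo_cmd))
print(demo)
```

If tegrastats behaves as described, the loop would receive one new line every three seconds until the process is stopped.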

Author

bogdanr commented Nov 19, 2024

Yes, specifying the interval leaves the command running. The stats are for the exact moment each line is printed, but the power stats also include an average, although I don't know what interval the average is computed over.
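
Assuming the two numbers in each power field are the current reading followed by the average (the thread suggests this but does not confirm the order), the rails could be split like this. The function name and dictionary keys are illustrative.

```python
import re

# Matches fields such as "VDD_IN 12479mW/12479mW".
RAIL_RE = re.compile(r"(VDD_\w+) (\d+)mW/(\d+)mW")


def parse_rails(line):
    """Split each VDD_* field into (assumed) current and average milliwatts."""
    return {
        name: {"current_mw": int(current), "average_mw": int(average)}
        for name, current, average in RAIL_RE.findall(line)
    }


# Power fields copied from the inference sample in this issue.
sample = "VDD_IN 12479mW/12479mW VDD_CPU_GPU_CV 4667mW/4667mW VDD_SOC 2817mW/2817mW"
rails = parse_rails(sample)
print(rails["VDD_IN"])
```

In both samples posted here the two numbers are identical, so they cannot show which value is the average or what window it covers.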

henrygd moved this to Done in Beszel Roadmap Feb 17, 2025
henrygd closed this as completed by moving to Done in Beszel Roadmap Feb 17, 2025
henrygd moved this from Done to Possibly done in Beszel Roadmap Feb 17, 2025
Owner

henrygd commented Feb 17, 2025

Sorry, added to the wrong column in the roadmap and the issue was automatically closed. Reopening.

This is possibly fixed in the next release but needs further confirmation since I don't have a Jetson device.

henrygd reopened this Feb 17, 2025