Wrong GPU total memory reported #371

Open
nitroxis opened this issue May 20, 2023 · 15 comments

Comments

@nitroxis

nitroxis commented May 20, 2023

Hi, I just noticed that OhmGraphite reports an incorrect total GPU memory size when there are multiple GPUs.

OhmGraphite's prometheus endpoint reports:

ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Total",hw_instance="0"} 11811160064
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Total",hw_instance="1"} 11811160064

whereas the "1060 6GB" should have 6GB, as the name implies. It shows up correctly in LibreHardwareMonitor, so LibreHardwareMonitor itself does not appear to be the cause:
[Screenshot: LibreHardwareMonitor sensor view]
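As a quick sanity check on the numbers above (not part of the original report), converting the reported byte count to GiB shows that 11811160064 is exactly 11 GiB. That matches the RTX 2080 Ti's 11 GB rather than the nominal 6 GB of the GTX 1060, which suggests the 1060 is being handed the 2080 Ti's value:

```python
# Convert the reported "GPU Memory Total" value from bytes to GiB (2**30 bytes).
GIB = 2**30

reported_total = 11_811_160_064  # value reported for both GPUs above
expected_1060 = 6 * GIB          # nominal capacity of a GTX 1060 6GB, in bytes

print(reported_total / GIB)  # 11.0
print(expected_1060)         # 6442450944
```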

@nickbabcock
Owner

Thanks for the bug report! Couple of questions to help narrow in on the problem:

  • Is it just the memory total sensor that OhmGraphite reports as the same value?
  • What LibreHardwareMonitor version are you using?

@nitroxis
Author

The screenshot was made with the current release version from their GitHub (v0.9.2).
I've checked again - it is indeed all three GPU Memory ... metrics that are the same. Here is the full list of ohm_gpunvidia_bytes:

# HELP ohm_gpunvidia_bytes Metric reported by open hardware sensor
# TYPE ohm_gpunvidia_bytes gauge
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Free",hw_instance="0"} 11092885504
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="D3D Shared Memory Used",hw_instance="1"} 155197440
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Used",hw_instance="1"} 717225984
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Total",hw_instance="1"} 11811160064
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Total",hw_instance="0"} 11811160064
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Used",hw_instance="0"} 717225984
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Free",hw_instance="1"} 11092885504
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="D3D Dedicated Memory Used",hw_instance="1"} 1205383168
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="D3D Dedicated Memory Used",hw_instance="0"} 489439232
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="D3D Shared Memory Used",hw_instance="0"} 110133248
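The duplication is easy to miss by eye because the lines are interleaved. The following is a hypothetical helper (not part of OhmGraphite) that parses lines in the format above and lists the sensors whose value is identical across every `hw_instance`. It assumes the label order `hardware`, `sensor`, `hw_instance` seen in this output; a different order would need a looser parser.

```python
import re
from collections import defaultdict

# Matches lines like:
# ohm_gpunvidia_bytes{hardware="...",sensor="...",hw_instance="0"} 11811160064
# Assumption: labels always appear in this order, as in the output above.
LINE_RE = re.compile(
    r'^ohm_gpunvidia_bytes\{hardware="(?P<hw>[^"]*)",'
    r'sensor="(?P<sensor>[^"]*)",hw_instance="(?P<inst>[^"]*)"\}\s+(?P<value>\S+)$'
)


def find_identical_sensors(exposition: str) -> dict[str, float]:
    """Return sensors whose value is the same for every GPU instance."""
    values = defaultdict(dict)  # sensor -> {hw_instance: value}
    for line in exposition.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # "# HELP" / "# TYPE" lines and other metrics are skipped
            values[m["sensor"]][m["inst"]] = float(m["value"])
    return {
        sensor: next(iter(by_inst.values()))
        for sensor, by_inst in values.items()
        if len(by_inst) > 1 and len(set(by_inst.values())) == 1
    }


SAMPLE = """\
# HELP ohm_gpunvidia_bytes Metric reported by open hardware sensor
# TYPE ohm_gpunvidia_bytes gauge
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="GPU Memory Total",hw_instance="1"} 11811160064
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="GPU Memory Total",hw_instance="0"} 11811160064
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce GTX 1060 6GB",sensor="D3D Dedicated Memory Used",hw_instance="1"} 1205383168
ohm_gpunvidia_bytes{hardware="NVIDIA GeForce RTX 2080 Ti",sensor="D3D Dedicated Memory Used",hw_instance="0"} 489439232
"""

print(find_identical_sensors(SAMPLE))  # {'GPU Memory Total': 11811160064.0}
```

Run against the full list above, it flags GPU Memory Free, GPU Memory Used, and GPU Memory Total, but not the two D3D sensors, which matches what is described in this thread.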

@nitroxis
Author

The other ohm_gpunvidia_... metrics appear to be working correctly.

@nickbabcock
Owner

nickbabcock commented May 20, 2023

One thing you can try is the nightly build of OhmGraphite built with LibreHardwareMonitor 0.9.2 (https://github.com/nickbabcock/OhmGraphite/suites/11729082221/artifacts/610719590)

If that doesn't fix things, are other sensors, like load, wattage, and fans, duplicated too?

@nitroxis
Author

The nightly build still has this issue.

@nitroxis
Author

Strange, if I compile it myself and launch it in the debugger, it works fine.

@nickbabcock
Owner

> Strange, if I compile it myself and launch it in the debugger, it works fine.

When you compile and run OhmGraphite yourself, it works!? 😨

That completely stumps me.

Below is a bit of an investigation I went on, but if compiling it yourself works, it can be ignored.


My best guess is that there's a difference in how LibreHardwareMonitor and OhmGraphite refresh sensors. OhmGraphite refreshes all hardware whenever it needs to send out new metrics. If LibreHardwareMonitor instead refreshes and updates the UI for each hardware component before moving on to the next one, it would sidestep any problem caused by hardware sensors relying on a global value.

I feel like this is partially corroborated by the fact that it is only the memory sensors that use a display handle instead of a physical handle: https://github.com/LibreHardwareMonitor/LibreHardwareMonitor/blob/6066b1a79737bb7e23217f0d2bb1b14fab04b9aa/LibreHardwareMonitorLib/Hardware/Gpu/NvidiaGpu.cs#L967
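To make this guess concrete, here is a toy Python model. It is not LibreHardwareMonitor's actual code, and the guess itself is unconfirmed. Each GPU's update writes its memory reading into one shared slot (standing in for a value looked up via a handle that all GPUs end up sharing), and the sensor reads that slot lazily. `SharedState`, `Gpu`, and both refresh functions are hypothetical names.

```python
from dataclasses import dataclass


class SharedState:
    """Hypothetical global value shared by all GPUs (e.g. one keyed by a display handle)."""
    memory_total = 0


@dataclass
class Gpu:
    name: str
    physical_memory_total: int  # what the driver would report for this GPU

    def update(self) -> None:
        # The update stores its reading in shared state, not on the GPU object.
        SharedState.memory_total = self.physical_memory_total

    def read_memory_total(self) -> int:
        # The sensor reads the shared value lazily, at report time.
        return SharedState.memory_total


def refresh_all_then_read(gpus: list[Gpu]) -> dict[str, int]:
    """Update every GPU first, then read all sensors (OhmGraphite-style ordering)."""
    for gpu in gpus:
        gpu.update()
    return {gpu.name: gpu.read_memory_total() for gpu in gpus}


def refresh_and_read_each(gpus: list[Gpu]) -> dict[str, int]:
    """Update and read one GPU at a time (the guessed LibreHardwareMonitor ordering)."""
    result = {}
    for gpu in gpus:
        gpu.update()
        result[gpu.name] = gpu.read_memory_total()
    return result


GIB = 2**30
gpus = [Gpu("GTX 1060 6GB", 6 * GIB), Gpu("RTX 2080 Ti", 11 * GIB)]

print(refresh_all_then_read(gpus))  # both report 11 GiB (the last GPU updated)
print(refresh_and_read_each(gpus))  # each GPU reports its own size
```

In this model, the first ordering reproduces the symptom from the report (every GPU shows the last-updated GPU's total), while the second hides it. Later comments in the thread point toward service mode rather than refresh order, so treat this only as an illustration of the hypothesis.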

@nickbabcock
Owner

I wonder, if you execute:

dotnet publish -c Release .\OhmGraphite\

And run the resulting zip, if that'll also show the problem.

@nitroxis
Author

nitroxis commented May 25, 2023

I've looked into it a bit more and it appears that it is related to whether the program runs as a normal process or as a service. Running it with OhmGraphite.exe run yields correct results, running it as a service (e.g. OhmGraphite.exe start) yields the wrong results.
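One way to narrow this down further is to save the Prometheus output from both modes and diff it. The following is a minimal sketch assuming the two scrapes have already been saved as text and that each sample line has the `name{labels} value` shape seen in this thread (no trailing timestamp). `parse_samples` and `diff_modes` are hypothetical helpers. Live readings such as "Used" will naturally differ between scrapes, so the interesting entries are values like "GPU Memory Total" that should stay constant.

```python
def parse_samples(exposition: str) -> dict[str, float]:
    """Map 'metric{labels}' -> value, skipping blank and '#' comment lines."""
    samples = {}
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Label values contain spaces (e.g. hardware names), so split off only
        # the last whitespace-separated field, which is the sample value.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


def diff_modes(run_text: str, service_text: str) -> dict[str, tuple]:
    """Return series whose value differs (or is missing) between the two scrapes."""
    run, service = parse_samples(run_text), parse_samples(service_text)
    return {
        series: (run.get(series), service.get(series))
        for series in sorted(run.keys() | service.keys())
        if run.get(series) != service.get(series)
    }
```

Calling `diff_modes(run_text, service_text)` on the two saved scrapes returns each differing series mapped to its `(run, service)` pair of values.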

@nickbabcock
Owner

Thanks for looking into it further. This issue looks like a variant of #153. That thread contains various possible solutions (like #153 (comment)), though the user ultimately went with the workaround in #153 (comment). Their issue involved an AMD GPU, not Nvidia, yet seems eerily similar.

@nitroxis
Author

It might be related, though it is strange that all other NVIDIA metrics appear to be working fine, it is only those 3 that are wrong. If it were some kind of permission/session thing, I would've thought either all metrics work, or none (like in the linked issue). Why only the memory metrics, and only for one GPU? Checking the "Interact with desktop" checkbox makes no difference for me. I don't really know how to investigate this further.

@roy-spark
Contributor

Do these problems still persist in 0.3x? (The issues are still open.)

What are the workarounds in that case?

@roy-spark
Contributor

I switched from running OhmGraphite as a service to running "OhmGraphite run", and it finally reported the GPU load percentage. (It was constantly zero when running in service mode.)

@nickbabcock
Owner

Thanks for confirming. Looks like this issue is fairly widespread. I'm not sure what causes it or what the fix is. Now that OhmGraphite recently started targeting .NET 6, it looks like there is an easy and official way to create Windows services that doesn't rely on a third-party library: https://learn.microsoft.com/en-us/dotnet/core/extensions/windows-service?pivots=dotnet-6-0

I might poke at it and see if it's viable and fixes issues.

@nickbabcock
Owner

Since OhmGraphite v0.31, the old Windows service library has been replaced with the newer, official Microsoft implementation. Let me know if this fixes the situation.
