[Bug]: GPU Stats suddenly vanishing #558
Comments
Thanks, very strange since it was working previously and we haven't released a new version.
If you have multiple machines with GPUs, did they all vanish or was it only one machine?
Please try running the agent with env var
If there's a problem initializing the GPU functionality, it should print
Also please run the command below for a minute. Make sure the formatting is consistent and it doesn't quit on its own. Also paste one of the lines here so I can see.
nvidia-smi -l 4 --query-gpu=index,name,temperature.gpu,memory.used,memory.total,utilization.gpu,power.draw --format=csv,noheader,nounits
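For anyone who wants to check the "formatting is consistent" part automatically, here is a minimal sketch in Python. It assumes each output line carries the seven fields in the order given by the --query-gpu list above, separated by commas. The function name parse_gpu_line and the sample line in the usage note are hypothetical, and this is not how the Beszel agent itself parses the output.

```python
import csv
import io

# Field order copied from the --query-gpu list in the command above.
FIELDS = ["index", "name", "temperature.gpu", "memory.used",
          "memory.total", "utilization.gpu", "power.draw"]


def parse_gpu_line(line):
    """Parse one CSV line of nvidia-smi output.

    Returns a dict keyed by field name, or None if the line does not
    have exactly seven fields or a numeric column is not a number.
    """
    row = next(csv.reader(io.StringIO(line), skipinitialspace=True), None)
    if row is None or len(row) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, (cell.strip() for cell in row)))
    try:
        record["index"] = int(record["index"])
        for key in FIELDS[2:]:
            record[key] = float(record[key])
    except ValueError:
        # A non-numeric placeholder such as [N/A] in a numeric column.
        return None
    return record
```

Usage: calling parse_gpu_line("0, NVIDIA GeForce RTX 4090, 45, 1024, 24564, 3, 32.50") on that made-up sample line yields a dict, while a short or non-numeric line yields None. Piping the real command through a loop over this function would flag any line whose formatting drifts.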
logs from the agent:
nvidia-smi log:
So yeah... the agent suddenly doesn't find it. I'll try to get data from another agent where that suddenly began happening. I guess we can keep it to the Ubuntu one, as I'm not exactly sure how to check the logs of a service running on Windows (at least there's no simple way).
I'm not sure whether I've run into the same problem. I am a newcomer who just installed Beszel. I installed the hub on a public server and then installed the agent on two A100 servers. Both agents were installed using the script for the binary version, not the Docker version. However, I did not see any information about the GPU. I added Environment="GPU=true" to the service configuration file on one server, and after restarting the agent service, there is still no GPU information output. If you need more information, I am happy to provide it.
Having the same issue. Cannot seem to get GPU info from a PopOS machine with a 2080ti installed and working fine with
I started a discussion in #563 for anyone having problems with GPU stats. There's a small program there which should help figure out what's going wrong.
Update with possible solution here: #563 (reply in thread)
Description
As per this comment: #262 (comment)
The little log I get doesn't mention a GPU at all. I just restarted the service to see if any particular error gets listed. This is from a Linux machine (Ubuntu 24.04 LTS), running the binary.
Ideally it should show two 4090 GPUs (at least on this machine).
As already mentioned, I did see the entries for a few days, but they suddenly vanished. I don't know where to start debugging this, especially since it happens across multiple devices.
In case it's relevant: I run the hub in a Docker container on a public server, and use Tailscale to get the agents of my homelab listed.
Expected Behavior
To see all my GPUs listed per client (agent)
Steps to Reproduce
Sadly, I'm not sure. It suddenly disappeared without a trace.
OS / Architecture
Ubuntu 24.04 / AMD64
Beszel version
0.9.1
Installation method
Docker
Configuration
Hub Logs
Agent Logs