Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HA 13.1 reboots nearly every hour, freeze blocks actions #3569

Closed
darth-hp opened this issue Aug 30, 2024 · 8 comments
Closed

HA 13.1 reboots nearly every hour, freeze blocks actions #3569

darth-hp opened this issue Aug 30, 2024 · 8 comments
Labels
board/ova Open Virtual Appliance (Virtual Machine) bug hypervisor/proxmox Proxmox related issues

Comments

@darth-hp
Copy link

Describe the issue you are experiencing

During recent days I had some "blips" eg by controlling lights but I did not recognize a pattern until yesterday. Then I checked my system and saw that it restarts regularly - like every hour.
I see one recurring pattern in the journals:

Aug 29 01:45:04 homeassistant.fritz.box qemu-ga[384]: info: guest-fsfreeze called
Aug 29 01:45:04 homeassistant.fritz.box qemu-ga[384]: info: executing fsfreeze hook with arg 'freeze'
Aug 29 01:45:05 homeassistant.fritz.box hassio_supervisor[486]: 2024-08-29 01:45:05.015 INFO (MainThread) [supervisor.backups.manager] Freeze starting stage home_assistant

Then a new journal starts.

I tried downgrading HAOS and that gave me Error: 'OSManager.update' blocked from execution, system is not running - freeze

I will reboot to unthaw and then retry with 13.0

What operating system image do you use?

ova (for Virtual Machines)

What version of Home Assistant Operating System is installed?

13.1

Did the problem occur after upgrading the Operating System?

Yes

Hardware details

OVA running on proxmox 8.2

Steps to reproduce the issue

  1. Install HAOS 13.1
  2. Wait for freeze

...

Anything in the Supervisor logs that might be useful for us?

See above

Anything in the Host logs that might be useful for us?

See above

System information

System Information

version core-2024.8.3
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.4
os_name Linux
os_version 6.6.46-haos
arch x86_64
timezone Europe/Berlin
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
HACS Data ok
GitHub API Calls Remaining 4967
Installed Version 2.0.0
Stage running
Available Repositories 1461
Downloaded Repositories 17
AccuWeather
error failed to load: unknown
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 13.1
update_channel stable
supervisor_version supervisor-2024.08.0
agent_version 1.6.0
docker_version 26.1.4
disk_total 62.3 GB
disk_used 23.9 GB
healthy true
supported true
host_connectivity true
supervisor_connectivity true
ntp_synchronized true
virtualization kvm
board ova
supervisor_api ok
version_api ok
installed_addons ESPHome (2024.8.1), Duck DNS (1.18.0), NGINX Home Assistant SSL proxy (3.10.1), Studio Code Server (5.15.0), Advanced SSH & Web Terminal (18.0.0), AppDaemon (0.16.6), TasmoAdmin (0.30.4), Mosquitto broker (6.4.1), Matter Server (6.4.1)
Dashboards
dashboards 1
resources 3
views 14
mode storage
Recorder
oldest_recorder_run 22. August 2024 um 08:16
current_recorder_run 30. August 2024 um 11:32
estimated_db_size 1460.44 MiB
database_engine sqlite
database_version 3.45.3
Spotify
api_endpoint_reachable ok

Additional information

No response

@darth-hp darth-hp added the bug label Aug 30, 2024
@darth-hp
Copy link
Author

darth-hp commented Sep 1, 2024

Installed HAOS 13.0 with the same problem. Now went back to 12.4

@sairon sairon added board/ova Open Virtual Appliance (Virtual Machine) hypervisor/proxmox Proxmox related issues labels Sep 2, 2024
@sairon
Copy link
Member

sairon commented Sep 2, 2024

I don't see any change that could fundamentally affect this in OS 13.0 (unless it's a downfall of some kernel regression) 🤔 And if I'm not mistaken, there should be a 10 minutes timeout for the freeze. Couple of things you can check/try - connect to the VM console, type login and check journalctl -f -u qemu-guest.service. You can check if the freeze/thaw hooks run correctly with /usr/libexec/haos-freeze-hook freeze; echo $? and /usr/libexec/haos-freeze-hook thaw; echo $? (always remember to call thaw, otherwise you'll end up with frozen database). Check ha su logs afterwards.

Also, how big is your home-assistant_v2.db, and the whole configuration folder?

@darth-hp
Copy link
Author

darth-hp commented Sep 2, 2024

@sairon thank you for the hints. I tried to follow supervisor logs using ha supervisor logs -f and again the last thing I see is 2024-09-02 15:15:06.202 INFO (MainThread) [supervisor.backups.manager] Freeze starting stage home_assistant before it reboots.

Using /usr/libexec/haos-freeze-hook freeze; echo $? and /usr/libexec/haos-freeze-hook thaw; echo $? I either get this:
image
this
image
or this
image

It never results in both commands returning 0

EDIT:
home-assistant_v2.db is 1.5G and the rest ist around 100MB

@darth-hp
Copy link
Author

darth-hp commented Sep 3, 2024

I went back to 13.1 and purged the home-assistant_v2.db and the issue still persists.

@darth-hp
Copy link
Author

darth-hp commented Sep 5, 2024

I installed a fresh HAOS in promox with the same MAC and restored a backup. It's running since three hours without a crash. Let's see what the next hours/days will show

@darth-hp
Copy link
Author

darth-hp commented Sep 6, 2024

To bad - at first sight it was at least running 3+ hours but now back to reboots. All I recognize that freeze/thaw is happening. While thinking about this ... I have a 15 minutes schedule to replicate within my proxmox instances. I'll adjust that - maybe that is to many.

@darth-hp
Copy link
Author

darth-hp commented Sep 7, 2024

Okay - indeed that was it! A 15 minutes sync backup broke the thaw/freeze or whatever. Unfortunately no real error was visible but the system decided to reboot at some point in time.

@darth-hp darth-hp closed this as completed Sep 7, 2024
@darth-hp
Copy link
Author

One additional comment.

Probably this was caused by "Proxmox VE Monitor-All" (ping-instances) - unfortunately I can't check because I removed that proxmox node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
board/ova Open Virtual Appliance (Virtual Machine) bug hypervisor/proxmox Proxmox related issues
Projects
None yet
Development

No branches or pull requests

2 participants