Firmware update keeps restarting #476
Comments
@jkandasa I've updated to the latest snapshot release and also updated my gateway to the latest MySensors library. Firmware updates now seem to work, but they are incredibly slow. Each block appears to take 3 seconds, as if a deliberate delay had been added. (Screenshot of resource log)
@seant100 The speed depends entirely on the node's requests. If you use an MQTT gateway, it may feel a bit slow.
@jkandasa The "Ack" enabled checkbox and the "Stream Ack" checkbox do not work at all. The logs fill up with errors about maximum retries, so I have to keep both disabled. One thing that seems to help is downgrading the MySensors library from 2.3.0 to 2.2.0. This at least gets the packets transmitted faster again, with far fewer restarts of firmware updates. I am not sure whether this is a MyController issue, a MySensors 2.3.0 issue, or perhaps a MySBootloader issue.
@seant100 At the moment I only have the Dual Optiboot bootloader with RFM69HW and MySensors 2.3.0, and I do not see any issue with OTA.
@jkandasa I am using MySBootloader with RF24 radios. I updated the gateway and a node to MySensors 2.3.0, and this combination does not seem to work at all for firmware updates. Downgrading to MySensors 2.2.0 seems to work, but I am still testing.
@seant100 I guess |
@jkandasa I have enabled debug logging of gateway messages and can see that MyController fails to send a packet, and hence the firmware update is restarted.
@jkandasa CPU usage is periodically hitting 100% on the Raspberry Pi. It stays there for 10+ seconds, drops back to around 20-30% for a few minutes, then returns to 100% for probably another 10+ seconds. If the CPU stays at 100% for too long, the firmware response is not sent/processed in time. I suspected some sort of metric aggregation kicking in and reduced the "Raw data" retention to 1 minute. That helped a lot but did not fully solve the problem. As a workaround I have now unplugged the 2 nodes that push data every second (energy monitor node and weather node), and the firmware update is mostly working, although it still restarts sometimes. My conclusion is that something MyController is doing pushes the "java" process to 100% CPU usage, and if that lasts too long (say more than about 10 seconds) the firmware updater in MySBootloader times out and the firmware update restarts.
@seant100 Nice finding! It looks like a performance hit in the aggregation job or somewhere similar.
@jkandasa I am running some scripts, but they should not be doing anything heavy enough to cause this. There are 4 scripts. 3 of them run when a distance measurement comes in from a water tank ultrasound sensor; they calculate and set 2 virtual sensors for percentage and litres. The other script runs every 30 minutes, checks some sensor values and decides whether to send a turn-on command to a pump. So nothing here does any processing that would justify 100% CPU, and the scripts don't run often. I suspected some Java worker / background task, possibly data retention / aggregation, which is why I tried turning the busy nodes off to reduce the time needed to aggregate incoming data. This did seem to shorten the periods when the CPU hits 100%, and the firmware eventually updated after a few attempts with some restarts. When those nodes are turned on, the issue appears earlier in the firmware update and the firmware is never updated. I only have the 2 busy nodes. The other nodes report only once every 2 minutes, which should not generate enough data to justify 10 seconds of CPU at 100% for aggregation.
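For illustration only, the kind of calculation the tank scripts perform can be sketched as below. This is not the actual script from this setup: the function name, the parameters and the assumption of a tank with a constant cross-section are all hypothetical. It mainly shows that the work per reading is a handful of arithmetic operations.

```python
def tank_level(distance_cm, sensor_to_bottom_cm, full_depth_cm, capacity_litres):
    """Convert an ultrasound distance reading into (percentage, litres).

    All parameters are hypothetical; assumes a constant cross-section tank,
    so volume scales linearly with water depth.
    """
    # Water depth = sensor-to-bottom distance minus sensor-to-surface distance.
    depth_cm = sensor_to_bottom_cm - distance_cm
    # Clamp so noisy readings never report below 0% or above 100%.
    depth_cm = max(0.0, min(depth_cm, full_depth_cm))
    fraction = depth_cm / full_depth_cm
    return round(fraction * 100.0, 1), round(fraction * capacity_litres, 1)


# Example: sensor mounted 220 cm above the bottom, 200 cm usable depth, 5000 L tank.
percentage, litres = tank_level(70, 220, 200, 5000)
```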
@jkandasa Perhaps one option is to not run any aggregation jobs while a firmware update is in progress!
@seant100 Disabling the aggregation job might be very expensive when we re-enable it.
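One way to reconcile the two points above is to defer aggregation during a firmware update but cap how long it can be deferred, so the backlog stays bounded. The sketch below is only an illustration of that idea and is not MyController code: the class name, the `firmware_update_active` flag and the 5-minute cap are assumptions.

```python
import time


class AggregationGate:
    """Hypothetical gate: defer aggregation during OTA, but only for a limited time."""

    def __init__(self, max_deferral_s=300.0, clock=time.monotonic):
        self.max_deferral_s = max_deferral_s  # assumed cap on continuous deferral
        self.clock = clock                    # injectable clock, useful for testing
        self.firmware_update_active = False   # set by whatever tracks OTA state
        self.deferred_since = None

    def should_run(self):
        """Return True if the scheduler may run an aggregation pass now."""
        now = self.clock()
        if not self.firmware_update_active:
            self.deferred_since = None
            return True
        if self.deferred_since is None:
            self.deferred_since = now  # first deferral in this window
        if now - self.deferred_since >= self.max_deferral_s:
            self.deferred_since = None  # forced run, then a new window starts
            return True
        return False
```

A scheduler would call `should_run()` before each aggregation pass. Whether a forced pass would still be short enough to avoid the bootloader timeout is not something this sketch can answer.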
@jkandasa I have emailed the database backup to you.
The firmware update restarts roughly every 1 minute.
From monitoring the resource logs, everything seems to work fine, then after about 1 minute (it's around 1 minute every time) the node sends a Firmware Request about 5 times for the same block. MyController does not send the packet, and the node then restarts with a Firmware Config Request.
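The pattern described above (the same block requested repeatedly, followed by a Firmware Config Request) can be spotted automatically once the resource log has been reduced to a list of events. The sketch below assumes a hypothetical event format of `(kind, block)` tuples with the made-up kinds `"fw_request"` and `"fw_config_request"`; it does not parse MyController's real log format.

```python
def find_restarts(events, min_repeats=5):
    """Return the blocks that were requested at least `min_repeats` times
    in a row immediately before a firmware config request (a restart).

    `events` is an iterable of (kind, block) tuples in a hypothetical format.
    """
    restarts = []
    last_block = None
    repeats = 0
    for kind, block in events:
        if kind == "fw_request":
            if block == last_block:
                repeats += 1                    # same block requested again
            else:
                last_block, repeats = block, 1  # a new block starts a new run
        elif kind == "fw_config_request":
            if last_block is not None and repeats >= min_repeats:
                restarts.append(last_block)     # restart followed a stalled block
            last_block, repeats = None, 0
    return restarts


# Made-up example: block 8 is requested five times, then the node restarts.
sample = [("fw_config_request", None), ("fw_request", 10), ("fw_request", 9)]
sample += [("fw_request", 8)] * 5
sample += [("fw_config_request", None), ("fw_request", 10)]
stalled = find_restarts(sample)
```

Comparing the timestamps of the stalled blocks with the periods of 100% CPU would show whether the two coincide.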