Very intermittently getting "Corrupted nv channel access file" #185
Comments
Unless someone can pinpoint the bug causing the intermittent zero-size file, I think our best bet at this point is to at least recover gracefully from the error. Either we add an "else if" at https://github.com/openbmc/phosphor-host-ipmid/blob/master/user_channel/channel_mgmt.cpp#L1111 that confirms the returned "data" is non-zero in size (and deletes the file and returns -EIO if it is invalid), or we add code in the exception clauses to delete the invalid file. It may be best to do both. In summary: if the file is 0 in size or parsing it throws an exception, delete the file and throw the exception. Testing is simple: load your code change, create an empty file, and restart ipmid to ensure it recovers.
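To make the proposal concrete, here is a minimal sketch of both checks. It is not the actual channel_mgmt.cpp code: `readNvFileOrRecover` and `loadNvFile` are hypothetical names, and the JSON parser is passed in as a callable so the sketch has no dependency on a specific library.

```cpp
#include <cerrno>
#include <exception>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not the real ipmid function): reads the whole file
// into `out`. Returns 0 on success, or -EIO if the file is missing, empty
// or unreadable. An empty file is deleted so a later restart can recreate it.
int readNvFileOrRecover(const fs::path& path, std::string& out)
{
    std::error_code ec;
    const auto size = fs::file_size(path, ec);
    if (ec)
    {
        return -EIO; // file is missing or cannot be stat'ed; nothing to delete
    }
    if (size == 0)
    {
        fs::remove(path, ec); // best effort; a removal error is ignored
        return -EIO;
    }
    std::ifstream in(path, std::ios::binary);
    if (!in)
    {
        return -EIO;
    }
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return 0;
}

// Hypothetical wrapper for the "exception clause" half of the proposal:
// if parsing throws, delete the invalid file and rethrow the exception.
template <typename Parser>
int loadNvFile(const fs::path& path, Parser&& parse)
{
    std::string text;
    if (int rc = readNvFileOrRecover(path, text); rc != 0)
    {
        return rc;
    }
    try
    {
        parse(text);
    }
    catch (const std::exception&)
    {
        std::error_code ec;
        fs::remove(path, ec);
        throw;
    }
    return 0;
}
```

In the real code the callable would be whatever JSON parsing ipmid already does; the point is only that both failure paths leave no invalid file behind.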
For unknown reasons the nv file size becomes 0. To avoid affecting the service, add this condition: if the file is 0 in size, delete the file and throw the exception.
Related Issue: openbmc/phosphor-host-ipmid#185
Tested: Make an empty file, restart ipmid and confirm the recovery was successful.
$ rm /var/lib/ipmi/channel_access_nv.json
$ touch /var/lib/ipmi/channel_access_nv.json
$ systemctl restart phosphor-ipmi-host.service
Signed-off-by: LuluTHSu <[email protected]>
@geissonator May I know what physical storage you are using for the filesystem: a flash part or eMMC? TIA
We've seen this on both AST2500 (NOR chip) and AST2600 (eMMC). It recently resurfaced in our latest release on an AST2600.
We at IBM have seen this intermittently over the years. We've seen it on our older witherspoon and mowgli systems (AST2500) but also on our new p10bmc machines (AST2600). It's very intermittent though.
The first symptom you see is this in the journal:
When you look at the file in question, /var/lib/ipmi/channel_access_nv.json, it's 0 in size:
I'm not sure how this file could end up being 0 in size, but it seems like a simple workaround is to just remove the file in the error path, https://github.com/openbmc/phosphor-host-ipmid/blob/master/user_channel/channel_mgmt.cpp#L1146. That way, when ipmi restarts, it will just re-init the files. Thoughts? I can throw up a quick patch if it makes sense.
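The workaround relies on the restart path recreating a missing file. As a rough illustration of that idea only (the function name and the `defaults` parameter are hypothetical, and this is not how ipmid actually builds its default channel data):

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical start-up step: if the nv file is absent (for example because
// the error path removed it), write the default contents. Returns true only
// if the file was (re)created successfully; an existing file is left alone.
bool initNvFileIfMissing(const fs::path& path, const std::string& defaults)
{
    std::error_code ec;
    if (fs::exists(path, ec))
    {
        return false; // keep whatever is already there
    }
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << defaults;
    out.flush();
    return static_cast<bool>(out);
}
```

Note that this only covers the missing-file case, which is why removing the zero-size file in the error path matters: an empty file that still exists would not be re-initialized by a check like this one.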