Very intermittently getting "Corrupted nv channel access file" #185

Open
geissonator opened this issue Sep 14, 2022 · 3 comments

@geissonator
Contributor

We at IBM have seen this intermittently over the years. We've seen it on our older witherspoon and mowgli systems (AST2500) and also on our new p10bmc machines (AST2600). It's very intermittent, though.

The first symptom you see is this in the journal:

Sep 14 20:14:37 mowgli ipmid[430]: terminate called after throwing an instance of 'std::runtime_error'
Sep 14 20:14:37 mowgli ipmid[430]:   what():  Corrupted nv channel access file
Sep 14 20:14:38 mowgli systemd[1]: phosphor-ipmi-host.service: Main process exited, code=killed, status=6/ABRT
Sep 14 20:14:38 mowgli systemd[1]: phosphor-ipmi-host.service: Failed with result 'signal'.

When you look at the file in question, /var/lib/ipmi/channel_access_nv.json, it is 0 bytes in size:

--w-------    1 root     root             0 Aug 25 14:51 /var/lib/ipmi/channel_access_nv.json

I'm not sure how this file could end up being 0 bytes, but a simple workaround seems to be to just remove the file in the error path, https://github.com/openbmc/phosphor-host-ipmid/blob/master/user_channel/channel_mgmt.cpp#L1146. That way, when ipmid restarts, it will just re-init the files. Thoughts? I can throw up a quick patch if it makes sense.
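For illustration only, a minimal C++ sketch of that workaround under stated assumptions: `loadNvFile` and `parseOrThrow` are hypothetical names (not functions in channel_mgmt.cpp), and `parseOrThrow` is a stand-in for the real JSON parse. On a parse failure the file is removed before the error is rethrown, so the next ipmid start can re-create it instead of crashing on the same file again. Handling of a missing file is deliberately left out.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical stand-in for the real JSON parse: throws on empty or
// clearly non-object content (a 0-byte file fails here).
void parseOrThrow(const std::string& text)
{
    if (text.empty() || text.front() != '{')
    {
        throw std::invalid_argument("invalid channel access data");
    }
}

// Hypothetical loader: on a parse failure, delete the corrupted file before
// rethrowing, so a restarted ipmid re-initializes it instead of aborting again.
std::string loadNvFile(const std::filesystem::path& path)
{
    std::ifstream in(path);
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    try
    {
        parseOrThrow(text);
    }
    catch (const std::invalid_argument&)
    {
        std::error_code ec; // best-effort removal; failures are ignored
        std::filesystem::remove(path, ec);
        throw std::runtime_error("Corrupted nv channel access file");
    }
    return text;
}
```

The exception is still thrown after the removal, so the current crash-and-restart behavior is kept; only the restart is expected to succeed.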

@geissonator
Contributor Author

Unless someone can pinpoint the bug causing the intermittent file size 0 issue, I think our best bet at this point is to at least gracefully recover from the error.

So either we should add an "else if" at https://github.com/openbmc/phosphor-host-ipmid/blob/master/user_channel/channel_mgmt.cpp#L1111 that confirms the returned "data" is non-zero in size (and deletes the file and returns -EIO if it is invalid), or we should add code in the exception clauses to delete the invalid file. It may be best to do both.

In summary: if the file is 0 in size or an exception is thrown while parsing it, delete the file and throw the exception.

Testing is simple: load your code change, create an empty file, and restart ipmid to ensure it recovers.

rm /var/lib/ipmi/channel_access_nv.json
touch /var/lib/ipmi/channel_access_nv.json
systemctl restart phosphor-ipmi-host.service

LuluTHSu added a commit to wistron-corporation/phosphor-host-ipmid that referenced this issue Sep 19, 2022
For unknown reasons, the NV file size becomes 0.
To avoid affecting the service, add this condition:
If the file is 0 in size, delete the file and throw the exception.

Related Issue: openbmc/phosphor-host-ipmid#185

Tested:
	Create an empty file, restart ipmid and confirm the recovery
	was successful.
$ rm /var/lib/ipmi/channel_access_nv.json
$ touch /var/lib/ipmi/channel_access_nv.json
$ systemctl restart phosphor-ipmi-host.service

Signed-off-by: LuluTHSu <[email protected]>
anoo1 pushed a commit to ibm-openbmc/phosphor-host-ipmid that referenced this issue Sep 20, 2022

anoo1 pushed a commit to anoo1/phosphor-host-ipmid1 that referenced this issue Jun 4, 2024
rfrandse pushed a commit to ibm-openbmc/phosphor-host-ipmid that referenced this issue Jun 4, 2024

rfrandse pushed a commit to ibm-openbmc/phosphor-host-ipmid that referenced this issue Jun 4, 2024
rfrandse pushed a commit to ibm-openbmc/phosphor-host-ipmid that referenced this issue Jun 4, 2024
rfrandse pushed a commit to ibm-openbmc/phosphor-host-ipmid that referenced this issue Jun 4, 2024
anoo1 pushed a commit to anoo1/phosphor-host-ipmid1 that referenced this issue Jun 5, 2024
anoo1 added a commit to ibm-openbmc/phosphor-host-ipmid that referenced this issue Jun 5, 2024
@icmiao-msft

@geissonator May I know what physical storage you are using for the filesystem: a flash part or eMMC? TIA

@geissonator
Contributor Author

We've seen this on both AST2500 (NOR chip) and AST2600 (eMMC). It recently resurfaced in our latest release on an AST2600.
