Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rasdaemon: ipmitool SEL logging of AER CEs on OpenBMC platforms #97

Closed
wants to merge 7 commits into from

Conversation

krishnanadh
Copy link
Contributor

Log to OpenBMC SEL logs, all the AER correctable errors that are handled by the kernel. The IPMI command record fields used are defined in the IPMI specification v2.0. Non-timestamped record type is used here given that the BMC will attach one at the time of logging.

Test Plan: Tested on OpenBMC platforms using EINJ.

Signed-off-by: Krishna Dhulipala [email protected]
Reviewed-by: Ril Van Riel [email protected]

Log to OpenBMC SEL logs, all the AER correctable errors that
are handled by the kernel. The IPMI command record fields used are
defined in the IPMI specificaton v2.0. Non-timestamped record type is
used here given that the BMC will attach one at the time of logging.

Test Plan: Tested on OpenBMC platforms using EINJ.

Signed-off-by: Krishna Dhulipala <[email protected]>
Reviewed-by: Ril Van Riel <[email protected]>
@krishnanadh
Copy link
Contributor Author

@mchehab, Can you please help review the PR?

Log to OpenBMC SEL logs, all the AER correctable errors that
are handled by the kernel. The IPMI command record fields used are
defined in the IPMI specificaton v2.0. Non-timestamped record type is
used here given that the BMC will attach one at the time of logging.

Test Plan: Tested on OpenBMC platforms using EINJ.

Signed-off-by: Krishna Dhulipala <[email protected]>
Reviewed-by: Ril Van Riel <[email protected]>
Log to OpenBMC SEL logs, all the AER correctable errors that
are handled by the kernel. The IPMI command record fields used are
defined in the IPMI specificaton v2.0. Non-timestamped record type is
used here given that the BMC will attach one at the time of logging.

Test Plan: Tested on OpenBMC platforms using EINJ.

Signed-off-by: Krishna Dhulipala <[email protected]>
Reviewed-by: Ril Van Riel <[email protected]>
@RocheWilliam
Copy link

Just a question: How do we identify if a platform is an OpenBMC one where ipmitool should/could be used to report the AER information ?

@krishnanadh
Copy link
Contributor Author

There's no straightforward way to identify what BMC type is running underneath. Therefore added the configure option.

@RocheWilliam
Copy link

If we can't automatically identify what machines should use the ipmitool calls implemented here, shouldn't we have a runtime option to enable or disable this ipmitool use in addition to the configure option (generating a different binary) ? So that an identical binary can be used on OpenBMC platforms and non OpenBMC ?

@krishnanadh
Copy link
Contributor Author

krishnanadh commented Jul 17, 2023

Looked into adding a runtime option through a command line argument. The ipmitool call is running in the even handler and since we are using libtraceevent to register the ras_aser_event_handler, passing down a runtime argument will not be as clean. Any suggestions on how to go about it?

@RocheWilliam
Copy link

I would suggest to enrich the handler_ras_events() function passing more arguments information than just the "record_events" so that we could have an initialize function like ras_aer_handler_init(int force_ipmitool) that would update a static flag in the ras-aer-handler.c for example.
(I also had to create a mechanism for some Ampere machines to use ipmitool in case of AER errors -- you can give a look at this code which is also the pull request #56 so that a single arm binary can be used on these platforms or any other arm platforms without modification)

@krishnanadh
Copy link
Contributor Author

krishnanadh commented Jul 19, 2023

Thank you for the suggestions. Added a runtime argument -i and patched this PR.

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@mchehab
Copy link
Owner

mchehab commented Oct 23, 2023

@mchehab, Can you please help review the PR?

Sorry, I've been busy those days. There are some issues on this PR:

  • The changes are not following the Linux Kernel coding style. We use it within rasdaemon. In particular, indents are done with tabs, and they mean 8 spaces;
  • Patches need to have your certificate of origin (Signed-off-by): See https://wiki.linuxfoundation.org/dco;
  • There are 3 patches with the same name/description but different contents. You should likely merge them, together with a typo fix;
  • with regards to this commit: d7d4c57efbd2 ("Only log correctable errors for unified SEL")
    why are you dropping uncorrectable logs? Shouldn't it be an option to enable/disable it at runtime?
    With such patch, rasdaemon generates a build warning:
    unified-sel.c:33:20: warning: ‘uncor_error_ids’ defined but not used [-Wunused-variable]
     33 | static const char *uncor_error_ids[32] = {
        |                    ^~~~~~~~~~~~~~~
    

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants