Rasdaemon 0.6.6 version not logging the trace events from the kernel tracepoints #159

Open
prithivi17 opened this issue May 3, 2024 · 4 comments

Comments

@prithivi17

I have been working with rasdaemon lately and researching how errors flow from kernel space to user space. I am using Debian 11.7, and the rasdaemon version available in the OS repository is 0.6.6, which seems to be broken: it does not capture the trace events for hardware errors, even though those events are available at the kernel tracepoints. I removed the repository version, downloaded the 0.8.0 source, and compiled it on my server. That build works without any issue. I also found that 0.8.0 uses libtraceevent to read the traces from the tracepoints, whereas 0.6.6 uses its own libtrace headers.

What I can't understand is this: when I uninstall 0.8.0 and reinstall the old 0.6.6 repository version, it works. But when I reboot the server, it goes back to the state where it doesn't work. Can someone please explain this behaviour? Does something from the 0.8.0 version stay cached in memory? And is there any fix for rasdaemon 0.6.6 not working as expected?
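One way to compare the working and non-working states described above is to look at whether the RAS tracepoints are enabled in tracefs before and after a reboot, since that setting is runtime state. This is only a minimal sketch: it assumes tracefs is mounted at /sys/kernel/tracing or /sys/kernel/debug/tracing, and the helper names `find_tracing_dir` and `ras_event_state` are hypothetical, not part of rasdaemon.

```shell
#!/bin/sh
# Sketch: report whether selected RAS tracepoints are enabled.
# Assumption: tracefs lives at one of the two mount points below.

# Print the first tracefs directory that exposes an events/ subdirectory.
find_tracing_dir() {
    for dir in /sys/kernel/tracing /sys/kernel/debug/tracing; do
        if [ -d "$dir/events" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Print "<event>: enabled|disabled|missing" for one event under events/ras.
# $1 = tracefs directory, $2 = event name (for example mc_event).
ras_event_state() {
    enable_file="$1/events/ras/$2/enable"
    if [ ! -r "$enable_file" ]; then
        printf '%s: missing\n' "$2"
    elif [ "$(cat "$enable_file")" = "1" ]; then
        printf '%s: enabled\n' "$2"
    else
        printf '%s: disabled\n' "$2"
    fi
}

if tracing_dir=$(find_tracing_dir); then
    for event in mc_event aer_event; do
        ras_event_state "$tracing_dir" "$event"
    done
else
    echo "tracefs not found or not readable (try as root, or mount it first)"
fi
```

Running it once while rasdaemon is capturing errors and once after a reboot would show whether the enable state differs between the two cases. Note that rasdaemon may use its own tracing instance, so the top-level state is only a hint.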

@prithivi17 prithivi17 changed the title Rasdaemon 0.6.2 not logging the trace events from the kernel tracepoints Rasdaemon 0.6.6 not logging the trace events from the kernel tracepoints May 3, 2024
@prithivi17 prithivi17 changed the title Rasdaemon 0.6.6 not logging the trace events from the kernel tracepoints Rasdaemon 0.6.6 (all the versions in debian repo) not logging the trace events from the kernel tracepoints May 3, 2024
@Sinzunza

Hi, I'm having a similar issue. A few questions, if you don't mind. I'm on Debian 12 and can't get rasdaemon to report MCE errors. The Debian version 0.8.0-1 still doesn't work.

Any additional configuration you did to get Rasdaemon working? ... Tracing configuration? ... Linux Kernel configuration?
You mention all Debian versions don't work, is that including 0.8.0-1?

@tai271828

  • This issue needs to be triaged before moving forward. It needs to be triaged as a real upstream issue, or a distribution-specific issue e.g. Debian.
  • This repository is the upstream source, and you are reporting an issue specific to a distribution.
  • If you are sure the issue is specific to Debian packaging, please report it to the Debian package bug tracker for rasdaemon.
  • If you are not sure if the issue is an upstream issue or not, please try to nail down the issue. The usual first step is to tell people how you reproduce the issue, if possible.

There are newer Debian packages available. You may want to try one to see whether you can still reproduce the issue.

@prithivi17
Author

prithivi17 commented Jun 19, 2024

Hi, actually the point is that the issue is not related to the Debian repository version of rasdaemon. Rasdaemon 0.6.6 itself is not capturing the tracepoint events. Here are all the tests I have performed:

md5sum: 38404619a748b581529095a5a586e289 rasdaemon-0.6.6.zip (this is the rasdaemon 0.6.6 source that I downloaded from the GitHub repository)

After compiling the 0.6.6 version, I started rasdaemon in the foreground with recording enabled, as shown below:
image

After that, I triggered the edac-fake-inject error:
image

But no error was captured by the 0.6.6 rasdaemon that I had started in the foreground.

I then removed the compiled 0.6.6 build, installed the latest version, 0.8.0, and tried the same steps:
image

Now you can see that the MC error events are being captured.

From my analysis, I confirmed that the errors reach the tracepoints in kernel space, but rasdaemon 0.6.6 did not capture the events from the tracepoints. As you can see in the screenshot below, no error is recorded in the ras-mc-ctl table:
image
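For readers who cannot see the screenshots, a check along these lines could be sketched as follows. It is a hedged illustration, not the exact procedure used above: the debugfs paths assume debugfs is mounted at /sys/kernel/debug and a kernel built with CONFIG_EDAC_DEBUG, the memory controller name mc0 is an assumption, and `count_mc_events` is a hypothetical helper. The text trace file only shows events that are enabled in the top-level instance and not already consumed by another reader, so this is best tried with rasdaemon stopped and the mc_event tracepoint enabled by hand.

```shell
#!/bin/sh
# Sketch: inject a fake EDAC error and compare what the kernel traced
# with what rasdaemon stored. All paths below are assumptions.

# Count mc_event lines in a text trace file (tracefs "trace" format).
count_mc_events() {
    grep -c 'mc_event:' "$1" || true   # grep -c prints 0 when nothing matches
}

TRACE_FILE=/sys/kernel/debug/tracing/trace
FAKE_INJECT=/sys/kernel/debug/edac/mc0/fake_inject

if [ -w "$FAKE_INJECT" ] && [ -r "$TRACE_FILE" ]; then
    before=$(count_mc_events "$TRACE_FILE")
    echo 1 > "$FAKE_INJECT"            # ask the EDAC core to report a fake error
    sleep 1
    after=$(count_mc_events "$TRACE_FILE")
    echo "mc_event lines in kernel trace: before=$before after=$after"
    # Show what rasdaemon recorded, if the tool is installed.
    if command -v ras-mc-ctl >/dev/null 2>&1; then
        ras-mc-ctl --errors
    fi
else
    echo "fake_inject or trace file not accessible (need root, debugfs, CONFIG_EDAC_DEBUG)"
fi
```

If the kernel count increases while `ras-mc-ctl --errors` stays empty with rasdaemon running, that would match the behaviour reported for 0.6.6.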

From my analysis of the commit history, it seems that libtraceevent is responsible for reading the events from the tracepoints on behalf of rasdaemon. In the rasdaemon 0.6.6 source this code is built into the rasdaemon binary, whereas in rasdaemon 0.8.0 the code has been changed to use the shared library libtraceevent.so to read the trace events from the tracepoints.
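The difference between a bundled and a shared libtraceevent can be checked on an installed binary with `ldd`. A minimal sketch, assuming rasdaemon is on the PATH (a local build would need its own path) and using a hypothetical helper name `uses_shared_libtraceevent`:

```shell
#!/bin/sh
# Sketch: check whether a rasdaemon binary depends on a shared libtraceevent.

# Read `ldd` output on stdin; succeed if libtraceevent.so is listed.
uses_shared_libtraceevent() {
    grep -q 'libtraceevent\.so'
}

# Assumption: rasdaemon is on PATH; replace with the path of a local build.
bin=$(command -v rasdaemon || true)
if [ -z "$bin" ]; then
    echo "rasdaemon not found on PATH"
elif ldd "$bin" | uses_shared_libtraceevent; then
    echo "$bin links against the shared libtraceevent"
else
    echo "$bin does not list libtraceevent.so (bundled copy, or not used)"
fi
```

Based on the description above, a 0.6.6 build would be expected not to list libtraceevent.so, while a 0.8.0 build would.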

As a workaround, I upgraded from 0.6.6 to the latest version available in the GitHub repository, rasdaemon 0.8.0, and rasdaemon now works fine. I would like to know whether there is a way to fix rasdaemon 0.6.6 so that it captures the tracepoint events. If a fix is possible, please let me know; it would be very helpful.

Thanks in advance.

@prithivi17 prithivi17 reopened this Jun 19, 2024
@prithivi17 prithivi17 changed the title Rasdaemon 0.6.6 (all the versions in debian repo) not logging the trace events from the kernel tracepoints Rasdaemon 0.6.6 version not logging the trace events from the kernel tracepoints Jun 19, 2024
@prithivi17
Author

> Hi, I'm having a similar issue. A few questions if you don't mind. I'm on Debian 12 and can't get Rasdaemon to report mce errors. Trying version Debian version 0.8.0-1 still doesn't work.
>
> Any additional configuration you did to get Rasdaemon working? ... Tracing configuration? ... Linux Kernel configuration? You mention all Debian versions don't work, is that including 0.8.0-1?

Get the latest rasdaemon source from GitHub (version 0.8.0) and follow the compilation steps below; it will work:

git clone https://github.com/mchehab/rasdaemon.git
rm -r /var/lib/rasdaemon/ras-mc_event.db
# necessary packages for compilation
apt-get install make gcc autoconf automake libtool libevent-dev tar libsqlite3-dev libdbd-sqlite3-perl libtraceevent-dev pkg-config

# build from inside the cloned source tree
cd rasdaemon
autoreconf -vfi
./configure --enable-all --localstatedir=/var
make
make install
