
rasdaemon not logging #84

Open
DonKatsu opened this issue Jan 31, 2023 · 9 comments

Comments

@DonKatsu

DonKatsu commented Jan 31, 2023

Distro: Fedora 37 KDE
Kernel: 6.1.8
rasdaemon version: 0.6.8
CPU: Ryzen 9 5900x

Because rasdaemon was bloating my log with erroneously reported disk errors, I deleted the files ras-mc_event.db and ras-mc_event.db-journal in /var/lib/rasdaemon. After I restarted the rasdaemon service, clean ones were created.
Since then, I noticed it had stopped logging those false disk errors. Eventually I got another MCE error and noticed that it wasn't logged either. (I'm not sure whether deleting the files was directly related, but the timing lined up.) I reinstalled rasdaemon and waited for another error to happen to be sure.
This latest one wasn't logged either, and the supposed disk errors still aren't being logged, even though the service still seems to be reporting them.
Screenshot.
Journal log of a systemctl restart rasdaemon. Those core dumps happen on a fresh boot as well.

I have uninstalled mcelog, and I don't have the ras-mc-ctl service enabled since it fails and exits due to my system not having ECC memory.
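For anyone who wants to reset the event database as described above without losing the old records, here is a minimal bash sketch. It assumes the default /var/lib/rasdaemon location and the file names from this issue; the backup_ras_db helper, the backup directory under /root, and the `run` argument are hypothetical. Stopping the service first avoids moving files out from under a daemon that still has them open.

```shell
#!/usr/bin/env bash
# Sketch: move the rasdaemon event database aside instead of deleting it.
# Assumes the default location /var/lib/rasdaemon used in this issue.
set -euo pipefail

# backup_ras_db DIR DEST: move DIR/ras-mc_event.db and DIR/ras-mc_event.db-journal
# (whichever exist) into DEST, creating DEST first. Prints each file moved.
backup_ras_db() {
    local dir=$1 dest=$2 f
    mkdir -p -- "$dest"
    for f in "$dir/ras-mc_event.db" "$dir/ras-mc_event.db-journal"; do
        if [[ -e $f ]]; then
            mv -- "$f" "$dest/"
            echo "moved $f"
        fi
    done
}

# Only touch the real service when called as: backup-ras-db.sh run
if [[ ${1:-} == run ]]; then
    backup=/root/rasdaemon-db-$(date +%Y%m%d-%H%M%S)
    systemctl stop rasdaemon
    backup_ras_db /var/lib/rasdaemon "$backup"
    # In this issue, restarting the service created fresh, empty files.
    systemctl start rasdaemon
fi
```

Run it as root with the argument `run`; without it, the script only defines the helper. The moved files can still be opened later with sqlite3 if the old records are needed.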

@mchehab
Owner

mchehab commented Feb 18, 2023

There is a known regression with kernel 6.1. The fix requires both a patch to the Linux kernel and a change in rasdaemon. See: 6986d81

The Kernel patch was already merged and backported to Kernel 6.1.12: https://lwn.net/Articles/923307/.

Today I merged the rasdaemon patch and released version 0.8.0, but the Fedora packages don't contain the regression fix yet.

I'm planning to cherry-pick the fix and apply it to Fedora 36 and 37 later today.

Anyway, if you want to test it, you can either wait for 6.1.12 or download it from koji, then build rasdaemon from source using make mock and install the package from the SPRMS/ directory.
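To check whether a running kernel is at least 6.1.12, the stable release mentioned above, a version comparison is a reasonable first step. A minimal bash sketch, assuming GNU sort with version sorting (-V); version_ge is a hypothetical helper, and version order alone does not prove that a particular kernel build includes the patch.

```shell
#!/usr/bin/env bash
# Sketch: is the running kernel at least 6.1.12?
# Assumes GNU sort (-V, version sort). Version order only: it does not prove
# that a particular kernel build contains the backported fix.
set -euo pipefail

# version_ge A B: succeed if dotted version A >= B.
version_ge() {
    [[ $(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1) == "$2" ]]
}

release=$(uname -r)          # Fedora style, e.g. 6.1.12-200.fc37.x86_64
version=${release%%-*}       # drop everything from the first '-' on
if version_ge "$version" 6.1.12; then
    echo "kernel $version >= 6.1.12"
else
    echo "kernel $version < 6.1.12"
fi
```

Sorting the two versions and checking which one comes first avoids hand-parsing each dotted field, so 6.1.8 correctly sorts before 6.1.12.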

@mchehab
Owner

mchehab commented Feb 18, 2023

I added a Fedora 37 package based on version 0.6.8: https://bodhi.fedoraproject.org/updates/FEDORA-2023-e1ccb95257. Still, I'd appreciate feedback on version 0.8.0 as well, since it now uses libtraceevent.

@DonKatsu
Author

I now have both kernel 6.1.12, and rasdaemon 0.6.8 which hit Fedora's stable repo last night.

As expected, rasdaemon 0.6.8 hasn't segfaulted. But now it triggers an SELinux denial when it attempts to use the dac_override capability at startup. Still, the rasdaemon processes are alive and the service is active (running).

After installing rasdaemon 0.6.8 and checking ras-mc-ctl --errors, I saw these reported disk errors. I hadn't checked since opening this issue, so I have to assume they were recorded at the times shown. The last-modified dates for ras-mc_event.db and ras-mc_event.db-journal are the 23rd and 24th, respectively. An hour before the entries from the 8th, I had upgraded to kernel 6.1.10 and likely restarted immediately.

@zpytela

zpytela commented Apr 28, 2023

Hello,

The dac_override capability is requested when a process attempts an access that the DAC permissions do not allow, which usually indicates a problem with file permissions. Please use strace to locate the files, or turn on full auditing to gather more information.

1) Open the /etc/audit/rules.d/audit.rules file in an editor.
2) Remove the following line if it exists:
-a task,never
3) Add the following line to the end of the file:
-w /etc/shadow -p w
4) Restart the audit daemon:
  # service auditd restart
5) Re-run your scenario.
6) Collect AVC denials:
  # ausearch -i -m avc,user_avc,selinux_err,user_selinux_err -ts today
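With full auditing enabled, the events that ausearch returns also include PATH records naming the file behind each denial. A minimal bash sketch, assuming GNU grep and sed, that lists those file names; the denied_paths helper and the `run` argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the file names from PATH records in ausearch -i output.
# With full auditing on, each denial event carries PATH records with name=...
set -euo pipefail

# denied_paths: read ausearch -i output on stdin, print the unique PATH names.
denied_paths() {
    grep '^type=PATH' | grep -o ' name=[^ ]*' | sed 's/^ name=//' | sort -u
}

# Only query the audit log when called as: denied-paths.sh run
if [[ ${1:-} == run ]]; then
    ausearch -i -m avc,user_avc,selinux_err,user_selinux_err -ts today \
        | denied_paths || echo "no PATH records found (is full auditing on?)"
fi
```

The leading space in the ` name=` pattern keeps it from matching the separate nametype= field on the same line.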

@DonKatsu
Author

DonKatsu commented May 2, 2023

I finally had another MCE event while still on Fedora 37 with rasdaemon 0.6.8.
Immediately after the kernel reported the MCE error, rasdaemon crashed and restarted 5 times before finally settling down and throwing its SELinux denial. (Apparently there was a denial for each crash.)
The MCE error did not show up when checking with ras-mc-ctl --errors.
Here's the journal from the event. Gist
For some reason, it also says "rasdaemon: Old kernel detected. Stop listening and fall back to pthread way." despite running kernel 6.2.11 at the time. It still says that on Fedora 38 with kernel 6.2.13.

I've now updated to Fedora 38, and have rasdaemon 0.8.0.

@zpytela This is what I get after following that and restarting rasdaemon 0.8.0:

ausearch -i -m avc,user_avc,selinux_err,user_selinux_err -ts today
----
type=AVC msg=audit(05/02/23 13:11:25.905:1301) : avc:  denied  { dac_override } for  pid=543881 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=AVC msg=audit(05/02/23 13:11:26.409:1315) : avc:  denied  { dac_override } for  pid=543931 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=AVC msg=audit(05/02/23 13:11:26.896:1329) : avc:  denied  { dac_override } for  pid=543984 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=AVC msg=audit(05/02/23 13:11:27.405:1343) : avc:  denied  { dac_override } for  pid=544029 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=AVC msg=audit(05/02/23 13:11:27.911:1359) : avc:  denied  { dac_override } for  pid=544087 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=AVC msg=audit(05/02/23 17:23:30.819:111) : avc:  denied  { dac_override } for  pid=3215 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
----
type=PROCTITLE msg=audit(05/02/23 17:43:14.738:305) : proctitle=/usr/sbin/rasdaemon -f -r 
type=PATH msg=audit(05/02/23 17:43:14.738:305) : item=0 name=/sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent inode=56828 dev=00:0c mode=file,440 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:tracefs_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0 
type=CWD msg=audit(05/02/23 17:43:14.738:305) : cwd=/ 
type=SYSCALL msg=audit(05/02/23 17:43:14.738:305) : arch=x86_64 syscall=openat success=no exit=EACCES(Permission denied) a0=AT_FDCWD a1=0x7ffee456a810 a2=O_WRONLY a3=0x0 items=1 ppid=1 pid=14786 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rasdaemon exe=/usr/sbin/rasdaemon subj=system_u:system_r:rasdaemon_t:s0 key=(null) 
type=AVC msg=audit(05/02/23 17:43:14.738:305) : avc:  denied  { dac_override } for  pid=14786 comm=rasdaemon capability=dac_override  scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0 
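The PATH and SYSCALL records above show why the write fails: buffer_percent is owned by root with mode 440 (no write bit for anyone), and rasdaemon opens it O_WRONLY. When root opens a root-owned file, only the owner bits apply, so the open can succeed only through CAP_DAC_OVERRIDE, which is the capability the AVC shows SELinux refusing to rasdaemon_t. A minimal bash sketch of that check; the owner_can helper and the `run` argument are hypothetical, and the path is the one from the log.

```shell
#!/usr/bin/env bash
# Sketch: do a file's owner bits grant a given access?
# When root opens a root-owned file, only the owner bits apply; if the bit
# is missing, the open succeeds only through CAP_DAC_OVERRIDE.
set -euo pipefail

# owner_can MODE ACCESS: MODE is octal as printed by stat -c %a (e.g. 440),
# ACCESS is r, w or x. Succeeds if the owner bits grant ACCESS.
owner_can() {
    local owner=$(( (8#$1 >> 6) & 7 ))
    case $2 in
        r) (( owner & 4 )) ;;
        w) (( owner & 2 )) ;;
        x) (( owner & 1 )) ;;
        *) return 2 ;;
    esac
}

# Only inspect the live file when called as: check-buffer-percent.sh run
if [[ ${1:-} == run ]]; then
    f=/sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent
    read -r mode owner_name < <(stat -c '%a %U' "$f")
    if owner_can "$mode" w; then
        echo "$f: mode $mode, owner $owner_name, owner-writable"
    else
        echo "$f: mode $mode, owner $owner_name, not owner-writable"
    fi
fi
```

With mode 440 the owner bits are r-- only, which matches the EACCES on the O_WRONLY open in the log.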

@zpytela

zpytela commented May 3, 2023

Thank you, I can confirm that. I've created a kernel bz to make the file read-write.
https://bugzilla.redhat.com/show_bug.cgi?id=2192910

@DonKatsu
Author

Since that kernel change was implemented, I haven't seen any rasdaemon-related SELinux denials.

However, I'm still getting repeated crashes from rasdaemon 0.8.0. These logs are from the start of my most recent session.
journal_snip.txt
coredumpctl_gdb_rasdaemon.txt

@zpytela

zpytela commented Jun 23, 2023

@DonKatsu The service started on my VM without errors, so please file a new bz against the ras component.

@DonKatsu
Copy link
Author

Sorry, I didn't mean to imply the crashing was to do with selinux.

I had an MCE event today, the first in two months.
Again, it wasn't picked up by rasdaemon: ras-mc-ctl --errors still shows "No MCE errors". rasdaemon crashed at the same moment the kernel reported the corrected error.
log.txt
