Regression in crun 1.16.1 causing occasional error "corrupted size vs. prev_size in fastbins" #1537
Comments
Looking at the change in 72b4eea:

    if (process->user == NULL && container->container_def->process->user)
      process->user = container->container_def->process->user;

I wonder if this should be a deep copy of …
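For illustration, here is a minimal, self-contained sketch of how a shallow pointer copy like the one above can turn into a double free once both owning structures are released. The struct and function names below are hypothetical stand-ins, not crun's actual types:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified stand-ins for the OCI spec structures. */
typedef struct { char *username; } user_t;
typedef struct { user_t *user; } process_t;

static void free_process (process_t *p)
{
  if (p->user)
    {
      free (p->user->username);
      free (p->user);
    }
  free (p);
}

int main (void)
{
  process_t *def_process = calloc (1, sizeof (process_t));
  process_t *exec_process = calloc (1, sizeof (process_t));

  def_process->user = calloc (1, sizeof (user_t));
  def_process->user->username = strdup ("nobody");

  /* Shallow copy: both process structs now point at the same user_t. */
  if (exec_process->user == NULL && def_process->user)
    exec_process->user = def_process->user;

  free_process (exec_process); /* frees the shared user */
  free_process (def_process);  /* double free: user was already freed */
  return 0;
}
```

A deep copy (allocating a fresh user_t and duplicating its fields) would give each structure its own allocation to free.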
After tracing the execution a bit more (using print statements 😅), I think there is actually a double-free bug, not a use-after-free like I originally thought.

If I add a print statement before and after the call to …
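As a rough sketch of that tracing technique (the actual call site is elided in the comment above), logging the pointer before and after each suspect free makes a double free visible as the same address being released twice:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tracing wrapper: log the address before and after the
   free, so a double free shows up as the same address logged twice. */
static void traced_free (void *ptr, const char *what)
{
  fprintf (stderr, "about to free %s at %p\n", what, ptr);
  free (ptr);
  fprintf (stderr, "freed %s at %p\n", what, ptr);
}

int main (void)
{
  char *buf = malloc (16);
  traced_free (buf, "user");
  /* If another code path frees the same pointer, the log shows a
     repeated address, confirming the double free. */
  return 0;
}
```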
We are running some workloads using crun which started occasionally failing after upgrading to crun 1.16.1. In a significant fraction of these workloads, we started seeing the error "corrupted size vs. prev_size in fastbins." I'm not very familiar with this error, but some basic searching indicates that it may be caused by an invalid memory access.

Notably, the issue doesn't reproduce when running crun via podman. Our usage of crun is a bit unusual: we are not invoking it via podman, and we also don't run conmon, because we are trying to reduce overhead as much as possible. Instead, we generate an OCI container spec and invoke crun directly. We tried to make the generated container spec match podman's as closely as possible.

To find the exact commit where this issue was introduced, I did a git bisect between 1.16 (a known good version) and 1.16.1 (the known bad version), building a statically linked crun with the nix method outlined in the README. The bisect revealed that this behavior started with commit 72b4eea.

Unfortunately, I don't have a minimal repro yet. It's difficult to produce one in this case because these are customer workloads whose source code we don't have access to. What I do know is that the executable for this workload appears to be nodejs 22. strace proved unhelpful, because the issue doesn't reproduce under strace. Even worse, if I wrap the workload in any wrapper process whatsoever (even just a simple sh -c 'exec <executable> <args...>'), the issue doesn't reproduce.

I am continuing to investigate to see if I can find a minimal repro, and I also plan to dig into that commit to find exactly what it changed that might be causing this bug. I'm filing this report anyway to raise an early warning, and so that other people who see this behavior can find this issue more easily (it took me a long time to track down this commit as the culprit).
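For background, and assuming nothing crun-specific: glibc aborts with errors in this family ("corrupted size vs. prev_size", "corrupted size vs. prev_size in fastbins", "double free or corruption", ...) when the allocator finds its per-chunk metadata inconsistent, which a double free or a stray write can both cause. A minimal, hypothetical illustration (unrelated to crun's code) of how corrupted chunk metadata trips these checks:

```c
#include <stdlib.h>
#include <string.h>

int main (void)
{
  char *a = malloc (24);
  char *b = malloc (24);

  /* Writing past the end of `a` clobbers the chunk header that glibc
     stores just before `b`. */
  memset (a, 'A', 64);

  /* glibc's consistency checks may abort here with a heap-corruption
     message such as "corrupted size vs. prev_size". */
  free (b);
  free (a);
  return 0;
}
```

Which message appears depends on the glibc version and the allocation pattern at the time a check fires, which may be part of why small environmental changes (a wrapper process, strace) make the crash disappear.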