perf parsing opens the same files over and over again -> EMFILE #690

Open
GitMensch opened this issue Jan 11, 2025 · 2 comments

@GitMensch
Contributor

Describe the bug
A perf record covering a large number of processes cannot be exported (I run the export directly from the command line so that nothing unnecessary gets opened).
As non-root, hotspot seems to hang after a series of messages like:

failed to report elf for pid = 693568 : ElfInfo{localFile="/root/.debug/usr/lib64/libgmp.so.10.3.2/b7810a6ea7427180050fb6ab1364903d4f701c9d/elf", isFile=true, originalFileName="libgmp.so.10.3.2", originalPath="/usr/lib64/libgmp.so.10.3.2", addr=7f3a00684000, len=295000, pgoff=0, baseAddr=n/a} : Too many open files

As root those messages are seen over and over again.

To Reproduce
Record a system-wide trace while doing something that involves a lot of processes.
Run hotspot --exportTo out.perfparser perf.data

Expected behavior
Each file is opened only once. If that is not possible, each PID is handled separately, closing everything once that PID has been handled (optionally behind a --save-but-slow option).

Version Info (please complete the following information):

  • Linux Kernel version: 4.18.0-513.9.1.el8_9.x86_64
  • perf version: 4.18.0-513.18.1.el8_9.x86_64
  • hotspot version (appimage? selfcompiled?): hotspot 1.5.80 from appimage

Additional context
It seems that the same files are opened multiple times to resolve the symbols. I conclude this because the first PIDs that have libgmp loaded had no problem at all, but after a while I get this error message for each PID in the trace.
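
One way to check this hypothesis while the export is running is to look at which targets the parser process's file descriptors point to. The following is only a minimal sketch, assuming a Linux /proc filesystem; the helper names `count_open_targets` and `duplicated_targets` are made up for illustration and are not part of hotspot or perfparser.

```python
import os
from collections import Counter


def count_open_targets(fd_dir):
    """Count how many descriptors in fd_dir point at each target path.

    fd_dir is expected to look like /proc/<pid>/fd: a directory of
    symlinks, one per open descriptor.
    """
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor vanished between listdir() and readlink(),
            # or the entry is not a symlink; skip it.
            continue
        counts[target] += 1
    return counts


def duplicated_targets(fd_dir, minimum=2):
    """Return (target, count) pairs seen at least `minimum` times, most frequent first."""
    ranked = count_open_targets(fd_dir).most_common()
    return [(target, n) for target, n in ranked if n >= minimum]
```

Calling `duplicated_targets(f"/proc/{pid}/fd")` with the PID of the process doing the parsing should list the same libgmp path many times if the files really are opened once per traced PID.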

GitMensch added the bug label Jan 11, 2025
GitMensch changed the title from "perf parsing opens the same files ober and over again" to "perf parsing opens the same files over and over again -> EMFILE" Jan 11, 2025
@milianw
Member

milianw commented Jan 12, 2025

this is an inherent limitation of elfutils, we must have a per-pid dwfl process, and each would separately process all encountered elfs. Meaning if you have lots of long lived processes that encounter a ton of elfs, then you may simply run out of file descriptors - I don't see a way to prevent that on our side.

and no, doing per-pid processing or nuking the dwfl's is not an option as that would be far too slow for situations where you have enough file descriptors.

the good news is that elfutils might get some new API for that in the future which would allow us to better reuse data across PIDs and thus drastically reduce the work required: https://sourceware.org/pipermail/elfutils-devel/2024q4/007674.html
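
To make the trade-off concrete, here is a deliberately simplified model of the three approaches discussed in this thread. It assumes exactly one open file per (PID, ELF) pair, which is an assumption for illustration and not a statement about how many descriptors elfutils actually holds per module; the function and strategy names are hypothetical.

```python
def peak_open_files(mappings, strategy):
    """Estimate the peak number of simultaneously open ELF files.

    mappings: dict of pid -> set of ELF paths mapped by that pid.
    strategy: one of
      "per_pid_kept_open"  - every pid opens its own copy and keeps it open
      "per_pid_then_close" - pids are handled one at a time, closing afterwards
      "shared_across_pids" - each distinct ELF is opened once and reused
    """
    if strategy == "per_pid_kept_open":
        return sum(len(elfs) for elfs in mappings.values())
    if strategy == "per_pid_then_close":
        return max((len(elfs) for elfs in mappings.values()), default=0)
    if strategy == "shared_across_pids":
        return len(set().union(*mappings.values()))
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With, say, 1000 PIDs that each map the same 40 libraries, this model gives 40000 open files for the first strategy and 40 for each of the other two. The first figure is far above a typical soft descriptor limit, and the second illustrates the kind of reuse across PIDs that the linked proposal is about.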

@GitMensch
Contributor Author

the good news is that elfutils might get some new API for that in the future which would allow us to better reuse data across PIDs and thus drastically reduce the work required

That RFC does sound promising in general, especially since we bundle elfutils and users would therefore get access to this quickly.

if you have lots of long lived processes that encounter a ton of elfs, then you may simply run out of file descriptors - I don't see a way to prevent that on our side.

and no, doing per-pid processing or nuking the dwfl's is not an option as that would be far too slow for situations where you have enough file descriptors.

I see the point but there can be another conclusion:

  • the current default should not be changed: it works in our current scenario, and it will work much better if/when the RFC (still in its design stage, per the last notes from Dec 2024) has made it into a working elfutils version, perfparser has been adjusted to make use of it, and users either run the appimage or build against the most current elfutils themselves
  • because we know for sure that, with current elfutils, the current implementation will fail once it runs out of file descriptors:
    • it would be good to stop processing once perfparser hits a threshold of errors (ideally scoped to ENOMEM/EMFILE, i.e. the non-recoverable ones), because otherwise we get the same error over and over again for all further processing [stopping early may also allow using the data that was already parsed]; I terminated the process after roughly 5-10 minutes of error messages flooding the terminal -> should this be a separate FR (or even a bug report)?
    • an optional --dwfl-per-pid / --minimal-memory option "free resources as fast as possible - very slow but can help with EMFILE/ENOMEM during parsing" would be good
    • the per-PID filtering requested in "allow filtering to list of pid (additionally via executable name) / tid for --exportTo" (#524) gains one more strong reason (because filtering is better than "crashing")
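
The error-threshold idea from the list above could look roughly like the following. This is a sketch in Python rather than perfparser's own code, and `ErrorBudget`, `TooManyFatalErrors` and the threshold value are hypothetical names chosen for illustration; only the restriction to ENOMEM/EMFILE comes from the suggestion itself.

```python
import errno

# Error codes treated as non-recoverable for the rest of the run.
FATAL_ERRNOS = frozenset({errno.EMFILE, errno.ENOMEM})


class TooManyFatalErrors(RuntimeError):
    """Raised when the budget of non-recoverable errors is used up."""


class ErrorBudget:
    def __init__(self, threshold):
        self.threshold = threshold
        self.fatal_count = 0

    def record(self, err):
        """Count an OSError; abort once `threshold` fatal ones were seen.

        Errors with other errno values are ignored here, so a missing
        debug file does not eat into the budget.
        """
        if err.errno not in FATAL_ERRNOS:
            return
        self.fatal_count += 1
        if self.fatal_count >= self.threshold:
            raise TooManyFatalErrors(
                f"giving up after {self.fatal_count} non-recoverable errors"
            ) from err
```

A caller would wrap each per-module report in try/except, pass the caught `OSError` to `record()`, and on `TooManyFatalErrors` stop parsing and write out whatever was parsed so far instead of repeating the same message for every remaining PID.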
