-
Notifications
You must be signed in to change notification settings - Fork 6
DLPX-94248 address drgn merge conflict with upstream #72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
DLPX-94248 address drgn merge conflict with upstream #72
Conversation
Ubuntu enables -Wformat-security by default, but upstream GCC doesn't enable it even with -Wall. It caught some legitimate issues in the module API branch, so let's enable it explicitly. Signed-off-by: Omar Sandoval <[email protected]>
An upcoming commit needs to look up a few notes by name+type, so add a common helper function for that. Signed-off-by: Omar Sandoval <[email protected]>
This will be used for checking the CRC in .gnu_debuglink. Signed-off-by: Omar Sandoval <[email protected]>
These will be used for GNU build IDs. Signed-off-by: Omar Sandoval <[email protected]>
We are going to need to parse various structures that have different 64- and 32-bit formats, may be byte-swapped, and may be unaligned (e.g., various ELF and link map structures). Provide a couple of convenience macros for doing this. Signed-off-by: Omar Sandoval <[email protected]>
This is a trivial wrapper to delete by entry instead of by key or iterator. Signed-off-by: Omar Sandoval <[email protected]>
This has been available since Linux kernel commit 0935288c6e00 ("kdump: append kernel build-id string to VMCOREINFO") (in v5.9). Save it so we can use it when loading debugging information. Unfortunately, the build ID in VMCOREINFO was briefly broken in several stable releases. 6.10 and 5.15 reached their end-of-life while broken and so will remain broken forever. It feels like overkill to drop support for those versions over this, so we work around it with a version check. Signed-off-by: Omar Sandoval <[email protected]>
No code changes other than moving it in the file to make upcoming diffs cleaner. Signed-off-by: Omar Sandoval <[email protected]>
We currently use debuginfod via libdwfl, but when we get rid of our libdwfl dependency, we'll need to do the debuginfod calls ourselves. So, let's add the scaffolding for using libdebuginfod. We provide three choices at build time: * No debuginfod (./configure --without-debuginfod). * Soft dependency: load libdebuginfod with dlopen at runtime if available (./configure --with-debuginfod --enable-dlopen-debuginfod). This is the default and probably what we want distros to use. * Hard dependency: link against libdebuginfod (./configure --with-debuginfod --disable-dlopen-debuginfod). This is intended for environments where dlopen can't be used (e.g., manylinux wheels). The client handle will be created lazily, so for now this just sets up some wrappers and doesn't do much with them. Signed-off-by: Omar Sandoval <[email protected]>
We try to pick a good default, but it's not exactly the same as logging so it needs extra flexibility. This will be used by upcoming debuginfod integration. Signed-off-by: Omar Sandoval <[email protected]>
drgn currently provides limited control over how debugging information is found. drgn has hardcoded logic for where to search for debugging information. The most the user can do is provide a list of files for drgn to try in addition to the default locations (with the -s CLI option or the drgn.Program.load_debug_info() method). The implementation is also a mess. We use libdwfl, but its data model is slightly different from what we want, so we have to work around it or reimplement its functionality in several places: see commits e5874ad ("libdrgn: use libdwfl"), e6abfea ("libdrgn: debug_info: report userspace core dump debug info ourselves"), and 1d4854a ("libdrgn: implement optimized x86-64 ELF relocations") for some examples. The mismatched combination of libdwfl and our own code is difficult to maintain, and the lack of control over the whole debug info pipeline has made it difficult to fix several longstanding issues. The solution is a major rework removing our libdwfl dependency and replacing it with our own model. This (huge) commit is that rework comprising the following components: - drgn.Module/struct drgn_module, a representation of a binary used by a program. - Automatic discovery of the modules loaded in a program. - Interfaces for manually creating and overriding modules. - Automatic discovery of debugging information from the standard locations and debuginfod. - Interfaces for custom debug info finders and for manually overriding debugging information. - Tons of test cases. A lot of care was taken to make these interfaces extremely flexible yet cohesive. The existing interfaces are also reimplemented on top of the new functionality to maintain backwards compatibility, with one exception: drgn.Program.load_debug_info()/-s would previously accept files that it didn't find loaded in the program. This turned out to be a big footgun for users, so now this must be done explicitly (with drgn.ExtraModule/--extra-symbols). The API and implementation both owe a lot to libdwfl: - The concepts of modules, module address ranges/section addresses, and file biases are heavily inspired by the libdwfl interfaces. - Ideas for determining modules in userspace processes and core dumps were taken from libdwfl. - Our implementation of ELF symbol table address lookups is based on dwfl_module_addrinfo(). drgn has taken these concepts and fine-tuned them based on lessons learned. Credit is also due to Stephen Brennan for early testing and feedback. Closes #16, closes #25, closes osandov#332. Signed-off-by: Omar Sandoval <[email protected]>
Overwriting a file that libelf has already mmap'd can confuse it and cause it to crash. In particular, libelf/elf_begin.c::file_read_elf() initializes Elf_Scn::rawdata_base and Elf_Scn::data_base from the mmap'd file. libelf/elf_getdata.c::__libelf_set_rawdata_wrlock() also sets Elf_Scn::rawdata_base from the mmap'd file. If the file changes between those two events, then Elf_Scn::rawdata_base will change. Then, the following line in libelf/elf_end.c::elf_end() will try to free an mmap'd pointer: if (scn->data_base != scn->rawdata_base) free (scn->data_base); Stephen reported crashes like this from test_gnu_debugaltlink_debug_directories() while testing a patch that inadvertently caused debug info to be indexed on module creation. Reported-by: Stephen Brennan <[email protected]> Signed-off-by: Omar Sandoval <[email protected]>
Since commit 4e83130 ("Introduce module and debug info finder APIs"), DWARF indexing is somewhat lazy. As a result, drgn_void_type() may require DWARF indexing in order to determine the program's main language. This is overkill for drgn_object_init(), which just needs to initialize a valid dummy object that is usually reinitialized immediately. Signed-off-by: Omar Sandoval <[email protected]>
Fixes: 4e83130 ("Introduce module and debug info finder APIs") Signed-off-by: Omar Sandoval <[email protected]>
Mention that `drgn` is shipped with Ubuntu except jammy, but it's an older version Signed-off-by: Michel Lind <[email protected]>
Since libkdumpfile commit 5b044292abe9 ("Clarify and fix attribute data lifetime") changes the lifetime of attribute values retrieved with kdump_attr_ref_get(), the extra reference would keep the PRSTATUS blob around even after kdump_free(). However, the attribute hierarchy cannot change while iterating over the PRSTATUS attributes, so it is not necessary to take an attribute reference and we can use kdump_get_typed_attr(). The attribute blob itself should not change either, but it is a good idea to keep its data pinned, because a raw pointer to it is stored in the drgn_thread_set hash table. If some code tries to modify the PRSTATUS attribute data, the attempt will fail with KDUMP_ERR_BUSY rather than leave a dangling pointer in the hash table and possibly cause a UAF bug later. The blob pin does not prevent freeing the blob when the blob reference count reaches zero. Signed-off-by: Petr Tesarik <[email protected]>
The kdump_get_typed_attr() function prototype changed in libkdumpfile commit e182aeaf4d72 ("Make kdump_get_typed_attr() easier to use"). Signed-off-by: Petr Tesarik <[email protected]>
…on_name Multiple people have lamented that StackFrame.name is None for functions implemented in assembly or missing debug info for any other reason. With DWARFless debugging, this will be way more common. My original hope was that StackFrame.name would strictly be the function name from the debugging information and that callers would fall back to getting the symbol name themselves. However, the distinction isn't super meaningful to users, so let's add the fallback directly to StackFrame.name and add StackFrame.function_name with the old behavior of StackFrame.name. Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
This was previously used for testing internals via ctypes, but it's no longer needed. Fixes: 7d251fe ("Translate C lexer tests to C unit tests") Signed-off-by: Omar Sandoval <[email protected]>
The x86-64 fallback unwinder currently has a special case for handling a call to a NULL pointer. Other architectures need the same workaround. To avoid code duplication, let's extract the null program counter check into the generic stack tracing code and add a bad_call_unwind architecture callback. This also gives us a centralized place to add heuristics for detecting non-null bad calls. Signed-off-by: Omar Sandoval <[email protected]>
This will mainly be useful for bad_unwind_call implementations that make use of drgn_register_state_dup() and need to mark some registers as unknown in the unwound frame. Signed-off-by: Omar Sandoval <[email protected]>
Since AArch64 uses a link register rather than storing the return address on the stack, this is a bit easier than on x86-64. Fixes osandov#462. Signed-off-by: Omar Sandoval <[email protected]>
This is a shorter-term solution for anyone who can't run a version of drgn with the previous fix. Signed-off-by: Omar Sandoval <[email protected]>
PID 0 is not unique in the Linux kernel; there is a task with PID 0 for each CPU. stack_trace(0) currently fails with a generic "task not found" error message, which can be confusing; see osandov#462. Add a hint to use idle_task() to the error message when the given PID is 0. Signed-off-by: Omar Sandoval <[email protected]>
Alec Rivers noticed in osandov#461 that WITH_LIBKDUMPFILE is misspelled as WITH_KDUMPFILE here. The whole ifdef block isn't actually needed, so remove it. Fixes: 4e330bb ("cli: indicate if drgn was compiled with libkdumpfile") Signed-off-by: Omar Sandoval <[email protected]>
None of our setter functions handle deletions, which pass the value as NULL. This results in a segfault when attempting to access the value. Fix them all with a new convenience macro. Fixes: 50e4ac8 ("libdrgn: allow overriding program default language") Fixes: 4e83130 ("Introduce module and debug info finder APIs") Signed-off-by: Omar Sandoval <[email protected]>
STRING_BUILDER has been really convenient, so let's do the same for vectors and hash tables. Signed-off-by: Omar Sandoval <[email protected]>
…lable Since the module API was introduced, Program.load_debug_info() and the drgn CLI's -s option match strictly based on build IDs. This fails when the build ID is not available, specifically in the case of Linux kernel core dumps without a usable build ID in VMCOREINFO (old versions and a few buggy stable versions). Before the module API, Program.load_debug_info() and -s used any vmlinux file given to them. This caused confusion when the wrong file was given, so we don't want to bring that behavior back. Instead, let's look for a vmlinux file matching the Linux version from VMCOREINFO. Fixes osandov#464. Fixes: 4e83130 ("Introduce module and debug info finder APIs") Signed-off-by: Omar Sandoval <[email protected]>
We do this so often I'm surprised I didn't add this sooner. Signed-off-by: Omar Sandoval <[email protected]>
The new version includes a new kdumpid binary with a dependency on BFD that we don't want, so disable it. Signed-off-by: Omar Sandoval <[email protected]>
Some versions of the kernel fail to build for ppc64 with GCC 12 and CONFIG_KPROBES enabled due to -Wdangling-pointer. Backport the commit that disabled that warning. Fixes: 9b7297d ("vmtest.config: enable CONFIG_KPROBES for upcoming kmodify breakpoints") Signed-off-by: Omar Sandoval <[email protected]>
The GitHub Actions Ubuntu 20.04 image has just been shut down: actions/runner-images#11101. Update to Ubuntu 22.04. Since 22.04 doesn't include Python 3.6 or 3.7, also drop those. I've already announced that this upcoming release will be the last one with support for those Python versions (osandov#467). I'll test them manually before cutting the release. Signed-off-by: Omar Sandoval <[email protected]>
I've found that these flood the logs and often make me lose more interesting stuff in the scrollback. The values are available in Module.section_addresses, so just log where we got them. Signed-off-by: Omar Sandoval <[email protected]>
…lugins These options also disable directories added by plugins, so document that. Signed-off-by: Omar Sandoval <[email protected]>
dnf_debug_info_finder() needs a couple more comments, and example_debug_info_finder() wouldn't actually run as is. Signed-off-by: Omar Sandoval <[email protected]>
It's missing a blank line, which messes up the formatting. Also reword it a tiny bit. Fixes: 545aa52 ("mm, slab: Fix test failures on kernels with SLOB") Signed-off-by: Omar Sandoval <[email protected]>
The major reworks for the module/debug info finder APIs and proper debuginfod support substantially change the story for getting debugging symbols. Revamp our documentation: * Document how to use debuginfod on different distributions. * Add flow charts with our recommendations on each distribution. * Add openSUSE documentation. * Also document how to build with debugging symbols on different build systems. Closes osandov#380. Signed-off-by: Omar Sandoval <[email protected]>
Fixes: adf6472 ("Add plugin system") Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
struct foo64 is only 20 bytes with i386's alignment requirements, so we get the following compiler warning on i386: ‘memcpy’ forming offset [20, 23] is out of the bounds [0, 20] of object ‘foo’ with type ‘struct foo64’ and a crash when running the tests. Fix it by adding padding manually. I verified that this only affects the test cases. All of the structs we actually use deserialize_struct64 for are properly defined. Fixes osandov#493. Signed-off-by: Omar Sandoval <[email protected]>
We need to catch and print exceptions in the forked process. stdout and stderr are also str, so we shouldn't be calling .decode(). Fixes: 22ac3f3 ("tests: don't shell out for CLI tests") Signed-off-by: Omar Sandoval <[email protected]>
drgn -e '' should start drgn, execute the empty statement, and exit, not enter interactive mode. Fix it by checking whether args.exec is None instead of bool(args.exec). Fixes: 150ee76 ("cli: add -e option to exec() code directly") Signed-off-by: Omar Sandoval <[email protected]>
We don't have a nice automated way to run the C unit tests from the main vmtest CLI yet (I'm probably going to wait until we convert to Meson to see how that will work), but at least install check so it can be done manually. Signed-off-by: Omar Sandoval <[email protected]>
From commit 9f6c61f96f2d9 ("proc/mounts: add cursor") in v5.8, until commit 2eea9ce4310d8 ("mounts: keep list of mounts in an rbtree") in v6.8, readers of the /proc/mounts file inserted a fake cursor object into the mount list, to keep their place. These cursors had the flag MNT_CURSOR set, which was unfortunately a pre-processor constant, and since its removal in 6.8, its value has been repurposed. The "struct mount" object used as a cursor is entirely zero'd, except for the list_head and the flag field. So an easy way to detect cursors is simply to check that "mnt.mnt_sb" is NULL, which should never happen otherwise. Introduce a test for this behavior as well. Without the fix, the test indeed fails between v5.8 and v6.8, but passes on other kernels. Signed-off-by: Stephen Brennan <[email protected]>
Add a helper that detects if a page is a page_pool page. Signed-off-by: Dragos Tatulea <[email protected]>
drgn can help a lot with debugging page_pool issue. So much so that I made a talk at netdevconf 0x19 about it. This commit adds the scripts to the contrib directory with usage instructions in README.rst. Signed-off-by: Dragos Tatulea <[email protected]>
Python "LazyObjects" (e.g. TypeMember) contain a callable or an object, either of which may contain a reference to the Program. When cached on the program, these references form a cycle. Add tp_traverse methods to help detect these cycles and avoid resource leakages. Fixes osandov#497. Signed-off-by: Stephen Brennan <[email protected]>
We should be checking if self->trace is NULL, not if container_of(self->trace->prog, Program, prog) is NULL. We don't create StackTrace objects with a NULL self->trace anyways, so this is mostly defensive. Fixes: 46343ae ("libdrgn: get rid of struct drgn_stack_frame") Signed-off-by: Omar Sandoval <[email protected]>
As Stephen Brennan discovered in osandov#498, we're missing Py_TPFLAGS_HAVE_GC/tp_traverse in a bunch of places. I misunderstood tp_clear not being needed for immutable types to mean that tp_traverse isn't needed, either, but that's wrong. We're also missing PyObject_GC_UnTrack() calls in the tp_dealloc implementations for garbage collected types. Add the missing implementations/calls. I tested this with a script that runs tests and compares the set of live programs (which we save for our logging hack): https://gist.github.com/osandov/b4cdab8e61476dd5cfa306dab5d4c7cc. A few test classes needed a tearDownClass() method to clear the program, but after doing that, there are no leaks. (There appears to be a false positive for our expectedFailure() case, but that should hopefully be gone soon.) Benchmarks also showed no significant change in performance from the added calls. Signed-off-by: Omar Sandoval <[email protected]>
27c9f93
to
441b103
Compare
if (i >= char_p_vector_size(&it.entry->value)) { | ||
// No matches remaining for this module. Clear the old | ||
// matches and find another one. | ||
vector_for_each(char_p_vector, path, &it.entry->value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mend Code Security Check
New finding (6 of 10)
The Mend Code Security Check of your branch failed because of a Use After Free finding in this line.
Severity | Vulnerability Type | CWE | File | Data Flows | Detected | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Use After Free | 1 | 2025-05-14 07:58pm | |||||||||||||||
|
size_t i = 0; | |
for (;;) { | |
if (i >= char_p_vector_size(&it.entry->value)) { | |
// No matches remaining for this module. Clear the old | |
// matches and find another one. | |
vector_for_each(char_p_vector, path, &it.entry->value) |
Secure Code Warrior Training Material
● Training
▪ Secure Code Warrior Use After Free Training
● Videos
🏴 Suppress Finding
- ... as False Alarm
- ... as Acceptable Risk
for (struct drgn_dedupe_type_set_iterator it = | ||
drgn_dedupe_type_set_first(&prog->dedupe_types); | ||
it.entry; it = drgn_dedupe_type_set_next(it)) | ||
hash_table_for_each(drgn_dedupe_type_set, it, &prog->dedupe_types) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mend Code Security Check
New finding (7 of 10)
The Mend Code Security Check of your branch failed because of a Use After Free finding in this line.
Severity | Vulnerability Type | CWE | File | Data Flows | Detected | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Use After Free | 1 | 2025-05-14 07:58pm | |||||||||||||||||
|
} | |
free(type); | |
} | |
drgn_typep_vector_deinit(&prog->created_types); | |
hash_table_for_each(drgn_dedupe_type_set, it, &prog->dedupe_types) |
1 Data Flow/s detected
Line 1356 in 441b103
hash_table_for_each(drgn_dedupe_type_set, it, &prog->dedupe_types) |
Secure Code Warrior Training Material
● Training
▪ Secure Code Warrior Use After Free Training
● Videos
🏴 Suppress Finding
- ... as False Alarm
- ... as Acceptable Risk
|
||
static void drgn_module_free_section_addresses(struct drgn_module *module) | ||
{ | ||
hash_table_for_each(drgn_module_section_address_map, it, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mend Code Security Check
New finding (9 of 10)
The Mend Code Security Check of your branch failed because of a Use After Free finding in this line.
Severity | Vulnerability Type | CWE | File | Data Flows | Detected | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Use After Free | 1 | 2025-05-14 07:58pm | |||||||||||||||
|
drgn_module_address_key, | |
binary_search_tree_scalar_cmp, splay); | |
static void drgn_module_free_section_addresses(struct drgn_module *module) | |
{ | |
hash_table_for_each(drgn_module_section_address_map, it, |
Secure Code Warrior Training Material
● Training
▪ Secure Code Warrior Use After Free Training
● Videos
🏴 Suppress Finding
- ... as False Alarm
- ... as Acceptable Risk
Merge conflict due to us having removed the
pre-commit-config.yaml
file downstream. We should likely re-instate the file, to avoid conflicts like this moving forward. I'll look into that solution in a future change.