UEK-NEXT support #127

Merged: 19 commits, Nov 20, 2024

@brenns10 (Member) commented Nov 14, 2024

This pull request begins running the GitHub CI tests against UEK-next, which is currently at 6.11. It includes a whole crop of fixes to helpers that allow the tests to pass. This is the first version of UEK-next for which I'm actually doing this, so it includes a lot of changes from the UEK7 era up to the present.

These fixes are necessary, but almost certainly incomplete. The GitHub CI tests run in very small QEMU VMs, and I also manually verified that the tests pass on my laptop. Plenty of subsystems are exercised by neither my laptop nor the limited CI VMs.

I'm breaking these changes down into a few categories, each of which I believe should be reviewed by a subject matter expert. If you're tagged here, please take a look at the subset of changes listed (note that the Git SHAs may not be correct, as I may need to amend commits to add bug references).

# SMP, workqueue: @imran-kn 
648a271 smp: skip test on live systems
8ded078 workqueue: handle changes in 6.9

# Virtual Filesystem / mounts: @gtmoth 
2d3ffc8 mounts: include superblock sub-type
0cf7aeb fsnotify: handle changes in 6.10
7bf4632 dentry: mark dentry_path_any_mount for removal

# Block IO: @biger410
32d3d64 partition: add tests for partition info
faa81ff partition: use block fixes for 6.10
9c5f640 block: fixes for v6.10
0f893d0 nvme: fixes for Linux 6.11

# Memory management: @jianfenw 
ef72437 numastat: Add missing stats from newer kernels
2e5bc1f task: support computing rss_stat for v6.2+

# Internals & tiny, obviously correct changes: @brenns10 
99bf206 bt: eliminate TODO for DWARF opcode exceptions
ea84304 module: remove compatibility with drgn < 0.0.21
a78f4b0 testing: litevm: enable testing on uek-next
2bb9b08 net: use task_cpu() helper

# Scheduling & generic kernel: @imran-kn or @biger410 
c86272c Fix for_each_task_in_group() for Linux 6.7+
18081d1 tests: task: fix failure on v6.9
c0dd25e module: support v6.4+ & efficient address lookup

# Generic driver APIs: @biger410 as a reviewer of last resort
afdf8d4 device: add helpers to get subsys_private

@oracle-contributor-agreement bot added the OCA Verified label (All contributors have signed the Oracle Contributor Agreement.) Nov 14, 2024
@brenns10 force-pushed the fix-ueknext-tests branch 3 times, most recently from 7169419 to 648a271, November 18, 2024 22:40

Kernel 6.4 adopted a new module memory layout structure which is more
flexible than before. Rather than complicate the "ModuleLayout"
structure which contained lots of details from the previous
implementation, I've gone ahead and removed as much detail as possible.
Now the module helpers simply return a list of module address ranges,
which may be text, data, rodata, or anything really.

In addition, to more efficiently look up module addresses, add support
for the kernel module address tree. This avoids iterating over
the (possibly long) module list each time you want to find out which
module an address belongs to.

These helpers ended up being really nice, and I upstreamed them to drgn.
Each helper function which was upstreamed has a TODO entry so that we
can find and remove them when we raise our minimum required version to
drgn 0.0.28.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>
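The commit above trades a walk over the whole module list for a lookup in the kernel's module address tree. As a hedged, plain-Python illustration of the same idea (not the drgn-tools or drgn API), the sketch below indexes `(start, end, module)` ranges by start address and finds the owner of an address with a binary search. `ModuleRange`, `ModuleAddressIndex`, and the addresses are hypothetical, and the ranges are assumed not to overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModuleRange:
    """Hypothetical address range owned by a module (text, data, rodata, ...)."""
    start: int  # inclusive
    end: int    # exclusive
    module: str


class ModuleAddressIndex:
    """Sorted index of non-overlapping module address ranges (illustrative only)."""

    def __init__(self, ranges: List[ModuleRange]) -> None:
        self._ranges = sorted(ranges, key=lambda r: r.start)
        self._starts = [r.start for r in self._ranges]

    def lookup(self, addr: int) -> Optional[str]:
        # Find the last range starting at or before addr, then check its end.
        i = bisect_right(self._starts, addr) - 1
        if i >= 0 and addr < self._ranges[i].end:
            return self._ranges[i].module
        return None


index = ModuleAddressIndex([
    ModuleRange(0xffffffffc0000000, 0xffffffffc0004000, "xfs"),
    ModuleRange(0xffffffffc0010000, 0xffffffffc0012000, "nvme"),
    ModuleRange(0xffffffffc0004000, 0xffffffffc0005000, "xfs"),
])
print(index.lookup(0xffffffffc0010abc))  # nvme
print(index.lookup(0xffffffffc0008000))  # None (gap between ranges)
```

A sorted list is used only because Python has no built-in balanced tree; the kernel's data structure differs, but both give logarithmic lookups instead of a linear scan over every module.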
This also somewhat simplifies the legacy code thanks to a new helper for
iterating over all tasks in a task group.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

I'm not sure of the root cause, but some workers now have NULL task fields.
I'm confident this has always been legal, but it just didn't happen in
tests prior to v6.9 due to some kernel internals. Now that it happens,
let's fix this case by searching for a worker with a non-NULL task.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>
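The workaround amounts to skipping workers whose `task` pointer is NULL. A minimal plain-Python sketch, where the `Worker` class is a hypothetical stand-in for a drgn `struct worker` object and `None` models a NULL pointer:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Worker:
    """Hypothetical stand-in for a kernel struct worker."""
    id: int
    task: Optional[str]  # None models a NULL task pointer


def first_worker_with_task(workers: Iterable[Worker]) -> Optional[Worker]:
    """Return the first worker whose task is non-NULL, or None if there is none."""
    return next((w for w in workers if w.task is not None), None)


workers = [Worker(0, None), Worker(1, "kworker/0:1"), Worker(2, "kworker/0:2")]
print(first_worker_with_task(workers))  # Worker(id=1, task='kworker/0:1')
```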
Also, "NR_UNSTABLE_NFS" was never being set non-zero because we were
looking it up in mm_stats, not node_zone_stats. Fix that too.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

In 6.3 and 6.4 there was a push to make struct bus_type and struct class
const. This means that the private pointers were removed, and replaced
by accessor functions. Implement a drgn version of each accessor.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>
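As I understand the upstream accessors (e.g. `class_to_subsys()`), the kernel now finds a class's `struct subsys_private` by walking the kset's list and comparing each entry's back-pointer to the class, rather than following a stored pointer. The plain-Python sketch below shows only that search; `Klass` and `SubsysPrivate` are hypothetical stand-ins, and a real drgn helper would walk the kset's kobject list with `list_for_each_entry()` and `container_of()` instead of iterating a Python list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Klass:
    """Hypothetical stand-in for a kernel struct class."""
    name: str


@dataclass(eq=False)
class SubsysPrivate:
    """Hypothetical stand-in for struct subsys_private; `klass` mirrors sp->class."""
    name: str
    klass: Optional[Klass] = None


def class_to_subsys(kset_list: List[SubsysPrivate], cls: Klass) -> Optional[SubsysPrivate]:
    """Return the entry whose back-pointer is `cls`, mimicking the kernel's search."""
    for sp in kset_list:
        if sp.klass is cls:  # identity check, like comparing sp->class == class
            return sp
    return None


block, net = Klass("block"), Klass("net")
kset = [SubsysPrivate("net", net), SubsysPrivate("block", block)]
print(class_to_subsys(kset, block).name)  # block
```

Identity (`is`) is used deliberately: the kernel compares pointers, so a different `struct class` with the same name must not match.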
Previous commits have resolved all of the compatibility issues as of
UEK-next 6.9.0-2. Enable the tests so we can run them in CI.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

The minimum required drgn version is 0.0.25 for drgn-tools.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

These errors are not fully resolved in any drgn version, and they're not
fatal either. Leave the documentation links, but take away the TODO
since there's nothing here to fix.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

An upstreamed version is available within the standard d_path()
function, starting from 0.0.29. Once this is the minimum required
version of drgn, we can delete this function.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

In v6.10 there was a cleanup of struct block_device, removing the
bd_inode field and combining bd_read_only, bd_partno, and others into a
single __bd_flags atomic field. Fix the helpers for these changes.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>
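To illustrate decoding the combined field: in the v6.10 `include/linux/blk_types.h` as I recall it, the low 8 bits of `__bd_flags` hold the partition number (`BD_PARTNO` = 255) and bit 8 is `BD_READ_ONLY`. Treat those constants as assumptions to check against the target kernel. The sketch decodes a plain integer already read from the atomic (with drgn, something like `bdev.member_("__bd_flags").counter.value_()`). The kernel's `bdev_read_only()` also consults the disk-level read-only state, which this sketch ignores.

```python
from typing import NamedTuple

# Assumed v6.10 values from include/linux/blk_types.h; verify for your kernel.
BD_PARTNO = 255          # low 8 bits: partition number
BD_READ_ONLY = 1 << 8    # read-only policy flag


class BdevFlags(NamedTuple):
    partno: int
    read_only: bool


def decode_bd_flags(flags: int) -> BdevFlags:
    """Split a raw __bd_flags value into partition number and read-only bit."""
    return BdevFlags(partno=flags & BD_PARTNO, read_only=bool(flags & BD_READ_ONLY))


print(decode_bd_flags(0x103))  # BdevFlags(partno=3, read_only=True)
print(decode_bd_flags(0x2))    # BdevFlags(partno=2, read_only=False)
```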
Computing size and read-only is duplicated here. Instead, use the
helpers from drgn_tools.block, which are already fixed for v6.10 and
later.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

Partition information can easily be gleaned from /sys/class/block on
live systems. Add unit tests to verify the information is correct, so
that our tests can detect block-related changes like the ones corrected
recently.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>
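On a live system, expected values for such tests can be read straight from sysfs. A minimal sketch, assuming the standard per-device attributes `partition`, `start`, `size` (in 512-byte sectors) and `ro` under `/sys/class/block/<name>/`. The function name and the `sysfs_root` parameter are hypothetical; the parameter exists only so the code can be pointed at a fake directory tree.

```python
import os
from typing import Dict


def read_partition_info(name: str, sysfs_root: str = "/sys/class/block") -> Dict[str, int]:
    """Read partition attributes for a block device from sysfs.

    Whole disks have no "partition" or "start" files, so those default to 0.
    """
    devdir = os.path.join(sysfs_root, name)

    def read_int(attr: str, default: int = 0) -> int:
        path = os.path.join(devdir, attr)
        if not os.path.exists(path):
            return default
        with open(path) as f:
            return int(f.read().strip())

    return {
        "partno": read_int("partition"),
        "start_sect": read_int("start"),
        "nr_sects": read_int("size"),  # in 512-byte sectors
        "read_only": read_int("ro"),
    }
```

On a real machine, `read_partition_info("sda1")` would return the values a drgn-based helper should agree with.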
This matches the output of /proc/mounts, which is important because the
test cases compare the output from this helper against /proc/mounts.
With this, tests pass on my laptop which happens to have some FUSE
mounts that populate s_subtype.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>
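The `/proc/mounts` convention being matched is that when a superblock has `s_subtype` set, the filesystem type column reads `<type>.<subtype>` (for example `fuse.sshfs`). A minimal sketch of that formatting; the function name is hypothetical, and the kernel's escaping of special characters is deliberately left out:

```python
from typing import Optional


def format_fstype(type_name: str, subtype: Optional[str]) -> str:
    """Format a filesystem type the way /proc/mounts shows it.

    None (a NULL s_subtype) or an empty string means no subtype is appended.
    """
    if subtype:
        return f"{type_name}.{subtype}"
    return type_name


print(format_fstype("fuse", "sshfs"))  # fuse.sshfs
print(format_fstype("ext4", None))     # ext4
```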
The max_active field got moved into the workqueue_struct.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>
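One common way to cope with a field moving between structures is to check which layout the target kernel has before reading it. In this sketch, `types.SimpleNamespace` objects are hypothetical stand-ins for drgn objects and `hasattr()` plays the role of the layout check (with drgn, one would more likely inspect the type, e.g. via `Type.has_member()`). It also assumes that pre-6.9 kernels keep `max_active` in the `pool_workqueue`.

```python
from types import SimpleNamespace


def get_max_active(wq, pwq) -> int:
    """Read max_active from wherever this kernel keeps it.

    Assumption: v6.9+ stores it in the workqueue_struct; older kernels
    keep it in each pool_workqueue.
    """
    if hasattr(wq, "max_active"):
        return wq.max_active
    return pwq.max_active


new_wq = SimpleNamespace(name="events", max_active=512)
old_wq = SimpleNamespace(name="events")
old_pwq = SimpleNamespace(max_active=16)
print(get_max_active(new_wq, None))     # 512
print(get_max_active(old_wq, old_pwq))  # 16
```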
This should have been done a while ago. The "smp" corelens module
already disallows running on live systems. IPIs fly too fast for us to
keep up with in userspace, so running the full module tends to cause issues.
In this case, the specific issue had to do with unwinding the stack on
the running task, which raises an error in drgn.

Orabug: 37296325
Signed-off-by: Stephen Brennan <[email protected]>

@brenns10 marked this pull request as ready for review November 18, 2024 23:50

@biger410 (Member) left a comment

IO fixes look good to me.

@imran-kn (Contributor) left a comment

Changes in the following files LGTM:

  1. test_mm.py
  2. test_module.py
  3. test_smp.py
  4. test_task.py
  5. module.py
  6. task.py
  7. workqueue.py
  8. numastat.py

@imran-kn (Contributor) left a comment

Changes in lsmod.py are good to go.

@jianfenw (Member) left a comment

LGTM for changes in numastat.py and task.py.
Looks like upstream has made quite a few changes in mm. Thanks for fixing these!

@brenns10 merged commit ac89696 into oracle-samples:main Nov 20, 2024
5 checks passed
@brenns10 deleted the fix-ueknext-tests branch November 20, 2024 18:32