tests: kdoc: switch to python and improve algorithm #57

jacob-keller · 2025-10-22T21:54:21Z

I had a look at the kdoc test output for some patches recently, and found it
to be essentially unusable as a tool for quickly resolving new warnings. The
provided output is a diff (traditional, not even a unified context) format
which includes also warnings whose line number changed. This results in many
pre-existing warnings being reported. Spotting the actually new warnings is
quite difficult.

In addition, that original test can produce false positives. If a patch
introduces new warnings while removing others, the total number of
kernel-doc warnings might stay the same, thus allowing the test to pass
despite having new warnings.

A much better algorithm is to compare the warnings before and after without
the line numbers. I originally implemented such an algorithm in the shell
script via a complex set of awk, sed, and diff with the custom
'--*group-format' options. This worked, but is rather complex and somewhat
difficult to maintain.

Instead, this version re-implements the kdoc test entirely in python using
subprocess to execute the ./scripts/kernel-doc program. Line output is
converted into a custom KdocWarning dataclass using a regular expression to
extract the file and line paths. Comparisons are implemented only on the
file and content sections, ignoring line numbers.

Once captured, the results are converted into sets which we can perform set
difference on to compute the set of added and removed warnings. A quick
comparison on my 14 patch series shows that the new python implementation is
actually faster than the baseline original implementation:

Baseline time:
real 0m52.290s
user 0m45.392s
sys 0m44.627s

Shell implementation:
real 0m52.769s
user 0m45.504s
sys 0m45.365s

Python implementation:
real 0m49.179s
user 0m43.721s
sys 0m42.877s

The python version has several advantages over the shell version:

It is faster than the original implementation
The logic is more obvious and should be easier to maintain
Warning order from the script output is preserved completely
We don't have to collapse mutli-line warnings together

Here's an example of the output difference, with the current implementation shown first:

New warnings added
1c1
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:160 missing initial short description on line:
---
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:416 missing initial short description on line:
3,5c3,7
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:269 No description found for return value of 'ice_vc_parse_rss_cfg'
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:334 No description found for return value of 'ice_vf_adv_rss_offload_ena'
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:339 missing initial short description on line:
---
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:525 No description found for return value of 'ice_vc_parse_rss_cfg'
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:685 No description found for return value of 'ice_vf_adv_rss_offload_ena'
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:802 No description found for return value of 'ice_hash_remove'
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1024 No description found for return value of 'ice_map_ip_ctx_idx'
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1555 missing initial short description on line:
7,8c9,10
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:347 No description found for return value of 'ice_vc_handle_rss_cfg'
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:476 missing initial short description on line:
---
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1563 No description found for return value of 'ice_vc_handle_rss_cfg'
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1673 missing initial short description on line:
10,11c12,13
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:483 No description found for return value of 'ice_vc_config_rss_key'
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:523 missing initial short description on line:
---
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1680 No description found for return value of 'ice_vc_config_rss_key'
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1720 missing initial short description on line:
13,14c15,16
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:530 No description found for return value of 'ice_vc_config_rss_lut'
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:569 missing initial short description on line:
---
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1727 No description found for return value of 'ice_vc_config_rss_lut'
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1766 missing initial short description on line:
16,19c18,21
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:576 No description found for return value of 'ice_vc_config_rss_hfunc'
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:618 No description found for return value of 'ice_vc_get_rss_hashcfg'
< Warning: drivers/net/ethernet/intel/ice/virt/rss.c:657 No description found for return value of 'ice_vc_set_rss_hashcfg'
< Warning: include/linux/avf/virtchnl.h:1651 missing initial short description on line:
---
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1773 No description found for return value of 'ice_vc_config_rss_hfunc'
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1815 No description found for return value of 'ice_vc_get_rss_hashcfg'
> Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1854 No description found for return value of 'ice_vc_set_rss_hashcfg'
> Warning: include/linux/avf/virtchnl.h:1701 missing initial short description on line:
21c23
< Warning: include/linux/avf/virtchnl.h:1662 No description found for return value of 'virtchnl_vc_validate_vf_msg'
---
> Warning: include/linux/avf/virtchnl.h:1712 No description found for return value of 'virtchnl_vc_validate_vf_msg'
Per-file breakdown

Now the new results:

New warnings added:
Warning: drivers/net/ethernet/intel/ice/virt/rss.c:802 No description found for return value of 'ice_hash_remove'
Warning: drivers/net/ethernet/intel/ice/virt/rss.c:1024 No description found for return value of 'ice_map_ip_ctx_idx'
Per-file breakdown:
     2 drivers/net/ethernet/intel/ice/virt/rss.c

This output is significantly more readable and useful for humans. It is also
more robust as it closes a possible loophole in patches adding and removing
warnings in the same change.

The existing kdoc test is written in shell, based on a simple heuristic of checking whether the number of lines of output from scripts/kernel-doc increases. This test can pass in some cases even when new warnings are introduced, such as if a single patch adds some new warnings while removing others. In addition, the provided log output is not very useful to humans. It uses a traditional diff to compare the prior and current warnings. Because the warnings include line number data, this results in an unreadable mess. Implementing a proper comparison in the shell script is possible, but the resulting logic and code is difficult to maintain. Ultimately, what we want to do is a sort of set intersection on the warnings while ignoring line numbers. Implementing this in python is much more practical. As a first step, convert the kdoc test into a python module. Commands are executed via subprocess. The improved comparison algorithm will be implemented in a following change. Since we will be implementing a new comparison algorithm, do not bother porting the "diff" implementation forward. This results in a temporary regression as we no longer compute the difference or file summary for the user. This will be resolved with the following change implementing the improved algorithm. This version has a couple of minor changes worth nothing: 1) The log output is now in stdout instead of stderr, since the python-based tests do not support logging output to the stderr file. 2) We no longer print out "Checking the tree..." lines. 3) The description line is included within the stdout log text as well as the desc file. Signed-off-by: Jacob Keller <[email protected]>

The current algorithm for the kdoc test is a simple line number comparison of the output from ./scripts/kernel-doc. This is not guaranteed to be accurate, as a patch could fix some warnings while introducing others, resulting in the total number of lines not changing. In addition, it is difficult to quickly determine which warnings are new. Historically, a simple "diff" was used to compare the before and after results. This was difficult to parse because line numbers change due to code being added or moved. To fix this, use a set difference algorithm which compares based on data from the warning lines without including the line number. To do this, introduce a KdocWarning dataclass, which represents a warning based on its kind, file, line, and content. Extract these from the lines via a regular expression that looks for the expected pattern of output from ./scripts/kernel-doc. Using a @DataClass allows specifying which fields to compare against. In particular, the line number is not counted. Additionally, the original message as output from ./scripts/kernel-doc is preserved as its own (non-compared) field. Some warnings are spread over two lines, indicated by the first line ending in a semicolon. Handle this when iterating over the output of kernel-doc and converting to the new warning objects. Any output line which doesn't match the regular expression is converted so that its entire message becomes the contents. This ensures that such lines still get counted and still get compared in some form, rather than silently ignored. After obtaining the converted output from ./scripts/kernel-doc, convert the current_warnings and incumbent_warnings lists into sets. Calculate the set of new and removed warnings by iterating the original lists. For new_warnings, find the sequence of warnings which are in the current_warnings but not in the incumbent set. Do the same to calculate the set of removed warnings. This implementation is fast, as it can use the hash-based comparisons for efficient set lookup. Using the original list also preserves the order of the warnings as output by ./scripts/kernel-doc. Calculating both the new and removed counts provides useful data for users of the NIPA tests. It is much easier to see at a glance what warnings were added. Signed-off-by: Jacob Keller <[email protected]>

jacob-keller · 2025-10-23T21:44:37Z

Tony pointed out a bug from his testing: the description strings were missing the f for format strings, so not being interpolated. I just fixed that.

kuba-moo · 2025-10-24T16:01:55Z

Thanks! Will deploy in a an hour or two.

kuba-moo · 2025-10-29T02:35:41Z

I fixed it a little bit already but it looks still broken eg: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

jacob-keller · 2025-10-29T16:24:36Z

I'll take a look today. It appears like either the output changed or some output I hadn't seen, so we need to extend the parser to handle the parse failed bits and some other data. Will see what I can figure out this afternoon.

jacob-keller · 2025-10-29T17:45:00Z

It looks like the logic you added to log bad parses was placed incorrectly, so almost all lines are warned about (except those which merge two lines together). I've got a fix for that. It also looks like kernel-doc is producing some warnings with just a line number, I haven't figure out why that happens yet. I'll push this PR to fix the logging and then continue investigating whats gone wrong with kernel-doc.

jacob-keller added 2 commits October 22, 2025 14:29

jacob-keller force-pushed the jk-improve-kdoc-output-python-parser branch from 3c77b76 to 1209a47 Compare October 23, 2025 21:43

kuba-moo merged commit 8fc078f into linux-netdev:main Oct 24, 2025
1 check passed

jacob-keller deleted the jk-improve-kdoc-output-python-parser branch October 24, 2025 21:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

tests: kdoc: switch to python and improve algorithm #57

tests: kdoc: switch to python and improve algorithm #57

Uh oh!

jacob-keller commented Oct 22, 2025 •

edited

Loading

Uh oh!

jacob-keller commented Oct 23, 2025

Uh oh!

Uh oh!

kuba-moo commented Oct 24, 2025

Uh oh!

kuba-moo commented Oct 29, 2025

Uh oh!

jacob-keller commented Oct 29, 2025

Uh oh!

jacob-keller commented Oct 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tests: kdoc: switch to python and improve algorithm #57

tests: kdoc: switch to python and improve algorithm #57

Uh oh!

Conversation

jacob-keller commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jacob-keller commented Oct 23, 2025

Uh oh!

Uh oh!

kuba-moo commented Oct 24, 2025

Uh oh!

kuba-moo commented Oct 29, 2025

Uh oh!

jacob-keller commented Oct 29, 2025

Uh oh!

jacob-keller commented Oct 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jacob-keller commented Oct 22, 2025 •

edited

Loading