Classify cdx component type file #141

Merged: 1 commit, Jan 15, 2025


henrirosten
Collaborator

Set the cdx component type based on the following heuristic:

  • Set the default component type to 'library'
  • Set the component type to 'file' if the drv version string is missing

Resolves: #140
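
A minimal Python sketch of the heuristic described above. The function name `classify_component` and the choice to treat both `None` and an empty string as a "missing" version are assumptions made for illustration; this is not the actual sbomnix code.

```python
from typing import Optional


def classify_component(version: Optional[str]) -> str:
    """Return a cdx component type for a derivation (hypothetical helper)."""
    # Assumption: a version of None or "" counts as "missing".
    if not version:
        return "file"
    # Default component type.
    return "library"


print(classify_component("1.25.0"))
print(classify_component(None))
```

Under these assumptions, a derivation with a version string stays 'library' and one without becomes 'file'.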


henrirosten commented Dec 23, 2024

Below is a quick test using wget as an example target. The following commands can be run from the sbomnix devshell.

# Target: wget, include both buildtime and runtime dependencies:
❯ nix run github:tiiuae/sbomnix/584988e#sbomnix -- nixpkgs/58ff6e0#wget --buildtime
...
INFO     Wrote: sbom.cdx.json
INFO     Wrote: sbom.spdx.json
INFO     Wrote: sbom.csv

# How many unique packages (by name) are there in the wget sbom?
❯ csvsql --verbose --query "select count(distinct name) from sbom" sbom.csv
count(distinct name)
337

# How many unique packages are missing the version string?
# After the changes from this PR, this many components would be
# classified with cdx component type 'file':
❯ csvsql --verbose --query "select count(distinct name) from sbom where version is null" sbom.csv
count(distinct name)
211

# Some examples of packages which would end up classified as 'file' after
# the changes from this PR:
❯ csvsql --verbose --query "select distinct name from sbom where version is null" sbom.csv | csvlook | head
| name                                               |
| -------------------------------------------------- |
| 0001-Add-prototype-to-function-definitions.patch   |
| 06-initialize-the-symlink-flag.patch               |
| 07631601e6602bc49b8eac3aab9d2b35968d3e7a.patch     |
| 28-cve-2022-0529-and-cve-2022-0530.patch           |
| B-COW-0.007.tar.gz                                 |
| CPAN-Meta-Check-0.018.tar.gz                       |
| CVE-2019-13232-1.patch                             |
| CVE-2019-13232-2.patch                             |

All good so far.
However, it's easy to find cases where the classification seems to go wrong:

❯ csvsql --verbose --query "select name,out,store_path from sbom where version is null group by name" sbom.csv | csvlook | grep -vP "(\.patch|\.tar\.|\.t?gz|\.zip|-bash52-|-readline82-)"
| name                                               | out                                                                                            | store_path                                                                                         |
| -------------------------------------------------- | ---------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| autoreconf-hook                                    | /nix/store/xmbb1knxj25q34iqp23aa28mlc7x26ch-autoreconf-hook                                    | /nix/store/2a0rsx8y0qa0hj27869dwa79fa7r09cy-autoreconf-hook.drv                                    |
| bootstrap-stage-xgcc-gcc-wrapper-                  | /nix/store/7avw5y6vx6h94hg749msibskkrzzq7kb-bootstrap-stage-xgcc-gcc-wrapper-                  | /nix/store/43lvdp28w1wmj7qvj4cp846rq3y84k5q-bootstrap-stage-xgcc-gcc-wrapper-.drv                  |
| bootstrap-stage-xgcc-stdenv-linux                  | /nix/store/0n27p9zy7rp12xfws3dl8i3wf9a8f1kd-bootstrap-stage-xgcc-stdenv-linux                  | /nix/store/7njndm5dmwlxkc80w9gk347nsmcy6wk9-bootstrap-stage-xgcc-stdenv-linux.drv                  |
| bootstrap-stage0-binutils-wrapper-                 | /nix/store/hk2p2ccpsn8wr1nnfbllnhcmgmaii9rj-bootstrap-stage0-binutils-wrapper-                 | /nix/store/72c838vcc1440hfh6zkl8mna1nsskslv-bootstrap-stage0-binutils-wrapper-.drv                 |
| bootstrap-stage0-glibc-iconv-bootstrapFiles        | /nix/store/l06g8zzvnmljvz18p108clwaxzcq30ym-bootstrap-stage0-glibc-iconv-bootstrapFiles        | /nix/store/9v9k2cnqw962ldwabkyp7calmvp63lyd-bootstrap-stage0-glibc-iconv-bootstrapFiles.drv        |
| bootstrap-stage0-stdenv-linux                      | /nix/store/mdqwssbvg5cr14xxqamj77qlmna9hcyz-bootstrap-stage0-stdenv-linux                      | /nix/store/jhnz6wx6p5h0pqykj1i7jdzj8ddqlgnf-bootstrap-stage0-stdenv-linux.drv                      |
| bootstrap-stage1-gcc-wrapper-                      | /nix/store/p9riy1zqs8gq8j6qg6rw857ks8m2ml95-bootstrap-stage1-gcc-wrapper-                      | /nix/store/01b514fgfw953n7q1vickfa7aq41zqq8-bootstrap-stage1-gcc-wrapper-.drv                      |
| bootstrap-stage1-stdenv-linux                      | /nix/store/qk8m8gzpjk4vna5spmbz1xlff2mb4d3p-bootstrap-stage1-stdenv-linux                      | /nix/store/45xpqvq383243s730wmaka92mikrdy43-bootstrap-stage1-stdenv-linux.drv                      |
| bootstrap-stage2-stdenv-linux                      | /nix/store/8c12hhc7aqr4gcy6n039kv43w8c9jwcq-bootstrap-stage2-stdenv-linux                      | /nix/store/ismkxxgzhnyd8zl90r53yxqkqs6l44kk-bootstrap-stage2-stdenv-linux.drv                      |
| bootstrap-stage3-stdenv-linux                      | /nix/store/mv3nd0nm9dahd1s3qhqszvsb4j84l4fj-bootstrap-stage3-stdenv-linux                      | /nix/store/gp38xzkcj0ijj3xzdxvnsmrky5fz4bzy-bootstrap-stage3-stdenv-linux.drv                      |
| bootstrap-stage4-stdenv-linux                      | /nix/store/c2mw51ncnnvaard4nq0riqilmhk07dj5-bootstrap-stage4-stdenv-linux                      | /nix/store/f8p2g1kx4j6vvil6sm5hrmaqa2nfdfwp-bootstrap-stage4-stdenv-linux.drv                      |
| bootstrap-tools                                    | /nix/store/razasrvdg7ckplfmvdxv4ia3wbayr94s-bootstrap-tools                                    | /nix/store/05q48dcd4lgk4vh7wyk330gr2fr082i2-bootstrap-tools.drv                                    |
| busybox                                            | /nix/store/p9wzypb84a60ymqnhqza17ws0dvlyprg-busybox                                            | /nix/store/0m4y3j4pnivlhhpr5yqdvlly86p93fwc-busybox.drv                                            |
| config.guess-948ae97                               | /nix/store/vq0j27nvpks679djbiykl8ikdyj6z5a9-config.guess-948ae97                               | /nix/store/bamwxswxacs3cjdcydv0z7bj22d7g2kc-config.guess-948ae97.drv                               |
| config.sub-948ae97                                 | /nix/store/1p61qjvlqmwrqab3zp5yh3z8rf3mvjmz-config.sub-948ae97                                 | /nix/store/17jjjz36g6svn6kryg89l87y571a44pn-config.sub-948ae97.drv                                 |
| die-hook                                           | /nix/store/q7yqwfpc8b56sn5drqyb0hscvmfpjgk2-die-hook                                           | /nix/store/61854fyyiyawkprq7zf4pvrq7ksy2hdf-die-hook.drv                                           |
| expand-response-params                             | /nix/store/a6y72yfm7mxjnbgjm56l23i9k5mszkib-expand-response-params                             | /nix/store/082q12iqbm8i0s9jjkn6mqn3s08sddbw-expand-response-params.drv                             |
| find-xml-catalogs-hook                             | /nix/store/wx5nzqd94wxp3a2mcacragk4dixzfgy5-find-xml-catalogs-hook                             | /nix/store/9yq1167s48cv7hn8bnf8bn4gfd25lxi1-find-xml-catalogs-hook.drv                             |
| glibc-iconv-2.40                                   | /nix/store/nrymrxaqn1hcwgjycn3dzyl9i0lylifw-glibc-iconv-2.40                                   | /nix/store/dsnv5c2qx03mlw9ssrz4lfcdy4mpnqkr-glibc-iconv-2.40.drv                                   |
| install-shell-files                                | /nix/store/zrz201kl2cnx2i9vg253266fw653sxcj-install-shell-files                                | /nix/store/fwr3vgdizyxz7cjv8xczq7mcrflbkmaa-install-shell-files.drv                                |
| libidn2-2.3.7                                      | /nix/store/ma08vfhb5yipb31n2fymf2isk0gyb9ki-libidn2-2.3.7                                      | /nix/store/02piwsci6jgiipk0in2lj41aj6p6vln5-libidn2-2.3.7.drv                                      |
| locales-setup-hook.sh                              | /nix/store/1jjd4gpbr42b3bscsknm8ji91vwp21li-locales-setup-hook.sh                              | /nix/store/v8gdyjfapcis75cvxpdfw8zlx38alq1l-locales-setup-hook.sh.drv                              |
| make-binary-wrapper-hook                           | /nix/store/gqjd4bvd683s55r0jcgc9q67rvjnmfc6-make-binary-wrapper-hook                           | /nix/store/s35vnmn2y124xa6iw1kcqalixmca1s8m-make-binary-wrapper-hook.drv                           |
| make-shell-wrapper-hook                            | /nix/store/5iwa7fcljsi4ahj9znxfqfj0pbm54cd2-make-shell-wrapper-hook                            | /nix/store/2p8j0pjf4m63iksncbq9qsz3zms845cf-make-shell-wrapper-hook.drv                            |
| mirrors-list                                       | /nix/store/fvd90pv9l7bzgszciv0adhivysb95jnh-mirrors-list                                       | /nix/store/avk7dy1fdyrf7d4z0ad62db7bx2ccppv-mirrors-list.drv                                       |
| nuke-references                                    | /nix/store/yjfk0fn7smh88kd0xqvfhhy1gfxc1w4l-nuke-references                                    | /nix/store/17yimdihwq1lzr8man6mwd2gq94zb5vz-nuke-references.drv                                    |
| python-setup-hook.sh                               | /nix/store/lizjckh3h9wjaylafsma2v1wwyckmd4i-python-setup-hook.sh                               | /nix/store/a3ighs21cmgqhbfpv6bx9f5pcaxirj4c-python-setup-hook.sh.drv                               |
| raw                                                | /nix/store/d1xybymfx4ad0hy6zv97walg9v1dyzn6-raw                                                | /nix/store/dhjhlihqj08f3fs1cvsja0fims0dqnlw-raw.drv                                                |
| source                                             | /nix/store/hhinz3k4nh50l93k6r3617nrf9pnb975-source                                             | /nix/store/1vjg4z2rm4kaiglkmc6vkkp3wv30xd73-source.drv                                             |
| stdenv-linux                                       | /nix/store/m1p78gqlc0pw3sdbz3rdhklzm0g26g96-stdenv-linux                                       | /nix/store/vxckchzd4ny3dni980qf570fmfc3q5m6-stdenv-linux.drv                                       |
| tcl-package-hook                                   | /nix/store/59dmq1m6n4hcpqikr78sr3z5jk06120z-tcl-package-hook                                   | /nix/store/vkhapph94vhy5wd9g28xi7fp3vcn2lyn-tcl-package-hook.drv                                   |
| update-autotools-gnu-config-scripts-hook           | /nix/store/ljlah5wqcbix5wg8rvm3g8rc7k9zn1qg-update-autotools-gnu-config-scripts-hook           | /nix/store/bfv1sg2nvdk6g7c2hl4rcdsrlc8j8d58-update-autotools-gnu-config-scripts-hook.drv           |
| wrap-python-hook                                   | /nix/store/ywn3i812qicw2cqx4biillriqf2nhr8z-wrap-python-hook                                   | /nix/store/h18pfjrnqql4xf45s6n621m3i1k4ljwq-wrap-python-hook.drv                                   |

In the above example, I think the following would clearly be classified incorrectly:

/nix/store/0m4y3j4pnivlhhpr5yqdvlly86p93fwc-busybox.drv
/nix/store/dsnv5c2qx03mlw9ssrz4lfcdy4mpnqkr-glibc-iconv-2.40.drv
/nix/store/02piwsci6jgiipk0in2lj41aj6p6vln5-libidn2-2.3.7.drv
/nix/store/05q48dcd4lgk4vh7wyk330gr2fr082i2-bootstrap-tools.drv

I'm not sure which is worse: classifying everything as library, or trying to guess the classification and risking incorrectly classifying cdx components as file when they are really something else.

@henrirosten henrirosten force-pushed the classify-cdx-component-type branch from 584988e to 693e1cc Compare January 8, 2025 08:29

henrirosten commented Jan 8, 2025

693e1cc sets the cdx component type to file only if the version string is missing and the out-path matches a specific pattern. The additional pattern match avoids (or attempts to avoid) the earlier problem with incorrect classification.

Also, 693e1cc rebases the changes on the latest main.
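
The refined rule could be sketched in Python roughly as follows. The regular expression is the one quoted later in this conversation; the helper name `classify_with_out_path`, the use of `re.search` on the full out-path, and the treatment of an empty version as "missing" are assumptions, not the actual sbomnix implementation. The out-paths are taken from the table earlier in this thread.

```python
import re
from typing import Optional

# Pattern quoted in the review discussion; applying it with re.search to the
# out-path is an assumption about how it is used.
FILE_OUT_PATH = re.compile(r"(\.tar\.|\?|\.[a-z]+$)")


def classify_with_out_path(version: Optional[str], out_path: str) -> str:
    """Hypothetical helper: 'file' needs a missing version AND a file-like out-path."""
    if not version and FILE_OUT_PATH.search(out_path):
        return "file"
    return "library"


# Out-paths from the earlier wget example; all are missing a version string.
print(classify_with_out_path(None, "/nix/store/p9wzypb84a60ymqnhqza17ws0dvlyprg-busybox"))
print(classify_with_out_path(None, "/nix/store/nrymrxaqn1hcwgjycn3dzyl9i0lylifw-glibc-iconv-2.40"))
print(classify_with_out_path(None, "/nix/store/1jjd4gpbr42b3bscsknm8ji91vwp21li-locales-setup-hook.sh"))
```

In this sketch busybox and glibc-iconv-2.40 stay 'library' because their out-paths contain no `.tar.`, no `?`, and no trailing lowercase extension, while locales-setup-hook.sh becomes 'file'.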

henrirosten commented

Using wget as an example target, and the changes from this PR@693e1cc:

# Target: wget, include both buildtime and runtime dependencies:
❯ nix run github:tiiuae/sbomnix/693e1cc#sbomnix -- nixpkgs/58ff6e0#wget --buildtime
...
INFO     Wrote: sbom.cdx.json
INFO     Wrote: sbom.spdx.json
INFO     Wrote: sbom.csv

# Full list of unique package names that are classified as type 'file':
❯ jq -cr '.components[] | select(.type=="file") | .name' sbom.cdx.json | uniq

0001-Add-prototype-to-function-definitions.patch
06-initialize-the-symlink-flag.patch
07631601e6602bc49b8eac3aab9d2b35968d3e7a.patch
28-cve-2022-0529-and-cve-2022-0530.patch
B-COW-0.007.tar.gz
CPAN-Meta-Check-0.018.tar.gz
CVE-2019-13232-1.patch
CVE-2019-13232-2.patch
CVE-2019-13232-3.patch
CVE-2021-4217.patch
Capture-Tiny-0.48.tar.gz
Class-Inspector-1.36.tar.gz
Clone-0.46.tar.gz
Encode-Locale-1.05.tar.gz
ExtUtils-Config-0.008.tar.gz
ExtUtils-Helpers-0.026.tar.gz
ExtUtils-InstallPaths-0.012.tar.gz
File-ShareDir-1.118.tar.gz
File-ShareDir-Install-0.14.tar.gz
HTTP-Daemon-6.16.tar.gz
HTTP-Date-6.06.tar.gz
HTTP-Message-6.45.tar.gz
IO-HTML-1.004.tar.gz
LWP-MediaTypes-6.04.tar.gz
Linux-PAM-1.6.1.tar.xz
Module-Build-0.4234.tar.gz
Module-Build-Tiny-0.047.tar.gz
PadWalker-2.5.tar.gz
Python-3.12.7.tar.xz
Test-Deep-1.204.tar.gz
Test-Fatal-0.017.tar.gz
Test-Needs-0.002010.tar.gz
Test-Warnings-0.032.tar.gz
TimeDate-2.33.tar.gz
Try-Tiny-0.31.tar.gz
URI-5.21.tar.gz
acl-2.3.2.tar.gz
attr-2.5.2.tar.gz
audit-4.0.tar.gz
autoconf-2.69.tar.xz
autoconf-2.72.tar.xz
autoconf-archive-2024.10.16.tar.xz
automake-1.16.5.tar.xz
basename.patch
bash-5.2.tar.gz
binutils-2.43.1.tar.bz2
bison-3.8.2.tar.gz
bootstrap-tools.tar.xz
byacc-20240109.tgz
bzip2-1.0.6.2-autoconfiscated.patch
bzip2-1.0.8.tar.gz
coreutils-9.5.tar.xz
cracklib-2.10.0.tar.bz2
cracklib-words-2.10.0.gz
curl-8.11.0.tar.xz
db-4.8.30.tar.gz
dejagnu-1.6.3.tar.gz
diffutils-3.10.tar.xz
docbook-style-xsl-non-recursive-string-subst.patch
docbook-xml-4.5.zip
docbook-xsl-nons-1.79.2.tar.bz2
ed-1.20.2.tar.lz
expat-2.6.4.tar.xz
expect-configure-c99.patch
expect5.45.4.tar.gz
file-5.45.tar.gz
findutils-4.10.0.tar.xz
fix-build-time-run-tcl.patch
flex-2.6.4.tar.gz
gawk-5.3.1.tar.xz
gcc-13.3.0.tar.xz
gdbm-1.24.tar.gz
gettext-0.21.1.tar.gz
gettext-1.07.tar.gz
glibc-2.26.patch
glibc-2.40.tar.xz
gmp-6.3.0.tar.bz2
grep-3.11.tar.xz
gzip-1.13.tar.xz
help2man-1.49.3.tar.xz
isl-0.20.tar.xz
itstool-2.0.7.tar.bz2
keyutils-1.6.3.tar.gz
krb5-1.21.3.tar.gz
libbsd-0.12.2.tar.xz
libcap-ng-0.8.5.tar.gz
libffi-3.4.6.tar.gz
libidn2-2.3.7.tar.gz
libmd-1.1.0.tar.xz
libssh2-1.11.1.tar.gz
libtool-2.4.7.tar.gz
libunistring-1.2.tar.gz
libxcrypt-4.4.36.tar.xz
libxml2-2.13.4.tar.xz
libxslt-1.1.42.tar.xz
linux-6.10.tar.xz
locales-setup-hook.sh
lzip-1.24.1.tar.gz
m4-1.4.19.tar.bz2
mailcap-2.1.54.tar.xz
make-4.4.1.tar.gz
mpc-1.3.1.tar.gz
mpdecimal-4.0.0.tar.gz
mpfr-4.2.1.tar.xz
musl.patch
ncurses-6.4.tar.gz
nghttp2-1.64.0.tar.bz2
openssl-3.3.2.tar.gz
patch-2.7.6.tar.xz
patchelf-0.15.0.tar.bz2
patchutils-0.3.3.tar.xz
pcre2-10.44.tar.bz2
perl-5.40.0.tar.gz
pkg-config-0.29.2.tar.gz
python-setup-hook.sh
readline-8.2.tar.gz
sed-4.9.tar.xz
sqlite-autoconf-3460100.tar.gz
sqlite-doc-3460100.zip
tar-1.35.tar.xz
tcl8.6.15-src.tar.gz
texinfo-7.1.1.tar.xz
tzcode2024b.tar.gz
tzdata2024b.tar.gz
unzip60.tar.gz
util-linux-2.39.4.tar.xz
wget-1.25.0.tar.lz
which-2.21.tar.gz
xz-5.6.3.tar.xz
zlib-1.3.1.tar.gz
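
The same listing can be produced without jq. The sketch below assumes only what the jq filter above already relies on: a top-level `components` array whose entries carry `type` and `name`. The helper name `file_component_names` is hypothetical and the sample SBOM is made up for illustration.

```python
import json


def file_component_names(sbom: dict) -> list[str]:
    """Return sorted unique names of components whose type is 'file'."""
    return sorted(
        {c["name"] for c in sbom.get("components", []) if c.get("type") == "file"}
    )


# Made-up miniature SBOM for illustration only.
sample = json.loads("""
{"components": [
  {"type": "file", "name": "musl.patch"},
  {"type": "library", "name": "wget"},
  {"type": "file", "name": "musl.patch"}
]}
""")
print(file_component_names(sample))
```

One design difference: `uniq` only collapses adjacent duplicate lines, whereas the set used here removes duplicates regardless of their order in the file.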

@henrirosten henrirosten requested a review from a team January 8, 2025 08:47
henrirosten commented

@arianvp: Any thoughts?


arianvp commented Jan 8, 2025

The libidn2 false positive occurs because the variant used for bootstrapping doesn't set pname. We can probably fix that upstream:

https://github.com/NixOS/nixpkgs/blob/1a131ecc6623fd72329dc1202d4f05b074451970/pkgs/development/libraries/libidn2/no-bootstrap-reference.nix#L11

I expect glibc-iconv to be a similar case. It's probably some bootstrap variant that drops the pname by accident.

Perhaps we could try to fix these upstream?

(\.tar\.|\?|\.[a-z]+$) is a bit confusing to me. Why are we matching the literal ? character?

Anyhow, I tried this out on one of my NixOS hosts and it seems to work well: I can now easily filter out config files.

(Though it classifies the toplevel derivation as a file now... Though library before was also wrong hehe)

Perhaps it makes sense to put this option behind a flag if you're afraid of mis-classifying things?


henrirosten commented Jan 8, 2025

(\.tar\.|\?|\.[a-z]+$) is a bit confusing to me. Why are we matching the literal ? character?

To match out-paths such as: opensp-1.5.2-c11-using.patch?id=688d9675782dfc162d4e6cff04c668f7516118d0
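
A small Python check illustrates why the `\?` alternative matters for that out-path. The variant pattern without `\?` is constructed here purely for comparison and is not something the PR proposes.

```python
import re

# Out-path name quoted in the comment above.
out_path_name = "opensp-1.5.2-c11-using.patch?id=688d9675782dfc162d4e6cff04c668f7516118d0"

# Pattern from the PR discussion.
with_q = re.compile(r"(\.tar\.|\?|\.[a-z]+$)")
# Same pattern with the literal-? alternative removed (for comparison only).
without_q = re.compile(r"(\.tar\.|\.[a-z]+$)")

matches_with_q = bool(with_q.search(out_path_name))
matches_without_q = bool(without_q.search(out_path_name))

print(matches_with_q)     # True
print(matches_without_q)  # False
```

Without the `\?` branch nothing matches: the name contains no `.tar.`, and `.patch` is followed by `?id=...` rather than the end of the string, so `\.[a-z]+$` fails.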

Though it classifies the toplevel derivation as a file now...

Can you clarify this? What is your toplevel derivation out-path?

Perhaps it makes sense to put this option behind a flag if you're afraid of mis-classifying things?

I think with the addition of the out-path match requirement, it should be pretty safe. Also, as discussed, we can't make it much worse than the current behavior, which simply classifies everything as library.

Perhaps we could try to fix these upstream?

I also hope these and similar cases would be fixed upstream.

@henrirosten henrirosten merged commit 79f5f02 into main Jan 15, 2025
3 checks passed
@henrirosten henrirosten deleted the classify-cdx-component-type branch January 20, 2025 05:56
Successfully merging this pull request may close these issues.

Mark config files as type=file instead of type=application