Binary-Artifacts false positives #1256

evverx · 2021-11-13T05:06:18Z

As far as I can tell, the Binary-Artifacts check was supposed to catch executables people can run unintentionally after cloning repositories but in reality it seems to just look for extensions and, for example, flags projects using binary files for testing purposes. (The check would be even more noisy if it ignored extensions and searched for magic numbers by analogy with file because it would effectively penalize projects for keeping, for example, regression tests generated by fuzz targets (a lot of which look like executables or, maybe, even kind of executables if ELF files are in seed corpora). As an example, below is what the Binary-Artifacts says about systemd:

{
  "date": "2021-11-11",
  "repo": {
    "name": "github.com/systemd/systemd",
    "commit": "9cc615460830afdb51ad23e594906bbe60a3b25a"
  },
  "scorecard": {
    "version": "3.1.1-57-g8da30e6",
    "commit": "8da30e63afbc62e25c2c1252003aed9d34c3d04d"
  },
  "score": 0.0,
  "checks": [
    {
      "details": [
        "Warn: binary detected: test/dmidecode-dumps/HP-Z600.bin",
        "Warn: binary detected: test/dmidecode-dumps/Lenovo-ThinkPad-X280.bin",
        "Warn: binary detected: test/dmidecode-dumps/Lenovo-Thinkcentre-m720s.bin"
      ],
      "score": 0,
      "reason": "binaries present in source code",
      "name": "Binary-Artifacts",
      "documentation": {
        "url": "https://github.com/ossf/scorecard/blob/8da30e63afbc62e25c2c1252003aed9d34c3d04d/docs/checks.md#binary-artifacts",
        "short": "Determines if the project has generated executable (binary) artifacts in the source repository."
      }
    }
  ],
  "metadata": null
}

The text was updated successfully, but these errors were encountered:

laurentsimon · 2021-11-15T16:04:20Z

Thanks for the feedback. Scorecard looks for both magic number and extension.
We check for binaries because the code cannot be reviewed and it may contain arbitrary code run by the user, as you say.

We discard binaries in folders named testdata, which you may use for unit tests, regression tests, etc. This is not documented so I'll add a note in the documentation. Do you think this is acceptable?

evverx · 2021-11-15T19:14:03Z

Scorecard looks for both magic number and extension

Judging by https://github.com/h2non/filetype#supported-types filetype.Get isn't exactly useful when it comes to detecting executables that can be run by accident so effectively scorecard almost always falls back to file extensions.

the code cannot be reviewed

If files like that are a part of a testsuite they're supposed to be reviewed automatically by the tests. I'm not exactly sure how else to review test cases provided by OSS-Fuzz for example.

We discard binaries in folders named testdata

I think it's mostly a go convention and I don't think it would be feasible to change codebases written in other languages to make scorecard happy.

Do you think this is acceptable?

Given that for example LGTM (https://lgtm.com/help/lgtm/file-classification) and codeql get it right somehow I don't think it is.

Anyway, I think it's just another check that penalizes projects with extensive testsuites.

laurentsimon · 2021-11-15T21:33:58Z

Scorecard looks for both magic number and extension

Judging by https://github.com/h2non/filetype#supported-types filetype.Get isn't exactly useful when it comes to detecting executables that can be run by accident so effectively scorecard almost always falls back to file extensions.

the code cannot be reviewed

If files like that are a part of a testsuite they're supposed to be reviewed automatically by the tests. I'm not exactly sure how else to review test cases provided by OSS-Fuzz for example.

agreed. We need a way to filter them out, which is why we ignore testdata today.

We discard binaries in folders named testdata

I think it's mostly a go convention and I don't think it would be feasible to change codebases written in other languages to make scorecard happy.

We could add more folder/file names to filter out binaries, but there will always be some false positives this way. Would you like to have the list hardcoded, or given as option to scorecard? An option allows users to run it with the convention for the language they use, if there is any; and with better knowledge of the repo.
What would be a good starting point for a list of folder/file names to discard?

Do you think this is acceptable?

Given that for example LGTM (https://lgtm.com/help/lgtm/file-classification) and codeql get it right somehow I don't think it is.

Anyway, I think it's just another check that penalizes projects with extensive testsuites.

evverx · 2021-11-15T23:42:17Z

We could add more folder/file names to filter out binaries, but there will always be some false positives this way

I think ideally the check should start utilizing libmagic used by the file utility to really look for binary executables. If it was used, systemd wouldn't be flagged by scorecard and it wouldn't be even necessary to filter out the tests:

   1653 C source, ASCII text
   1516 ASCII text
    421 C source, UTF-8 Unicode text
    268 XML 1.0 document, ASCII text
    170 Bourne-Again shell script, ASCII text executable
    124 XML 1.0 document, UTF-8 Unicode text
     92 data
     75 UTF-8 Unicode text
     34 GNU gettext message catalogue, UTF-8 Unicode text
     24 Python script, ASCII text executable
     21 POSIX shell script, ASCII text executable
     17 ASCII text, with very long lines
     14 JSON data
     11 XML 1.0 document, UTF-8 Unicode text, with very long lines
     10 Python script, UTF-8 Unicode text executable
      9 awk or perl script, ASCII text
      8 empty
      8 Bourne-Again shell script, UTF-8 Unicode text executable
      7 PDP-11 UNIX/RT ldp
      7 HTML document, ASCII text
      6 ASCII text, with no line terminators
      5 UTF-8 Unicode text, with very long lines
      5 C source, UTF-8 Unicode text, with very long lines
      3 UTF-8 Unicode text, with CRLF line terminators
      3 POSIX shell script, UTF-8 Unicode text executable
      3 MIPSEB Ucode
      3 exported SGML document, ASCII text
      2 XML 1.0 document, ASCII text, with very long lines
      2 very short file (no magic)
      2 SVG Scalable Vector Graphics image
      2 makefile script, ASCII text
      2 ISO-8859 text
      2 HTML document, UTF-8 Unicode text
      2 GNU gettext message catalogue, ASCII text
      2 C source, ASCII text, with very long lines
      1 Web Open Font Format, TrueType, length 42844, version 2.0
      1 Web Open Font Format, TrueType, length 42672, version 0.0
      1 unified diff output, UTF-8 Unicode text
      1 Python script, UTF-8 Unicode text executable, with very long lines
      1 Python script, ASCII text executable, with very long lines
      1 PNG image data, 16 x 16, 8-bit/color RGBA, non-interlaced
      1 PGP Secret Sub-key -
      1 PGP/GPG key public ring (v4) created Wed Jul  9 03
      1 Perl script text executable
      1 PDP-11 old overlay
      1 PDP-11 kernel overlay
      1 PC bitmap, Windows 98/2000 and newer format, 295 x 245 x 32
      1 Non-ISO extended-ASCII text, with very long lines
      1 Non-ISO extended-ASCII text
      1 ISO-8859 text, with very long lines
      1 ISO-8859 text, with no line terminators
      1 exported SGML document, UTF-8 Unicode text

What would be a good starting point for a list of folder/file names to discard?

I'm not sure. It would be great if there was a way figure out how LGTM and codeql filter out tests. Unfortunately it seems their scripts classifying files aren't open source (or at least I haven't been able to find them yet)

evverx · 2021-11-16T00:04:18Z

Would you like to have the list hardcoded, or given as option to scorecard? An option allows users to run it with the convention for the language they use, if there is any; and with better knowledge of the repo.

I think by default scorecard should produce meaningful metrics without any additional options to let people run it against different repositories written in different languages. Options would probably help if it was necessary to relax or tighten the check on a case-by-case basic should the need arise.

evverx · 2021-11-16T00:25:04Z

One last thing. As long as scorecard can be tricked into thinking that there are no binary artifacts by simply removing file extensions I don't think it should have the "HIGH" priority and affect the global score that much.

laurentsimon · 2021-11-16T01:46:47Z

Judging by https://github.com/h2non/filetype#supported-types filetype.Get isn't exactly useful when it comes to detecting executables that can be run by accident so effectively scorecard almost always falls back to file extensions.

interesting. Our intention was to use the libmagic equivalent in go-land. @naveensrinivasan knows amore about h2non/filetype. I thought it detected ELF files. Right now we always fall back to using extension if filetype returns nothing. You're saying maybe we should not do that...?

I'm not sure. It would be great if there was a way figure out how LGTM and codeql filter out tests. Unfortunately it seems their scripts classifying files aren't open source (or at least I haven't been able to find them yet)

Do they try to detect ELF files? If they are actually running things, I suppose they can strace and see if it's opened by test files and so on. For non-C code, codeQl can figure this out accurately using the CFG, I believe.

can be tricked into thinking that there are no binary artifacts by simply removing file extensions I don't think it should have the "HIGH" priority and affect the global score that much.

@naveensrinivasan is working on unit tests in general. Let's investigate this further.

evverx · 2021-11-16T02:06:08Z

I thought it detected ELF files

As far as I can tell it does. I meant those ".bin" files that would just disappear from the scorecard radar if I dropped the ".bin" extension.

Right now we always fall back to using extension if filetype returns nothing. You're saying maybe we should not do that...?

I think it depends on how clever the library is. If it was libmagic I think it would make sense to ignore extensions but I've never used h2non/filetype

evverx · 2021-11-16T02:19:03Z

I'm not sure if it helps but here's all the types the library was able to recognize in the systemd repository

   4542 unknown
      2 woff
      1 png
      1 bmp

laurentsimon · 2021-11-16T02:49:36Z

good catch, thanks for reporting this. I'll let @naveensrinivasan comment.

evverx · 2021-11-16T03:48:23Z

I've just opened #1279 based on my understanding of how the library works

evverx · 2021-11-16T04:28:37Z

FWIW looking at h2non/filetype#79 I don't think the library is fuzzed on a regular basis so it isn't clear to me how safe it is to run scorecard against random repositories :-)

evverx · 2021-11-16T05:15:50Z

Do they try to detect ELF files? If they are actually running things, I suppose they can strace and see if it's opened by test files and so on.

I'm not sure about codeql but LGTM doesn't run any tests. As far as I know, LGTM had a script keeping track of files build scripts tried to open to install missing dependencies though. Anyway, I'm 50% sure whatever is kept in directories named *test* is considered tests there.

evverx · 2021-11-16T05:53:36Z

I closed that PR mostly due to h2non/filetype#73. I'm not sure what to do about it. Looks like in its current form the library can't be used to get deterministic results.

laurentsimon · 2021-11-16T14:29:42Z

mhhh that's not good. We'll investigate if there's another lib we could use.
I have an idea to reduce false positive. My take away frmo your issue is that .bin is too generic in general and does not map directly to a "binary that contains code and may be run by users". What if we removed the bin from the list of extension we look for? The other extensions are more specific, so it's unlikely users would use them to store arbitrary files like fuzzing input, etc. wdut?

There are exceptions, e.g. the binutils repo may still do poorly.

evverx · 2021-11-17T00:44:33Z

What if we removed the bin from the list of extension we look for?

Sounds good to me. I've just opened #1288

Those files most likely contain binary data used by tests for example. It should be safe to remove this because executables disguised as ".bin" files will still be caught and flagged by scorecard before it even have a chance to look at extensions. It should address #1256

laurentsimon · 2021-12-03T17:33:30Z

shall we close this issue?

evverx · 2021-12-03T17:46:03Z

I don't think scorecard should complain about tests so I think I'd keep the issue open

westonsteimel · 2021-12-10T09:15:11Z

Judging by https://github.com/h2non/filetype#supported-types filetype.Get isn't exactly useful when it comes to detecting executables that can be run by accident so effectively scorecard almost always falls back to file extensions.

gabriel-vasile/mimetype seems to have a decent list of supported types and I know it is used in anchore/stereoscope and in turn by syft and grype

evverx · 2021-12-23T07:08:02Z

To judge from #1408 (comment), it should be addressed by allowing specifying paths in config files and while I don't think it's ideal it's one way to do it and it has its advantages. To level the field though I think scorecard shouldn't filter the "testdata" directories because it affects the "raw" results.

laurentsimon · 2021-12-24T02:41:56Z

To judge from #1408 (comment), it should be addressed by allowing specifying paths in config files and while I don't think it's ideal it's one way to do it and it has its advantages. To level the field though I think scorecard shouldn't filter the "testdata" directories because it affects the "raw" results.

good point. We'll remove it once we have the raw results ready.

evverx added the kind/bug Something isn't working label Nov 13, 2021

evverx mentioned this issue Nov 14, 2021

🐛Binary-Artifacts: skip files that look like tests #1263

Closed

evverx mentioned this issue Nov 17, 2021

🐛Binary-Artifacts: no longer complain about ".bin" files #1288

Merged

This was referenced Nov 17, 2021

Try out scorecard action #1074

Closed

✨ Remove multimedia formats from Binary Artifacts #1300

Merged

westonsteimel mentioned this issue Dec 10, 2021

ossf/scorecard vulnerabilities fix tracking issue anchore/syft#593

Open

evverx mentioned this issue Dec 14, 2021

Feature: add Scorecard to OSS-Fuzz and CIFuzz #1389

Open

Dentrax mentioned this issue Dec 21, 2021

add ability to pass ignore list #1406

Open

evverx closed this as completed Dec 23, 2021

This was referenced Dec 23, 2021

"Pinned-Dependencies" shouldn't complain about known GHActions that can't be pinned to any specific version #1305

Closed

The "Token-Permissions" check seems to complain about codeql and labeler #1257

Closed

laurentsimon mentioned this issue Dec 28, 2021

Feature: do not filter testdata in results once we have raw result support is landed #1428

Open

rarkins mentioned this issue Mar 18, 2022

Security Policy violation Binary Artifacts renovatebot/renovate#14721

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Binary-Artifacts false positives #1256

Binary-Artifacts false positives #1256

evverx commented Nov 13, 2021

laurentsimon commented Nov 15, 2021 •

edited

Loading

evverx commented Nov 15, 2021 •

edited

Loading

laurentsimon commented Nov 15, 2021 •

edited

Loading

evverx commented Nov 15, 2021

evverx commented Nov 16, 2021

evverx commented Nov 16, 2021

laurentsimon commented Nov 16, 2021

evverx commented Nov 16, 2021 •

edited

Loading

evverx commented Nov 16, 2021

laurentsimon commented Nov 16, 2021

evverx commented Nov 16, 2021

evverx commented Nov 16, 2021

evverx commented Nov 16, 2021 •

edited

Loading

evverx commented Nov 16, 2021

laurentsimon commented Nov 16, 2021

evverx commented Nov 17, 2021

laurentsimon commented Dec 3, 2021

evverx commented Dec 3, 2021

westonsteimel commented Dec 10, 2021 •

edited

Loading

evverx commented Dec 23, 2021

laurentsimon commented Dec 24, 2021

Binary-Artifacts false positives #1256

Binary-Artifacts false positives #1256

Comments

evverx commented Nov 13, 2021

laurentsimon commented Nov 15, 2021 • edited Loading

evverx commented Nov 15, 2021 • edited Loading

laurentsimon commented Nov 15, 2021 • edited Loading

evverx commented Nov 15, 2021

evverx commented Nov 16, 2021

evverx commented Nov 16, 2021

laurentsimon commented Nov 16, 2021

evverx commented Nov 16, 2021 • edited Loading

evverx commented Nov 16, 2021

laurentsimon commented Nov 16, 2021

evverx commented Nov 16, 2021

evverx commented Nov 16, 2021

evverx commented Nov 16, 2021 • edited Loading

evverx commented Nov 16, 2021

laurentsimon commented Nov 16, 2021

evverx commented Nov 17, 2021

laurentsimon commented Dec 3, 2021

evverx commented Dec 3, 2021

westonsteimel commented Dec 10, 2021 • edited Loading

evverx commented Dec 23, 2021

laurentsimon commented Dec 24, 2021

laurentsimon commented Nov 15, 2021 •

edited

Loading

evverx commented Nov 15, 2021 •

edited

Loading

laurentsimon commented Nov 15, 2021 •

edited

Loading

evverx commented Nov 16, 2021 •

edited

Loading

evverx commented Nov 16, 2021 •

edited

Loading

westonsteimel commented Dec 10, 2021 •

edited

Loading