Binary-Artifacts false positives #1256
Thanks for the feedback. Scorecard looks for both magic number and extension. We discard binaries in folders named |
Judging by https://github.com/h2non/filetype#supported-types
If files like that are a part of a testsuite they're supposed to be reviewed automatically by the tests. I'm not exactly sure how else to review test cases provided by OSS-Fuzz for example.
I think it's mostly a go convention and I don't think it would be feasible to change codebases written in other languages to make
Given that for example LGTM (https://lgtm.com/help/lgtm/file-classification) and codeql get it right somehow I don't think it is. Anyway, I think it's just another check that penalizes projects with extensive testsuites. |
agreed. We need a way to filter them out, which is why we ignore
We could add more folder/file names to filter out binaries, but there will always be some false positives this way. Would you like to have the list hardcoded, or given as an option to scorecard? An option allows users to run it with the convention for the language they use, if there is any, and with better knowledge of the repo.
|
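A minimal sketch of the "hardcoded list plus option" idea discussed above, in Python for illustration only. It is not scorecard's implementation: the default directory name `testdata` and the `extra_ignored` parameter are assumptions, since the folder names scorecard actually ignores are not shown in this thread.

```python
from pathlib import PurePosixPath

# Hypothetical default list; the real names scorecard ignores are not shown in this thread.
DEFAULT_IGNORED_DIRS = frozenset({"testdata"})


def is_ignored(path, ignored_dirs=DEFAULT_IGNORED_DIRS):
    """Return True if any directory component of `path` is in `ignored_dirs`."""
    parts = PurePosixPath(path).parts[:-1]  # drop the file name, keep directories
    return any(part in ignored_dirs for part in parts)


def filter_candidates(paths, extra_ignored=()):
    """Drop paths under ignored directories before any binary detection runs.

    `extra_ignored` stands in for a user-supplied option that extends the
    hardcoded defaults with a project's own conventions.
    """
    ignored = DEFAULT_IGNORED_DIRS | frozenset(extra_ignored)
    return [p for p in paths if not is_ignored(p, ignored)]
```

With this shape, a project that keeps fuzz corpora under a directory of its own choosing could pass that name through the option, while the defaults keep working for repositories that follow the built-in convention.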
I think ideally the check should start utilizing
I'm not sure. It would be great if there was a way to figure out how LGTM and codeql filter out tests. Unfortunately it seems their scripts classifying files aren't open source (or at least I haven't been able to find them yet) |
I think by default scorecard should produce meaningful metrics without any additional options to let people run it against different repositories written in different languages. Options would probably help if it was necessary to relax or tighten the check on a case-by-case basis should the need arise. |
One last thing. As long as |
interesting. Our intention was to use the libmagic equivalent in go-land. @naveensrinivasan knows more about
Do they try to detect ELF files? If they are actually running things, I suppose they can strace and see if it's opened by test files and so on. For non-C code, codeQl can figure this out accurately using the CFG, I believe.
@naveensrinivasan is working on unit tests in general. Let's investigate this further. |
As far as I can tell it does. I meant those ".bin" files that would just disappear from the scorecard radar if I dropped the ".bin" extension.
I think it depends on how clever the library is. If it was libmagic I think it would make sense to ignore extensions but I've never used h2non/filetype |
I'm not sure if it helps but here's all the types the library was able to recognize in the systemd repository
|
good catch, thanks for reporting this. I'll let @naveensrinivasan comment. |
I've just opened #1279 based on my understanding of how the library works |
FWIW looking at h2non/filetype#79 I don't think the library is fuzzed on a regular basis so it isn't clear to me how safe it is to run |
I'm not sure about codeql but LGTM doesn't run any tests. As far as I know, LGTM had a script keeping track of files build scripts tried to open to install missing dependencies though. Anyway, I'm 50% sure whatever is kept in directories named |
I closed that PR mostly due to h2non/filetype#73. I'm not sure what to do about it. Looks like in its current form the library can't be used to get deterministic results. |
mhhh that's not good. We'll investigate if there's another lib we could use. There are exceptions, e.g. the |
Sounds good to me. I've just opened #1288 |
Those files most likely contain binary data used by tests, for example. It should be safe to remove this because executables disguised as ".bin" files will still be caught and flagged by scorecard before it even has a chance to look at extensions. It should address #1256
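The ordering argument above (content is inspected before the extension, so a disguised executable is still flagged) can be sketched as follows. This is an illustration in Python, not scorecard's code: the `classify` helper and the `FLAGGED_EXTENSIONS` set are hypothetical, and only the ELF magic number is checked.

```python
import os

ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file

# Hypothetical extension list for illustration; note that ".bin" is absent.
FLAGGED_EXTENSIONS = {".exe", ".dll", ".so"}


def classify(name, head):
    """Classify a file from its name and its first bytes.

    The magic number is checked first, so an executable renamed to
    "something.bin" is flagged before the extension is even considered.
    """
    if head.startswith(ELF_MAGIC):
        return "binary (magic number)"
    ext = os.path.splitext(name)[1].lower()
    if ext in FLAGGED_EXTENSIONS:
        return "binary (extension)"
    return "ok"
```

Under these assumptions, `classify("payload.bin", b"\x7fELF\x02\x01\x01")` is flagged by content, while a `.bin` file holding plain test data passes, which is the behaviour the change is aiming for.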
shall we close this issue? |
I don't think scorecard should complain about tests so I think I'd keep the issue open |
gabriel-vasile/mimetype seems to have a decent list of supported types and I know it is used in anchore/stereoscope and in turn by syft and grype |
To judge from #1408 (comment), it should be addressed by allowing specifying paths in config files and while I don't think it's ideal it's one way to do it and it has its advantages. To level the field though I think |
good point. We'll remove it once we have the raw results ready. |
As far as I can tell, the Binary-Artifacts check was supposed to catch executables people can run unintentionally after cloning repositories, but in reality it seems to just look for extensions and, for example, flags projects using binary files for testing purposes. (The check would be even noisier if it ignored extensions and searched for magic numbers by analogy with `file`, because it would effectively penalize projects for keeping, for example, regression tests generated by fuzz targets, a lot of which look like executables or may even be executables of a sort if ELF files are in seed corpora.) As an example, below is what Binary-Artifacts says about systemd: