-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v2: list dependencies from either import path or go binary #71
Conversation
Hi @wlynch! Here's the first PR |
Todo:
|
All the clean ups are done. Ready for code review : ) |
I'm not cleaning up the commits, because I always use squash merge when merging a PR. |
Hi @wlynch, thank you for the detailed review! It's now ready for another round of reviews when you have time. |
@wlynch I tried using here's my implementation:
And I got the error when running unit test for cli02 test module:
I think the root cause is that golang.org/x/sys is a module, but it is not a package. So we cannot load the package golang.org/x/sys. |
Therefore, I think I'd prefer staying with existing implementation for loading module metadata. |
That's annoying. :( Using CLI based impl in the meantime SGTM! |
v2/gocli/go_binary.go
Outdated
if localModule.Dir == "" { | ||
return nil, fmt.Errorf("Module %v's local directory is empty. Did you run go mod download?", ref.Path) | ||
} | ||
if localModule.Version != ref.Version { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes cause problems if the current module context doesn't match the same version as the binary. I think we'll need to rely on go list -m
(or some library equivalent if available) instead.
To test this I used a modified cli02 binary using golang.org/x/[email protected]
, with a go-licenses@v2 using golang.org/x/[email protected]
(diff)
$ go test ./...
--- FAIL: TestListModulesInGoBinary (0.28s)
--- FAIL: TestListModulesInGoBinary/../tests/modules/cli02 (0.13s)
go_binary_test.go:81: Found golang.org/x/tools v0.1.3 in go binary, but v0.1.4 is downloaded in go modules. Are you running this tool from the working dir to build the binary you are analyzing?
FAIL
$ go list -json -m golang.org/x/[email protected] # Even though last command failed, module is downloaded locally
{
"Path": "golang.org/x/tools",
"Version": "v0.1.3",
"Time": "2021-06-09T21:40:20Z",
"Dir": "/usr/local/google/home/wlynch/pkg/mod/golang.org/x/[email protected]",
"GoMod": "/usr/local/google/home/wlynch/pkg/mod/cache/download/golang.org/x/tools/@v/v0.1.3.mod",
"GoVersion": "1.17"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, I didn't know we could use go list -json -m <module@version>
to also download the module and get metadata at the same time. And I just tried, it's not restricted by the current module dir. I can run the command anywhere to get the modules.
I think we are now very close to lift the restriction of go module dir, if we just go list -json -m <the-entire-list-of-modules>
to get all the modules.
There's one remaining challenge to me, the main module to build a binary is usually not versioned. How can we know what it is if outside of the go working dir?
This causes cause problems if the current module context doesn't match the same version as the binary.
For clarification, it was expected behavior in my design. How do we know where the main module is and whether it is the correct version? If we are in a working dir and found go modules version has a mismatch, a more serious problem is that most likely the main module's version is incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main module to build a binary is usually not versioned. How can we know what it is if outside of the go working dir?
I think you can pull this out from go version
, as long as the binary was built from a released module version. e.g.
# Install from github.com/google/go-licenses@latest
$ go version -m go-licenses
go-licenses: devel +b7a85e0003
path github.com/google/go-licenses
mod github.com/google/go-licenses v0.0.0-20210816172045-3099c18c36e1 h1:ZK63Yns/0Y8hE5y50WuSsfFWNPmpYDQ9tzh/J2vWV8c=
# Install from master
$ go version -m go-licenses
go-licenses: devel +b7a85e0003
path github.com/google/go-licenses
mod github.com/google/go-licenses (devel)
(+b7a85e0003
corresponds to the go tool version, not the module version, so we can't infer the source commit from there)
I don't know how we can infer version information from devel
sources. 😞 The answer might just be we can't for now and just throw an error for the time being telling people they need to go install @version
if they want it to work.
If we are in a working dir and found go modules version has a mismatch, a more serious problem is that most likely the main module's version is incorrect
Assuming main module == binary module here, I think this is expected. The binary module is going to be different from the local working directory module, unless the local working directory module is the module that produces the binary (but if that were the case, my expectation is you would run go-licenses on the source package itself rather than the binary). That's why I don't think we can join the modules here, not only because of version mismatches, but also different replacements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The binary module is going to be different from the local working directory module, unless the local working directory module is the module that produces the binary (but if that were the case, my expectation is you would run go-licenses on the source package itself rather than the binary)
My use-case is exactly running go-licenses in the same working directory that produces the binary, that's a prerequisite for running the tool.
The reason I chose to run the tool on the binary instead of source code is because some go module dependencies (test / tool dependencies) are trimmed when compiling to the binary, so we can check license only for the modules included in the binary.
When you said running go-licenses on the source package itself, I assume you are talking about using packages.Load like the same logic in go-licenses v1. I think both solutions work, if you have strong opinions, I can try to use packages.Load instead.
I personally feel that it's better to rely on the modules metainfo from built binary, because it's better for single source of truth -- the go compiler checks for all dependencies, compile to a binary and record the modules metainfo in the binary.
On the contrary, if we use packages.Load to get all the modules from a package, the go compiler will collect all dependencies once, and packages.Load will get all the modules for the second time. How do we guarantee the two are giving the same results? I think there are potential risks for:
- version skew -- when go compiler has a different version from packages.Load
- build tag / config skew -- go compiler uses different build tags / flags than what packages.Load is using
If we read the modules meta info directly from the binary, these risks are removed, because only the go compiler decides what modules are pulled in.
I believe we are making a trade off here.
Option 1 - the tool only runs in the same workdir used to build the binary.
Option 2 - the tool only works for binaries built by go install module@version
.
Considering the major user scenario should be module authors adding license info to binaries built by themselves, I think option 1 is better.
After some experimentation using the new tool in github.com/kubeflow/pipelines, my typical licensing management workflow is like the following:
- keep a copy of generated license csv file in the repo
- in presubmit test, besides running go unit tests, also run
go-licenses csv
and verify the license csv is up-to-date. When there's an error, the PR author should check licenses for updated dependencies and update the licenses csv. - when releasing,
go-licenses save
the notices to the container image containing the binary
In all the scenarios, the person/CICD tool running go-licenses should be using it in exactly the same workdir as where the binary is built.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the additional context! I was under the assumption you wanted to include additional binary tools as part of the license output, or to generate a license summary from a standalone unrelated binary. IIUC, you want to use the binary metadata as a shortcut to generate a minimal license summary that only includes dependencies that would be included in binaries that you are distributing.
The reason I chose to run the tool on the binary instead of source code is because some go module dependencies (test / tool dependencies) are trimmed when compiling to the binary, so we can check license only for the modules included in the binary.
Test dependencies are already excluded unless Test: true
is included in packages.Config (which is currently not enabled for go-licenses). Related: #62
Tool dependencies (assuming this is following https://github.com/golang/go/wiki/Modules#how-can-i-track-tool-dependencies-for-a-module) would be treated like any other dependency and only be present if they are included transitively.
I think this would cover most cases to generate a minimal license summary, but if you have any examples in your project that don't fit well let me know! Would happy to dive deeper here.
version skew -- when go compiler has a different version from packages.Load
I'm not particularly worried about go compiler vs packages.Load version skew. Go has a pretty good track record when it comes to backwards compatibility. packages.Load is used by other tools in golang.org/x/tools, so I would imagine/hope any breaking changes / incompatibilities would be noticed prior to major Go releases.
What I am more worried about is version skew between the code and the binary, since the binary could have been built from a different version than what is currently present in the repo / go.mod. I'd rather lean on the code as the source of truth as much as possible, since that's what go.mod
is referring to.
build tag / config skew -- go compiler uses different build tags / flags than what packages.Load is using
Although go-licenses does not directly accept build flags on its own CLI, we do support build flags via GOFLAGS.
If we wanted, we could pass through flags via packages.Config
.
Considering the major user scenario should be module authors adding license info to binaries built by themselves, I think option 1 is better.
Agreed - I also prefer Option 1, but would prefer to use the source code as the source of truth rather than the binary.
FWIW - projects I'm involved in follows an identical pattern to what you're describing in your release workflow using the existing tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @wlynch, I think the new update will be what you want.
As we discussed initially, we can support both getting dependencies from the binary or from code + import path.
I've included another implementation to get all the dependencies and their modules using packages.Visit.
FWIW - projects I'm involved in follows an identical pattern to what you're describing in your release workflow using the existing tool.
Glad to know, that gives me more confidence of the UX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-P0 idea:
By the way, after thinking a bit more about the use-case for analyzing completely unknown go binaries.
I think it can be a good additional feature.
As discussed in #71 (comment), the current limitation is that we do not know the exact version of the main module. However, the limitations seem to be a good trade off.
e.g. consider this UX:
I use go-licenses to analyze random binaries taken from containers, go-licenses can get exact version of all the dependencies and get HEAD version of the main repo. The result is slightly inaccurate for the main repo, so we show a warning message that we do not know version of the main repo. In this case, we can just check HEAD version.
Despite the limitation, the overall result is good-enough, I have a very good idea of which licenses are used in the binary and only in rare cases will there be a mismatch. From what I heard, when a project changes license, most of the cases, it's switching from a permissive license to a more restrictive license, so analyzing HEAD version of the main repo is unlikely to give false negatives. False positives are easy to deal with, because people can just check them manually.
Thank you for the review! |
Hi @wlynch, I've addressed all the feedback except the following which I have some different opinions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes - looking good!
v2/gocli/go_binary.go
Outdated
if localModule.Dir == "" { | ||
return nil, fmt.Errorf("Module %v's local directory is empty. Did you run go mod download?", ref.Path) | ||
} | ||
if localModule.Version != ref.Version { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main module to build a binary is usually not versioned. How can we know what it is if outside of the go working dir?
I think you can pull this out from go version
, as long as the binary was built from a released module version. e.g.
# Install from github.com/google/go-licenses@latest
$ go version -m go-licenses
go-licenses: devel +b7a85e0003
path github.com/google/go-licenses
mod github.com/google/go-licenses v0.0.0-20210816172045-3099c18c36e1 h1:ZK63Yns/0Y8hE5y50WuSsfFWNPmpYDQ9tzh/J2vWV8c=
# Install from master
$ go version -m go-licenses
go-licenses: devel +b7a85e0003
path github.com/google/go-licenses
mod github.com/google/go-licenses (devel)
(+b7a85e0003
corresponds to the go tool version, not the module version, so we can't infer the source commit from there)
I don't know how we can infer version information from devel
sources. 😞 The answer might just be we can't for now and just throw an error for the time being telling people they need to go install @version
if they want it to work.
If we are in a working dir and found go modules version has a mismatch, a more serious problem is that most likely the main module's version is incorrect
Assuming main module == binary module here, I think this is expected. The binary module is going to be different from the local working directory module, unless the local working directory module is the module that produces the binary (but if that were the case, my expectation is you would run go-licenses on the source package itself rather than the binary). That's why I don't think we can join the modules here, not only because of version mismatches, but also different replacements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified packages.Load looks great! 😄
I'm still skeptical of the binary based approach - solely relying on source code seems like the better choice since we're dependent on the source's go.mod. We can't make guarantees about the binary matching the source, and documenting that this tool must only be ran with binaries that were built from the same source state as the current directory feels brittle and restrictive.
That said, I'm going to go ahead and merge this as-is. Since this is library-only and still in an experimental state (e.g. not hooked up into any CLI surface yet), we have flexibility to play around with this and try out ideas.
Thanks for your patience throughout the review. 🙏
Last thing is let's just squash these commits down for a clean commit message before merge. |
Awesome! Thank you for the flexibility, I am happy to contribute to this. Let me squash the commits : ) |
Hmm, it seems I cannot just reopen the PR after fixing my branch. I have to create a new PR for this. |
Recreated as #82 |
This is the first PR of #70