Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2: list dependencies from either import path or go binary #71

Closed
wants to merge 29 commits into from

Conversation

Bobgy
Copy link
Collaborator

@Bobgy Bobgy commented Jun 19, 2021

This is the first PR of #70

@Bobgy Bobgy changed the title v2: list deps in go binary v2(PR1): list deps in go binary Jun 19, 2021
@google-cla google-cla bot added the cla: yes Contributor license agreement signed (https://cla.developers.google.com) label Jun 19, 2021
@Bobgy Bobgy changed the title v2(PR1): list deps in go binary v2: list deps in go binary Jun 19, 2021
@Bobgy Bobgy changed the title v2: list deps in go binary v2-PR1: list deps in go binary Jun 19, 2021
@Bobgy
Copy link
Collaborator Author

Bobgy commented Jun 19, 2021

Hi @wlynch! Here's the first PR

@Bobgy Bobgy changed the title v2-PR1: list deps in go binary v2: list deps in go binary Jun 19, 2021
v2/deps/go_binary.go Outdated Show resolved Hide resolved
@Bobgy
Copy link
Collaborator Author

Bobgy commented Jun 20, 2021

Todo:

  • Rm v2/deps/main
  • Include third-party lichen module
  • Merge deps and goutils package into gocli package.

@Bobgy Bobgy changed the title v2: list deps in go binary v2: list modules in go binary Jun 20, 2021
@Bobgy
Copy link
Collaborator Author

Bobgy commented Jun 20, 2021

All the clean ups are done. Ready for code review : )

cc @wlynch @06kellyjac

@Bobgy
Copy link
Collaborator Author

Bobgy commented Jun 21, 2021

I'm not cleaning up the commits, because I always use squash merge when merging a PR.
Please let me know if this project has a different convention.

v2/gocli/go_binary.go Outdated Show resolved Hide resolved
v2/gocli/go_binary_test.go Outdated Show resolved Hide resolved
v2/gocli/go_binary_test.go Outdated Show resolved Hide resolved
v2/gocli/list.go Outdated Show resolved Hide resolved
v2/gocli/list.go Show resolved Hide resolved
v2/third_party/uw-labs/lichen/METADATA Outdated Show resolved Hide resolved
v2/gocli/list.go Outdated Show resolved Hide resolved
v2/gocli/go_binary.go Outdated Show resolved Hide resolved
v2/gocli/list.go Outdated Show resolved Hide resolved
v2/README.md Outdated Show resolved Hide resolved
@Bobgy
Copy link
Collaborator Author

Bobgy commented Jun 26, 2021

Hi @wlynch, thank you for the detailed review!
I can see a big quality improvement with your comments.
I've addressed all comments except #71 (comment) (which I want to defer to future work to keep the scope of this PR controllable.)

It's now ready for another round of reviews when you have time.

@Bobgy
Copy link
Collaborator Author

Bobgy commented Jul 31, 2021

@wlynch I tried using packages.Load instead, but I hit issue: #18

here's my implementation:

func loadModulesMetadata(refs []model.ModuleReference) (modules []packages.Module, err error) {
	modules = make([]packages.Module, 0)
	for _, ref := range refs {
		pkgs, err := packages.Load(&packages.Config{
			Mode: packages.NeedModule,
		}, ref.Path)
		if err != nil {
			return nil, err
		}
		if len(pkgs) != 1 {
			return nil, fmt.Errorf("got len(pkgs)=%v, expect 1", len(pkgs))
		}
		pkg := pkgs[0]
		if len(pkg.Errors) > 0 {
			log.Printf("got errs: %+v", pkg.Errors)
			continue
		}
		if pkg.Module == nil {
			log.Printf("got nil module: %+v, path=%s", pkg, pkg.PkgPath)
			continue
		}
		modules = append(modules, *pkg.Module)
	}
	return modules, nil
}

And I got the error when running unit test for cli02 test module:

2021/07/31 12:25:59 got errs: [-: no required module provides package golang.org/x/sys; to add it:
	go get golang.org/x/sys]
--- FAIL: TestListModulesInGoBinary (2.12s)
    --- FAIL: TestListModulesInGoBinary/../tests/modules/cli02 (2.00s)
        /home/gongyuan_kubeflow_org/github/google/go-licenses/v2/gocli/go_binary_test.go:91: 
            	Error Trace:	go_binary_test.go:91
            	Error:      	Not equal: 
            	            	expected: []string{"github.com/fsnotify/fsnotify", "github.com/hashicorp/hcl", "github.com/magiconair/properties", "github.com/mitchellh/go-homedir", "github.com/mitchellh/mapstructure", "github.com/pelletier/go-toml", "github.com/spf13/afero", "github.com/spf13/cast", "github.com/spf13/cobra", "github.com/spf13/jwalterweatherman", "github.com/spf13/pflag", "github.com/spf13/viper", "github.com/subosito/gotenv", "golang.org/x/sys", "golang.org/x/text", "gopkg.in/ini.v1", "gopkg.in/yaml.v2"}
            	            	actual  : []string{"github.com/fsnotify/fsnotify", "github.com/hashicorp/hcl", "github.com/magiconair/properties", "github.com/mitchellh/go-homedir", "github.com/mitchellh/mapstructure", "github.com/pelletier/go-toml", "github.com/spf13/afero", "github.com/spf13/cast", "github.com/spf13/cobra", "github.com/spf13/jwalterweatherman", "github.com/spf13/pflag", "github.com/spf13/viper", "github.com/subosito/gotenv", "golang.org/x/text", "gopkg.in/ini.v1", "gopkg.in/yaml.v2"}
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1,2 +1,2 @@
            	            	-([]string) (len=17) {
            	            	+([]string) (len=16) {
            	            	  (string) (len=28) "github.com/fsnotify/fsnotify",
            	            	@@ -14,3 +14,2 @@
            	            	  (string) (len=26) "github.com/subosito/gotenv",
            	            	- (string) (len=16) "golang.org/x/sys",
            	            	  (string) (len=17) "golang.org/x/text",
            	Test:       	TestListModulesInGoBinary/../tests/modules/cli02

I think the root cause is that golang.org/x/sys is a module, but it is not a package.
Refer to its source code: https://cs.opensource.google/go/x/sys, the folder only contains folders, but no go code.

So we cannot load the package golang.org/x/sys.

@Bobgy
Copy link
Collaborator Author

Bobgy commented Jul 31, 2021

Therefore, I think I'd prefer staying with existing implementation for loading module metadata.

@wlynch
Copy link
Contributor

wlynch commented Aug 2, 2021

Therefore, I think I'd prefer staying with existing implementation for loading module metadata.

That's annoying. :(
Hopefully golang/go#39301 will help out here.

Using CLI based impl in the meantime SGTM!

v2/gocli/go_binary.go Show resolved Hide resolved
if localModule.Dir == "" {
return nil, fmt.Errorf("Module %v's local directory is empty. Did you run go mod download?", ref.Path)
}
if localModule.Version != ref.Version {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes cause problems if the current module context doesn't match the same version as the binary. I think we'll need to rely on go list -m (or some library equivalent if available) instead.

To test this I used a modified cli02 binary using golang.org/x/[email protected], with a go-licenses@v2 using golang.org/x/[email protected] (diff)

$ go test ./...
--- FAIL: TestListModulesInGoBinary (0.28s)
    --- FAIL: TestListModulesInGoBinary/../tests/modules/cli02 (0.13s)
        go_binary_test.go:81: Found golang.org/x/tools v0.1.3 in go binary, but v0.1.4 is downloaded in go modules. Are you running this tool from the working dir to build the binary you are analyzing?
FAIL
$ go list -json -m golang.org/x/[email protected]  # Even though last command failed, module is downloaded locally
{
 "Path": "golang.org/x/tools",
 "Version": "v0.1.3",
 "Time": "2021-06-09T21:40:20Z",
 "Dir": "/usr/local/google/home/wlynch/pkg/mod/golang.org/x/[email protected]",
 "GoMod": "/usr/local/google/home/wlynch/pkg/mod/cache/download/golang.org/x/tools/@v/v0.1.3.mod",
 "GoVersion": "1.17"
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I didn't know we could use go list -json -m <module@version> to also download the module and get metadata at the same time. And I just tried, it's not restricted by the current module dir. I can run the command anywhere to get the modules.
I think we are now very close to lift the restriction of go module dir, if we just go list -json -m <the-entire-list-of-modules> to get all the modules.

There's one remaining challenge to me, the main module to build a binary is usually not versioned. How can we know what it is if outside of the go working dir?

This causes cause problems if the current module context doesn't match the same version as the binary.

For clarification, it was expected behavior in my design. How do we know where the main module is and whether it is the correct version? If we are in a working dir and found go modules version has a mismatch, a more serious problem is that most likely the main module's version is incorrect

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main module to build a binary is usually not versioned. How can we know what it is if outside of the go working dir?

I think you can pull this out from go version, as long as the binary was built from a released module version. e.g.

# Install from github.com/google/go-licenses@latest
$ go version -m go-licenses
go-licenses: devel +b7a85e0003
        path    github.com/google/go-licenses
        mod     github.com/google/go-licenses   v0.0.0-20210816172045-3099c18c36e1 h1:ZK63Yns/0Y8hE5y50WuSsfFWNPmpYDQ9tzh/J2vWV8c=
# Install from master
$ go version -m go-licenses
go-licenses: devel +b7a85e0003
        path    github.com/google/go-licenses
        mod     github.com/google/go-licenses   (devel)

(+b7a85e0003 corresponds to the go tool version, not the module version, so we can't infer the source commit from there)

I don't know how we can infer version information from devel sources. 😞 The answer might just be we can't for now and just throw an error for the time being telling people they need to go install @version if they want it to work.

If we are in a working dir and found go modules version has a mismatch, a more serious problem is that most likely the main module's version is incorrect

Assuming main module == binary module here, I think this is expected. The binary module is going to be different from the local working directory module, unless the local working directory module is the module that produces the binary (but if that were the case, my expectation is you would run go-licenses on the source package itself rather than the binary). That's why I don't think we can join the modules here, not only because of version mismatches, but also different replacements.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binary module is going to be different from the local working directory module, unless the local working directory module is the module that produces the binary (but if that were the case, my expectation is you would run go-licenses on the source package itself rather than the binary)

My use-case is exactly running go-licenses in the same working directory that produces the binary, that's a prerequisite for running the tool.
The reason I chose to run the tool on the binary instead of source code is because some go module dependencies (test / tool dependencies) are trimmed when compiling to the binary, so we can check license only for the modules included in the binary.

When you said running go-licenses on the source package itself, I assume you are talking about using packages.Load like the same logic in go-licenses v1. I think both solutions work, if you have strong opinions, I can try to use packages.Load instead.

I personally feel that it's better to rely on the modules metainfo from built binary, because it's better for single source of truth -- the go compiler checks for all dependencies, compile to a binary and record the modules metainfo in the binary.
On the contrary, if we use packages.Load to get all the modules from a package, the go compiler will collect all dependencies once, and packages.Load will get all the modules for the second time. How do we guarantee the two are giving the same results? I think there are potential risks for:

  • version skew -- when go compiler has a different version from packages.Load
  • build tag / config skew -- go compiler uses different build tags / flags than what packages.Load is using
    If we read the modules meta info directly from the binary, these risks are removed, because only the go compiler decides what modules are pulled in.

I believe we are making a trade off here.

Option 1 - the tool only runs in the same workdir used to build the binary.
Option 2 - the tool only works for binaries built by go install module@version.

Considering the major user scenario should be module authors adding license info to binaries built by themselves, I think option 1 is better.

After some experimentation using the new tool in github.com/kubeflow/pipelines, my typical licensing management workflow is like the following:

  1. keep a copy of generated license csv file in the repo
  2. in presubmit test, besides running go unit tests, also run go-licenses csv and verify the license csv is up-to-date. When there's an error, the PR author should check licenses for updated dependencies and update the licenses csv.
  3. when releasing, go-licenses save the notices to the container image containing the binary

In all the scenarios, the person/CICD tool running go-licenses should be using it in exactly the same workdir as where the binary is built.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional context! I was under the assumption you wanted to include additional binary tools as part of the license output, or to generate a license summary from a standalone unrelated binary. IIUC, you want to use the binary metadata as a shortcut to generate a minimal license summary that only includes dependencies that would be included in binaries that you are distributing.

The reason I chose to run the tool on the binary instead of source code is because some go module dependencies (test / tool dependencies) are trimmed when compiling to the binary, so we can check license only for the modules included in the binary.

Test dependencies are already excluded unless Test: true is included in packages.Config (which is currently not enabled for go-licenses). Related: #62

Tool dependencies (assuming this is following https://github.com/golang/go/wiki/Modules#how-can-i-track-tool-dependencies-for-a-module) would be treated like any other dependency and only be present if they are included transitively.

I think this would cover most cases to generate a minimal license summary, but if you have any examples in your project that don't fit well let me know! Would happy to dive deeper here.

version skew -- when go compiler has a different version from packages.Load

I'm not particularly worried about go compiler vs packages.Load version skew. Go has a pretty good track record when it comes to backwards compatibility. packages.Load is used by other tools in golang.org/x/tools, so I would imagine/hope any breaking changes / incompatibilities would be noticed prior to major Go releases.

What I am more worried about is version skew between the code and the binary, since the binary could have been built from a different version than what is currently present in the repo / go.mod. I'd rather lean on the code as the source of truth as much as possible, since that's what go.mod is referring to.

build tag / config skew -- go compiler uses different build tags / flags than what packages.Load is using

Although go-licenses does not directly accept build flags on its own CLI, we do support build flags via GOFLAGS.
If we wanted, we could pass through flags via packages.Config.

Considering the major user scenario should be module authors adding license info to binaries built by themselves, I think option 1 is better.

Agreed - I also prefer Option 1, but would prefer to use the source code as the source of truth rather than the binary.

FWIW - projects I'm involved in follows an identical pattern to what you're describing in your release workflow using the existing tool.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @wlynch, I think the new update will be what you want.
As we discussed initially, we can support both getting dependencies from the binary or from code + import path.

I've included another implementation to get all the dependencies and their modules using packages.Visit.

FWIW - projects I'm involved in follows an identical pattern to what you're describing in your release workflow using the existing tool.

Glad to know, that gives me more confidence of the UX.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-P0 idea:

By the way, after thinking a bit more about the use-case for analyzing completely unknown go binaries.
I think it can be a good additional feature.
As discussed in #71 (comment), the current limitation is that we do not know the exact version of the main module. However, the limitations seem to be a good trade off.

e.g. consider this UX:

I use go-licenses to analyze random binaries taken from containers, go-licenses can get exact version of all the dependencies and get HEAD version of the main repo. The result is slightly inaccurate for the main repo, so we show a warning message that we do not know version of the main repo. In this case, we can just check HEAD version.
Despite the limitation, the overall result is good-enough, I have a very good idea of which licenses are used in the binary and only in rare cases will there be a mismatch. From what I heard, when a project changes license, most of the cases, it's switching from a permissive license to a more restrictive license, so analyzing HEAD version of the main repo is unlikely to give false negatives. False positives are easy to deal with, because people can just check them manually.

v2/gocli/go_binary_test.go Outdated Show resolved Hide resolved
v2/gocli/go_binary.go Outdated Show resolved Hide resolved
v2/third_party/uw-labs/lichen/buildinfo/parse.go Outdated Show resolved Hide resolved
v2/gocli/list.go Outdated Show resolved Hide resolved
@Bobgy
Copy link
Collaborator Author

Bobgy commented Aug 3, 2021

Thank you for the review!
I will update soon

@Bobgy
Copy link
Collaborator Author

Bobgy commented Aug 16, 2021

Hi @wlynch, I've addressed all the feedback except the following which I have some different opinions:

Copy link
Contributor

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes - looking good!

if localModule.Dir == "" {
return nil, fmt.Errorf("Module %v's local directory is empty. Did you run go mod download?", ref.Path)
}
if localModule.Version != ref.Version {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main module to build a binary is usually not versioned. How can we know what it is if outside of the go working dir?

I think you can pull this out from go version, as long as the binary was built from a released module version. e.g.

# Install from github.com/google/go-licenses@latest
$ go version -m go-licenses
go-licenses: devel +b7a85e0003
        path    github.com/google/go-licenses
        mod     github.com/google/go-licenses   v0.0.0-20210816172045-3099c18c36e1 h1:ZK63Yns/0Y8hE5y50WuSsfFWNPmpYDQ9tzh/J2vWV8c=
# Install from master
$ go version -m go-licenses
go-licenses: devel +b7a85e0003
        path    github.com/google/go-licenses
        mod     github.com/google/go-licenses   (devel)

(+b7a85e0003 corresponds to the go tool version, not the module version, so we can't infer the source commit from there)

I don't know how we can infer version information from devel sources. 😞 The answer might just be we can't for now and just throw an error for the time being telling people they need to go install @version if they want it to work.

If we are in a working dir and found go modules version has a mismatch, a more serious problem is that most likely the main module's version is incorrect

Assuming main module == binary module here, I think this is expected. The binary module is going to be different from the local working directory module, unless the local working directory module is the module that produces the binary (but if that were the case, my expectation is you would run go-licenses on the source package itself rather than the binary). That's why I don't think we can join the modules here, not only because of version mismatches, but also different replacements.

@Bobgy Bobgy changed the title v2: list modules in go binary v2: list dependencies from either import path or go binary Aug 30, 2021
Copy link
Contributor

@wlynch wlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified packages.Load looks great! 😄

I'm still skeptical of the binary based approach - solely relying on source code seems like the better choice since we're dependent on the source's go.mod. We can't make guarantees about the binary matching the source, and documenting that this tool must only be ran with binaries that were built from the same source state as the current directory feels brittle and restrictive.

That said, I'm going to go ahead and merge this as-is. Since this is library-only and still in an experimental state (e.g. not hooked up into any CLI surface yet), we have flexibility to play around with this and try out ideas.

Thanks for your patience throughout the review. 🙏

@wlynch
Copy link
Contributor

wlynch commented Sep 24, 2021

Last thing is let's just squash these commits down for a clean commit message before merge.

@Bobgy
Copy link
Collaborator Author

Bobgy commented Sep 25, 2021

Awesome! Thank you for the flexibility, I am happy to contribute to this.

Let me squash the commits : )

@Bobgy Bobgy closed this Sep 26, 2021
@Bobgy Bobgy deleted the pr1 branch September 26, 2021 04:38
@Bobgy
Copy link
Collaborator Author

Bobgy commented Sep 26, 2021

Hmm, it seems I cannot just reopen the PR after fixing my branch.
Blocked by error message "The branch was force-pushed or recreated".

I have to create a new PR for this.

@Bobgy
Copy link
Collaborator Author

Bobgy commented Sep 26, 2021

Recreated as #82

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Contributor license agreement signed (https://cla.developers.google.com)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants