-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add remote-test option for E2E #876
base: main
Are you sure you want to change the base?
Conversation
Next step is to have this to run during PR's, merge events and Tag cuts |
test/e2e/e2e_test.go
Outdated
return localRunScript(script) | ||
} | ||
|
||
func localRunScript(script string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if I should move everything from this line forward to the utils.go
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could move all the script specifics to a separate file, but not utils.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have been more clear. I think we should separate logic according to functionality / domain and name the files accordingly. Using tools.go
as a catch-all does not achieve this.
test/e2e/utils.go
Outdated
set -xe | ||
` | ||
|
||
var dockerInstallTemplate = ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is that in the future, we will have different methods of installation to test, for example, CDI vs non-CDI.
test/e2e/utils.go
Outdated
|
||
: ${IMAGE:={{.Image}}} | ||
|
||
sudo ln -s /var/run/nvidia-container-toolkit/toolkit/nvidia-container-runtime-hook /usr/bin/nvidia-container-runtime-hook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to add a comment as to why this is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, more documentation never hurts
test/e2e/Makefile
Outdated
.PHONY: test | ||
test: | ||
E2E_IMAGE_NAME ?= ghcr.io/nvidia/container-toolkit | ||
E2E_IMAGE_TAG ?= latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest
is never a valid tag in this repo. Can we rather error out if this isn't set for the remote case.
test/e2e/Makefile
Outdated
E2E_SSH_HOST ?= | ||
|
||
.PHONY: local-test remote-test | ||
# Local test assumes that the container-toolkit is already installed on the host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the distinction? Even when running locally, I may want to install the toolkit from an image that I've just built.
test/e2e/e2e_test.go
Outdated
imageRepo string | ||
imageTag string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this warrant a type? (same for the ssh
info below)?
var _ = Describe("docker", Ordered, func() { | ||
// Install the NVIDIA Container Toolkit | ||
BeforeAll(func(ctx context.Context) { | ||
if sshKey != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned, we should probably allow the isntallation logic to be run locally too.
// Install the NVIDIA Container Toolkit | ||
BeforeAll(func(ctx context.Context) { | ||
if sshKey != "" { | ||
installScript, err := getInstallScript(dockerInstallTemplate, fmt.Sprintf("%s:%s", imageRepo, imageTag)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to return an Installer
instead which implements Install() error
?
test/e2e/utils.go
Outdated
--restart-mode=systemd | ||
` | ||
|
||
type DockerInstall struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type isn't used.
test/e2e/utils.go
Outdated
"text/template" | ||
) | ||
|
||
const Shebang = `#! /usr/bin/env bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not more confusing to have to remember to add the Shebang each time? Should we not use include this in the template / script.
test/e2e/utils.go
Outdated
docker run --pid=host --rm -i --privileged \ | ||
-v /:/host \ | ||
-v /var/run/docker.sock:/var/run/docker.sock \ | ||
-v /var/run/nvidia-container-toolkit:/var/run/nvidia-container-toolkit \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use a temporary path here since we ideally want to remove the config again after the tests.
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
58379b2
to
f2a0b50
Compare
E2E_IMAGE_TAG ?= | ||
ifeq ($(E2E_IMAGE_TAG),) | ||
$(error E2E_IMAGE_TAG is not set) | ||
endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not make sense to use the tag that would be generated when running:
make -f deployments/container/Makefile build-ubuntu20.04
in the root by default.
i.e:
--tag nvidia/container-toolkit:1.17.3-ubuntu20.04 \
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this context, tag
refers to the image tag in your example being 1.17.3-ubuntu20.04
.
Do we want to have tag
refer to the image name+tag for this context? (I know some projects do this, so I am ok with either)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just coppied a snippet from the shell output when running the make command. Do we think the following is a reasonable mode of operation:
- Run
./scripts/build-packages.sh ubuntu18.04-amd64
to build the packages. - Run
make -f deployment/container/Makefile build
to build the required image. - Run
make -f tests/e2e/Makefile test
to run the tests for the built version locally?
--restart-mode=systemd | ||
` | ||
|
||
type Installer struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move installer to installer.go
.
SshKey string | ||
SshUser string | ||
RemoteHost string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should not be installer
members. The installer doesn't care if we're running remotely or locally. It should have a reference to a Runner
that it uses to run the required commands.
|
||
func (i *Installer) Install() error { | ||
// Parse the combined template | ||
tmpl, err := template.New("dockerScript").Parse(i.Template) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The template name need not be dockerScript
.
@@ -24,7 +24,27 @@ import ( | |||
) | |||
|
|||
// Integration tests for Docker runtime | |||
var _ = Describe("docker", func() { | |||
var _ = Describe("docker", Ordered, func() { | |||
var s Script |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be a r Runner
and not a s Script
.
switch { | ||
case sshKey == "": | ||
s = localRun{} | ||
default: | ||
s = remoteRun{sshKey, sshUser, remoteHost} | ||
} | ||
|
||
if installCTK { | ||
installer, err := newInstaller(dockerInstallTemplate, imageRepo+":"+imageTag, sshKey, sshUser, remoteHost) | ||
Expect(err).ToNot(HaveOccurred()) | ||
|
||
err = installer.Install() | ||
Expect(err).ToNot(HaveOccurred()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic on which Script
(or Runner
) should be used should be pulled into a constructor with functional arguments. This instance should also be reused in the Installer
so as to not duplicate the logic there.
switch { | |
case sshKey == "": | |
s = localRun{} | |
default: | |
s = remoteRun{sshKey, sshUser, remoteHost} | |
} | |
if installCTK { | |
installer, err := newInstaller(dockerInstallTemplate, imageRepo+":"+imageTag, sshKey, sshUser, remoteHost) | |
Expect(err).ToNot(HaveOccurred()) | |
err = installer.Install() | |
Expect(err).ToNot(HaveOccurred()) | |
} | |
s := NewRunner( | |
WithHost(remoteHost), | |
WithSshKey(sshKey), | |
WithSshUser(sshUser), | |
) | |
if installCTK { | |
installer, err := NewToolkitInstaller( | |
WithRunner(s), | |
WithVersion(version), | |
) | |
Expect(err).ToNot(HaveOccurred()) | |
err = installer.Install() | |
Expect(err).ToNot(HaveOccurred()) | |
} |
return err | ||
} | ||
|
||
type Script interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this inteface only has a single function Run
the idiomatic Go name for it would be Runner
. Let's also move it to a separate file.
Run(script string) (string, error) | ||
} | ||
|
||
type localRun struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type localRun struct{} | |
// A localRunner runs scripts locally. | |
type localRunner struct{} |
} | ||
|
||
type localRun struct{} | ||
type remoteRun struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type remoteRun struct { | |
// A remoteRunner runs scripts over SSH. | |
type remoteRunner struct { |
(also wondering whether sshRunner
isn't more accurate).
} | ||
|
||
type Script interface { | ||
Run(script string) (string, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering about this. Would we expect this to return (string, string, error)
instead so that we can always access STDOUT
and STDERR
?
|
||
connectionFailed := false | ||
for i := 0; i < 20; i++ { | ||
client, err = ssh.Dial("tcp", remoteHost+":22", sshConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Does it make sense to make the port configurable?
|
||
func (r remoteRun) Run(script string) (string, error) { | ||
// Create a new SSH connection | ||
client, err := connectOrDie(r.SshKey, r.SshUser, r.RemoteHost) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we create a new SSH session every time we Run
a script?
} | ||
|
||
connectionFailed := false | ||
for i := 0; i < 20; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a retry at this stage.
No description provided.