Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add remote-test option for E2E #876

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ArangoGutierrez
Copy link
Collaborator

No description provided.

@ArangoGutierrez ArangoGutierrez self-assigned this Jan 23, 2025
@ArangoGutierrez ArangoGutierrez added the testing issue/PR to fix/edit/create/enhance a project unit/e2e test label Jan 23, 2025
@ArangoGutierrez
Copy link
Collaborator Author

Next step is to have this to run during PR's, merge events and Tag cuts

return localRunScript(script)
}

func localRunScript(script string) (string, error) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if I should move everything from this line forward to the utils.go file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could move all the script specifics to a separate file, but not utils.go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been more clear. I think we should separate logic according to functionality / domain and name the files accordingly. Using tools.go as a catch-all does not achieve this.

set -xe
`

var dockerInstallTemplate = `
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is that in the future, we will have different methods of installation to test, for example, CDI vs non-CDI.


: ${IMAGE:={{.Image}}}

sudo ln -s /var/run/nvidia-container-toolkit/toolkit/nvidia-container-runtime-hook /usr/bin/nvidia-container-runtime-hook
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add a comment as to why this is required?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, more documentation never hurts

.PHONY: test
test:
E2E_IMAGE_NAME ?= ghcr.io/nvidia/container-toolkit
E2E_IMAGE_TAG ?= latest
Copy link
Member

@elezar elezar Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest is never a valid tag in this repo. Can we rather error out if this isn't set for the remote case.

E2E_SSH_HOST ?=

.PHONY: local-test remote-test
# Local test assumes that the container-toolkit is already installed on the host
Copy link
Member

@elezar elezar Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the distinction? Even when running locally, I may want to install the toolkit from an image that I've just built.

Comment on lines 39 to 40
imageRepo string
imageTag string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this warrant a type? (same for the ssh info below)?

var _ = Describe("docker", Ordered, func() {
// Install the NVIDIA Container Toolkit
BeforeAll(func(ctx context.Context) {
if sshKey != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned, we should probably allow the isntallation logic to be run locally too.

// Install the NVIDIA Container Toolkit
BeforeAll(func(ctx context.Context) {
if sshKey != "" {
installScript, err := getInstallScript(dockerInstallTemplate, fmt.Sprintf("%s:%s", imageRepo, imageTag))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to return an Installer instead which implements Install() error?

--restart-mode=systemd
`

type DockerInstall struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type isn't used.

"text/template"
)

const Shebang = `#! /usr/bin/env bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not more confusing to have to remember to add the Shebang each time? Should we not use include this in the template / script.

docker run --pid=host --rm -i --privileged \
-v /:/host \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /var/run/nvidia-container-toolkit:/var/run/nvidia-container-toolkit \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a temporary path here since we ideally want to remove the config again after the tests.

Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Comment on lines +24 to +27
E2E_IMAGE_TAG ?=
ifeq ($(E2E_IMAGE_TAG),)
$(error E2E_IMAGE_TAG is not set)
endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not make sense to use the tag that would be generated when running:

make -f deployments/container/Makefile build-ubuntu20.04

in the root by default.

i.e:

        --tag nvidia/container-toolkit:1.17.3-ubuntu20.04 \

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this context, tag refers to the image tag in your example being 1.17.3-ubuntu20.04.
Do we want to have tag refer to the image name+tag for this context? (I know some projects do this, so I am ok with either)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just coppied a snippet from the shell output when running the make command. Do we think the following is a reasonable mode of operation:

  1. Run ./scripts/build-packages.sh ubuntu18.04-amd64 to build the packages.
  2. Run make -f deployment/container/Makefile build to build the required image.
  3. Run make -f tests/e2e/Makefile test to run the tests for the built version locally?

--restart-mode=systemd
`

type Installer struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move installer to installer.go.

Comment on lines +64 to +66
SshKey string
SshUser string
RemoteHost string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should not be installer members. The installer doesn't care if we're running remotely or locally. It should have a reference to a Runner that it uses to run the required commands.


func (i *Installer) Install() error {
// Parse the combined template
tmpl, err := template.New("dockerScript").Parse(i.Template)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The template name need not be dockerScript.

@@ -24,7 +24,27 @@ import (
)

// Integration tests for Docker runtime
var _ = Describe("docker", func() {
var _ = Describe("docker", Ordered, func() {
var s Script
Copy link
Member

@elezar elezar Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a r Runner and not a s Script.

Comment on lines +32 to +45
switch {
case sshKey == "":
s = localRun{}
default:
s = remoteRun{sshKey, sshUser, remoteHost}
}

if installCTK {
installer, err := newInstaller(dockerInstallTemplate, imageRepo+":"+imageTag, sshKey, sshUser, remoteHost)
Expect(err).ToNot(HaveOccurred())

err = installer.Install()
Expect(err).ToNot(HaveOccurred())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic on which Script (or Runner) should be used should be pulled into a constructor with functional arguments. This instance should also be reused in the Installer so as to not duplicate the logic there.

Suggested change
switch {
case sshKey == "":
s = localRun{}
default:
s = remoteRun{sshKey, sshUser, remoteHost}
}
if installCTK {
installer, err := newInstaller(dockerInstallTemplate, imageRepo+":"+imageTag, sshKey, sshUser, remoteHost)
Expect(err).ToNot(HaveOccurred())
err = installer.Install()
Expect(err).ToNot(HaveOccurred())
}
s := NewRunner(
WithHost(remoteHost),
WithSshKey(sshKey),
WithSshUser(sshUser),
)
if installCTK {
installer, err := NewToolkitInstaller(
WithRunner(s),
WithVersion(version),
)
Expect(err).ToNot(HaveOccurred())
err = installer.Install()
Expect(err).ToNot(HaveOccurred())
}

return err
}

type Script interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this inteface only has a single function Run the idiomatic Go name for it would be Runner. Let's also move it to a separate file.

Run(script string) (string, error)
}

type localRun struct{}
Copy link
Member

@elezar elezar Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type localRun struct{}
// A localRunner runs scripts locally.
type localRunner struct{}

}

type localRun struct{}
type remoteRun struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type remoteRun struct {
// A remoteRunner runs scripts over SSH.
type remoteRunner struct {

(also wondering whether sshRunner isn't more accurate).

}

type Script interface {
Run(script string) (string, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering about this. Would we expect this to return (string, string, error) instead so that we can always access STDOUT and STDERR?


connectionFailed := false
for i := 0; i < 20; i++ {
client, err = ssh.Dial("tcp", remoteHost+":22", sshConfig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Does it make sense to make the port configurable?


func (r remoteRun) Run(script string) (string, error) {
// Create a new SSH connection
client, err := connectOrDie(r.SshKey, r.SshUser, r.RemoteHost)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we create a new SSH session every time we Run a script?

}

connectionFailed := false
for i := 0; i < 20; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a retry at this stage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing issue/PR to fix/edit/create/enhance a project unit/e2e test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants