Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ociruntime: replace tar extraction subprocess with in-process #7589

Merged
merged 4 commits into from
Dec 3, 2024

Conversation

sluongng
Copy link
Contributor

Address the TODO comment.

Copy link
Member

@bduffany bduffany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was holding off on this TODO because I wanted to make sure the performance was on-par. Any chance you could run some quick tests with larger images with several layers (on a fast network connection, clearing cache between runs) to make sure it's not significantly slower and CPU usage is not significantly higher?

@sluongng
Copy link
Contributor Author

I applied this patch on top of master

diff --git a/enterprise/server/remote_execution/containers/ociruntime/BUILD b/enterprise/server/remote_execution/containers/ociruntime/BUILD
index 5d70c3ab1d..4c3b386f12 100644
--- a/enterprise/server/remote_execution/containers/ociruntime/BUILD
+++ b/enterprise/server/remote_execution/containers/ociruntime/BUILD
@@ -48,6 +48,7 @@ go_test(
         "test.workload-isolation-type": "firecracker",
         "test.container-image": "docker://gcr.io/flame-public/net-tools@sha256:ac701954d2c522d0d2b5296323127cacaaf77627e69db848a8d6ecb53149d344",
         "test.EstimatedComputeUnits": "2",
+        "test.EstimatedFreeDiskBytes": "4.5GB",
     },
     # NOTE: if testing locally, use --test_sharding_strategy=disabled to work
     # around the networking package not supporting cross-process locks.
diff --git a/enterprise/server/remote_execution/containers/ociruntime/ociruntime_test.go b/enterprise/server/remote_execution/containers/ociruntime/ociruntime_test.go
index 4d307ad96d..278dd16af3 100644
--- a/enterprise/server/remote_execution/containers/ociruntime/ociruntime_test.go
+++ b/enterprise/server/remote_execution/containers/ociruntime/ociruntime_test.go
@@ -1119,3 +1119,33 @@ func hasMountPermissions(t *testing.T) bool {
        require.NoError(t, err, "unmount")
        return true
 }
+
+func TestPullImage(t *testing.T) {
+       for _, tc := range []struct {
+               name  string
+               image string
+       }{
+               {
+                       name:  "dockerhub_busybox",
+                       image: "busybox:latest",
+               },
+               {
+                       name:  "ghcr_nix",
+                       image: "ghcr.io/avdv/nix-build@sha256:5f731adacf7290352fed6c1960dfb56ec3fdb31a376d0f2170961fbc96944d50",
+               },
+               {
+                       name:  "executor_image",
+                       image: "gcr.io/flame-public/buildbuddy-executor-enterprise:latest",
+               },
+       } {
+               t.Run(tc.name, func(t *testing.T) {
+                       layerDir := t.TempDir()
+                       imgStore := ociruntime.NewImageStore(layerDir)
+
+                       ctx := context.Background()
+                       img, err := imgStore.Pull(ctx, tc.image, oci.Credentials{})
+                       require.NoError(t, err)
+                       require.NotNil(t, img)
+               })
+       }
+}

Then ran it with

bazel test --config=remote --test_tag_filters=+docker enterprise/server/remote_execution/containers/ociruntime/... --test_filter=TestPullImage/executor_image --runs_per_test=10 --test_sharding_strategy=disabled

Repeated the same test with this PR on top.

I think the Go approach could be slightly slower and higher memory vs pipe to tar. With this small sample size the Executions, I think they are negligible. I suspect if the Go app is saturated, gc will kick in a bit more eagerly though.

Copy link
Member

@bduffany bduffany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming the TestFileOwnership test passes after you rebase (we aren't passing --no-same-owner anymore)

@sluongng sluongng force-pushed the sluongng/ociruntime-replace-tar branch 4 times, most recently from 9a44e12 to 2528197 Compare November 12, 2024 18:47
@sluongng sluongng requested a review from bduffany November 13, 2024 08:21
@sluongng sluongng force-pushed the sluongng/ociruntime-replace-tar branch 7 times, most recently from 7166367 to f56bf91 Compare November 21, 2024 09:31
@sluongng sluongng force-pushed the sluongng/ociruntime-replace-tar branch from f56bf91 to e9c897e Compare November 21, 2024 16:08
@sluongng
Copy link
Contributor Author

I learned about https://github.com/google/safeopen/blob/main/safeopen_linux.go over the weekend as well as golang/go#67002 (comment) (coming to Go 1.24). But I think these could be added later in a different PR?

Thoughts on merging the current PR as-is @fmeum @bduffany ?

@sluongng sluongng merged commit 559a71d into master Dec 3, 2024
15 checks passed
@sluongng sluongng deleted the sluongng/ociruntime-replace-tar branch December 3, 2024 10:43
@sluongng
Copy link
Contributor Author

sluongng commented Dec 3, 2024

Also learned about https://pkg.go.dev/github.com/landlock-lsm/go-landlock/landlock today which could be useful if we want to do these in a subprocess in the future.

sluongng added a commit that referenced this pull request Dec 5, 2024
sluongng added a commit that referenced this pull request Dec 5, 2024
bduffany pushed a commit that referenced this pull request Dec 5, 2024
sluongng added a commit that referenced this pull request Dec 6, 2024
@tylerwilliams tylerwilliams mentioned this pull request Dec 10, 2024
sluongng added a commit that referenced this pull request Dec 12, 2024
sluongng added a commit that referenced this pull request Dec 13, 2024
This change include a few small changes inside:

- Reapply "ociruntime: replace tar extraction subprocess with in-process
(#7589)" (#8019)

- Add a basic image pull test and guard it with a separate, manual Bazel
test target to help verifying common images.

- Fix issues with the previous implementation.
  Ensure that parent directory is created for file tar entries.
  Ignore unknown tar entry types (i.e. devices).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants