Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support remotely-mountable layers for speeding up image distribution #644

Closed
wants to merge 2 commits into from

Conversation

ktock
Copy link
Contributor

@ktock ktock commented Jun 9, 2020

Related:

The following design description is stale. See #644 (comment) for the latest design of this patch.

Recently I read through the codes and came up with the strategy for enabling remote-snapshotter-like functionality in libpod. This is a discussion thread for the design of the low-level part of this functionality.

Creating layers without pulling the content means that Store.ApplyDiff API won't be called for this layer. So during the call for Store.CreateLayer API, we need to check the existence of the targeting layer content in the graphdriver. If the content exists we need to commit it immediately like done in ApplyDiff.

Creating layers without ApplyDiff

Corresponding to the commit (smaller one): d174476

To achieve this, storage.Store will need an additional calling convention with graphdrivers for asking if the targeting layer contents can be provided by graphdrivers (i.e. remotely-mountable). In this patch, I extended the semantics of Driver.Create API. As mentioned in another PR(...), additional information for that layer is passed from ImageDestination to Store.CreateLayer API using storage.LayerOptions.Labels option. Then Store.CreateLayer passes these labels to Driver.Create API using driver.CreateOpts.StorageOpt, for asking if the targeting layer is remotely-mountable.

The graphdrivers (e.g. CVMFS-graphdriver or stargz-graphdriver, etc) can search the remote mount points using passed StorageOpts. If the layer is remotely mountable, graphdrivers tell so to the store. In this patch, I introduced a typed error ErrTargetLayerAlreadyExist for this. When the store gets ErrTargetLayerAlreadyExist, it immediately registers this layer in the same way as done in Store.ApplyDiff API.

Working example of the graphdriver which supports remotely-mountable layers

Corresponding to the commit (bigger one): b139d87

As a working example, I implemented a stargz-snapshotter-based graphdriver and it's also included in this patch. This graphdriver leverages stargz snapshotter for mounting stargz layers and managing the raw contents and mountpoints. You can refer to the snapshotter implementation on https://github.com/ktock/stargz-snapshotter/tree/graphdriver.

TODOs

  • discussing the design of "creating layers without ApplyDiff" towards an agreement.
  • discussing the graphdriver implementation
  • adding more tests

@rhatdan
Copy link
Member

rhatdan commented Jun 9, 2020

@giuseppe @mtrmac @nalind @vrothberg PTAL

@ktock ktock force-pushed the remote branch 4 times, most recently from 0f967c2 to f8dea83 Compare June 17, 2020 02:12
@ktock
Copy link
Contributor Author

ktock commented Jun 17, 2020

Based on the discussion in containers/podman#4739, I rethought the design of the remote snapshot functionality to leverage additional layer stores. What I've done here is to add a "layer discovery" functionality for the store.

Layer discovery for additional layer store

Corresponding to the commit (smaller one): 7e66198

For some filesystems including stargz-based one, recognizing all available layers and storing the exhaustive list of *store.Layer in the additional store in advance is difficult. We need something like "layer discovery" functionality here, which allows clients (e.g. *storageImageSource.TryReusingBlob) to tell the store which layers they want, with some additional information (e.g. layer digest, diffID, image reference, etc). This allows the store to discover the specified layers from remote stores and to add the corresponding *store.Layer information to the list in the additional layer store. Then the later calls to the store APIs can recognize these layers.

I've made use of store.CreateLayer API for achieving this. Clients (e.g. *storageImageSource.TryReusingBlob) can call this API with targetting layer's information (e.g. layer digest, diffID, image reference, etc) as labels. These labels will be passed to the underlying graphdriver through StorageOpts. Then this graphdriver discovers the specified layer from the remote store. If that layer exists, the graphdriver adds the corresponding layer information (*store.Layer) to the additional layer store then returns ErrTargetLayerAlreadyExist. Clients can use this layer in the later API calls and of course, can skip the pull of this layer.

Working example of the graphdriver which supports remotely-mountable layers

Corresponding to the commit (bigger one): f8dea83

As a working example, I implemented a stargz-snapshotter-based graphdriver and it's also included in this patch. This graphdriver leverages stargz snapshotter for mounting stargz layers and managing the raw contents and mountpoints. You can refer to the snapshotter implementation on https://github.com/ktock/stargz-snapshotter/tree/graphdriver.

@rhatdan
Copy link
Member

rhatdan commented Jun 24, 2020

@ktock Could you fix the lint issues. Most of the engineering team has been tied up in working on Podman 2.0 we hope to get back to this soon.

@ktock
Copy link
Contributor Author

ktock commented Jun 25, 2020

Thanks for your time. It seems to be Go version's issue. Do we have plan to move to a newer version of go(1.13)?

could not import github.com/containerd/containerd/errdefs (/go/src/github.com/containers/storage/vendor/github.com/containerd/containerd/errdefs/errors.go:54:16: Is not declared by package errors

@rhatdan
Copy link
Member

rhatdan commented Jun 25, 2020

Yes, it is time we moved to golang 1.13

@ktock
Copy link
Contributor Author

ktock commented Jun 29, 2020

The completions of following tests haven't been reported to this PR page but tests (except for devicemapper ones) seem to have passed 🤔

https://cirrus-ci.com/build/5193639385104384

- TEST_DRIVER:fuse-overlay VM_IMAGE:${FEDORA_CACHE_IMAGE_NAME} image:fedora-32-libpod-6508632441356288
- TEST_DRIVER:fuse-overlay VM_IMAGE:${PRIOR_FEDORA_CACHE_IMAGE_NAME} image:fedora-31-libpod-6508632441356288

I think we can at first focus on the layer discovery part (commit 56035c0), then can proceed to the graphdriver implementation part (commit f98ab81).
I included these two parts in this single PR for the ease of the discussion based on the working example but we can consider to separate the PR and have a dedicated one for the second part after the first one reached an agreement.

@giuseppe giuseppe self-requested a review June 30, 2020 10:48
@@ -0,0 +1,993 @@
// +build linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any code coming from containerd and we need to maintain the copyright header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.
And before I waste your time...

I don't intend to get graphdriver part (f98ab81) merged into libpod immediately. At this moment, it's just an example implementation and it's incomplete at some points,

  • f98ab81 relies on stargz-snapshotter but it will require users to run additional daemons (=containerd+stargz-snapshotter) and it might not fit to the podman's design phirosophy?
    • we might need a daemonless implementation of this patch.
  • stargz-snapshotter is untested in the rootless world. (so the patch works only with root user)
  • some implementation details are imcomplete yet (e.g. the way to treat id mappings, lockfiles, ...)

The patch f98ab81 intends to make sure that "layer discovery" functionality (56035c0) is needed to make remote-snapshotter-like functionality come true in podman.
I think the deeper review (or design discussion) of the graphdriver implementation can be after the "layer discovery" part (56035c0) is done (But it's kinda chicken and egg problem...).

Please tell me if I should separate the PR.

@rhatdan
Copy link
Member

rhatdan commented Aug 3, 2020

@giuseppe @ktock Any movement on this PR?

@giuseppe
Copy link
Member

giuseppe commented Aug 4, 2020

@ktock I wonder if this would be easier to implement adding a mechanism similar to additionalStores.

Instead of expecting the store it is just the path to the exploded layers, as:

additional_remote_paths = ["/foo/bar/"] and expect the layer to be accessible at /foo/bar/$ID.

Or better, we could have it configurable and define a few substitution, e.g. additional_remote_paths = ["/foo/bar/%L"], additional_remote_paths = ["/foo/bar/%I/%L"] wher %I and %L are the checksums for the image and the layer.

Do you see any problem with this mechanism and CRFS?

@ktock
Copy link
Contributor Author

ktock commented Aug 25, 2020

@giuseppe Querying layers to registries requires at least the image reference name (e.g. docker.io/library/ubuntu) and the layer digest. So /foo/bar/%I/%L seems fine for querying a layer on the registry but %I should be the image reference name (or its base64 encoded string, etc) rather than checksum or hash of the image.

(Maybe in following-up patches => ) Furthermore, we'll need a verification functionality for each lazily-fetched chunk (stargz-snapshotter is starting to support it). The OCI digest of a layer is required for querying it on the registry but can't be used for verifying each lazily-fetched file data so we'll need to pass additional information (e.g. the OCI digest of stargz TOC contained in the manifest as a layer annotation) to the underlying filesystem, e.g. /foo/bar/%I/%L/%T where %I = (base64 encoded) image ref, %L = layer digest and %T = the digest of stargz TOC. Or we should have a more flexible way of making the filepath format configurable.

@siscia
Copy link

siscia commented Oct 8, 2020

Why base64 encoded? To avoid slashes in the name?

In our experience the complex part is not really having the layers accessible, but to create all the accessory file system structure next to it, like the layer.json file and the whole directory structure necessary.

If we could get along with just providing the layers in the filesystem it would be great!

Info gather by our GSoC student: https://medium.com/@mohit2501tyagi/lets-walk-through-podman-37636cf223c5

@ktock
Copy link
Contributor Author

ktock commented Dec 26, 2020

Closing in favor of containers/podman#8837, #795 and containers/image#1109 .

@ktock ktock closed this Dec 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants