Support remotely-mountable layers for speeding up image distribution #956

Closed
wants to merge 1 commit into from


ktock
Contributor

@ktock ktock commented Jun 9, 2020

Related:

The following design description is stale. See #956 (comment) for the latest design of this patch.

I recently read through the code and came up with a strategy for enabling remote-snapshotter-like functionality in libpod.

The basic low-level idea is that the graphdriver manages "remotely-mountable" layers as well as the contents pulled by libpod. For higher-level components, including containers/image, the key questions are:

  • how to skip pulling remotely-mountable layers, and
  • how to pass additional layer information (e.g. the image reference) to graphdrivers to help them search for remote contents.

The following is the big picture of the design. I don't think it's perfect and it may need further discussion, so please tell me if I'm missing anything. It is also based on the stargz requirements, so I would like to get opinions from the CVMFS people as well.

For the lower-level part of the design, please refer to containers/storage#644.

Propagating some additional layer information

Remotely mounting layers, for stargz and other formats, requires information beyond the layer digest, such as the image reference. My current design carries this in BlobInfo.Annotations, which is passed from the copy.Image API to the ImageDestination. copy.Image appends image-related information (e.g. the image reference) to the target layer's BlobInfo.Annotations and passes it to the ImageDestination.TryReusingBlob API to ask whether the layer download can be skipped.
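A minimal sketch of the propagation step, to make the idea concrete. It is not the patch's code: `BlobInfo` here is a local stand-in type with only the fields needed, and the annotation key `imageRefAnnotation` and the helper `withImageRef` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// BlobInfo is a local stand-in for the blob description handed to the
// destination. Only the fields needed for this sketch are included.
type BlobInfo struct {
	Digest      string
	Annotations map[string]string
}

// imageRefAnnotation is a hypothetical key; the key actually used by the
// patch is not stated in this description.
const imageRefAnnotation = "example.io/image-reference"

// withImageRef returns a copy of info whose Annotations also carry the
// image reference, leaving the caller's map untouched.
func withImageRef(info BlobInfo, imageRef string) BlobInfo {
	out := BlobInfo{
		Digest:      info.Digest,
		Annotations: make(map[string]string, len(info.Annotations)+1),
	}
	for k, v := range info.Annotations {
		out.Annotations[k] = v
	}
	out.Annotations[imageRefAnnotation] = imageRef
	return out
}

func main() {
	layer := BlobInfo{Digest: "sha256:aaaa"}
	annotated := withImageRef(layer, "registry.example.com/app:latest")
	fmt.Println(annotated.Annotations[imageRefAnnotation])
}
```

Copying the map rather than mutating it in place keeps the original BlobInfo usable for destinations that do not understand the extra annotation.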

Checking if the layer download is skippable by talking with storage.Store

We are currently focusing on graphdriver-based management of remote mountpoints, so the ImageDestination implementation that matters here is storageImageDestination, which is backed by containers/storage. storageImageDestination.TryReusingBlob checks whether the target layer is already stored in the backing store. This patch extends that check to remotely-mountable layers using the Store.CreateLayer API. I added a new optional field, storage.LayerOptions.Labels. While checking for the layer's existence, storageImageDestination calls Store.CreateLayer for the target layer ((id, parent) = (target layer digest, "")) with a Labels option that contains the information passed from copy.Image through BlobInfo.Annotations.

The Store implementation can use these labels to search for the target layer. If the layer exists, the store tells storageImageDestination to skip downloading it. In my current implementation, I introduced a typed error, ErrTargetLayerAlreadyExist, for this.
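The check described above can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `LayerStore` and `fakeStore` are hypothetical stand-ins for storage.Store, the simplified `CreateLayer` signature is invented for the sketch, and treating a nil error as "not available remotely, download as usual" is my assumption. Only the error name ErrTargetLayerAlreadyExist comes from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTargetLayerAlreadyExist mirrors the typed error named in the
// description; it is defined locally for this sketch.
var ErrTargetLayerAlreadyExist = errors.New("target layer already exists")

// LayerStore is a hypothetical, minimal view of the backing store.
type LayerStore interface {
	CreateLayer(id, parent string, labels map[string]string) error
}

// fakeStore pretends that the listed layer IDs can be mounted remotely.
type fakeStore struct {
	remote map[string]bool
}

func (s fakeStore) CreateLayer(id, parent string, labels map[string]string) error {
	if s.remote[id] {
		return fmt.Errorf("layer %s: %w", id, ErrTargetLayerAlreadyExist)
	}
	return nil // assumption: nil means the layer must be pulled normally
}

// canSkipDownload asks the store about (id, parent) = (digest, "") and
// reports whether the download can be skipped.
func canSkipDownload(store LayerStore, digest string, labels map[string]string) (bool, error) {
	err := store.CreateLayer(digest, "", labels)
	if errors.Is(err, ErrTargetLayerAlreadyExist) {
		return true, nil
	}
	return false, err
}

func main() {
	store := fakeStore{remote: map[string]bool{"sha256:aaaa": true}}
	labels := map[string]string{"example.io/image-reference": "registry.example.com/app:latest"}
	for _, digest := range []string{"sha256:aaaa", "sha256:bbbb"} {
		skip, err := canSkipDownload(store, digest, labels)
		fmt.Println(digest, skip, err)
	}
}
```

Using errors.Is rather than a direct comparison lets the store wrap the sentinel with extra context without breaking the caller's check.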

Committing the layer chain without diffing by talking with storage.Store

Calling the Store.Diff API is another thing we want to avoid for remotely-mountable layers, because it can cause the whole blob to be downloaded into the store.

When committing the layer chain in the storageImageDestination.Commit API, it normally obtains the target layer contents by calling Store.Diff on (id, parent) = (target layer digest, ""), which we want to avoid for remotely-mountable layers. This patch lets storageImageDestination omit the diffing by relying on the same Store.CreateLayer semantics described above. storageImageDestination calls Store.CreateLayer with (id, parent) = (chain id, the parent layer) and the annotations (labels) used during TryReusingBlob, and expects the backing store to overlay the remotely mounted layers internally. If that succeeds, the store tells storageImageDestination to skip diffing this layer. Even if the overlaying fails, we can fall back to the normal steps (diffing and applying), because TryReusingBlob has already made sure that the chain (id, parent) = (target layer digest, "") exists in the backing store.
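A rough sketch of the commit-time flow with its fallback, under the same caveats as before: `chainStore`, its simplified `CreateLayer` signature, `DiffAndApply`, and `commitLayer` are hypothetical names for illustration, and reusing ErrTargetLayerAlreadyExist as the "skip diffing" signal is an assumption based on the "same semantics" wording above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTargetLayerAlreadyExist is defined locally for this sketch.
var ErrTargetLayerAlreadyExist = errors.New("target layer already exists")

// chainStore is a hypothetical stand-in for the backing store. It records
// how many times the expensive diff-and-apply path was taken.
type chainStore struct {
	overlayable map[string]bool // chain IDs the store can overlay remotely
	diffCalls   int
}

func (s *chainStore) CreateLayer(id, parent string, labels map[string]string) error {
	if s.overlayable[id] {
		return ErrTargetLayerAlreadyExist
	}
	return nil
}

// DiffAndApply stands in for the normal path (Store.Diff, then applying
// the diff on top of the parent).
func (s *chainStore) DiffAndApply(blobDigest, id, parent string) error {
	s.diffCalls++
	return nil
}

// commitLayer first asks the store to overlay the remote layer for
// (chainID, parent). Any outcome other than the "already exists" signal
// falls back to diffing and applying.
func commitLayer(s *chainStore, chainID, parent, blobDigest string, labels map[string]string) (skippedDiff bool, err error) {
	if errors.Is(s.CreateLayer(chainID, parent, labels), ErrTargetLayerAlreadyExist) {
		return true, nil
	}
	return false, s.DiffAndApply(blobDigest, chainID, parent)
}

func main() {
	s := &chainStore{overlayable: map[string]bool{"chain-1": true}}
	skipped, err := commitLayer(s, "chain-1", "", "sha256:aaaa", nil)
	fmt.Println(skipped, err, s.diffCalls)
	skipped, err = commitLayer(s, "chain-2", "chain-1", "sha256:bbbb", nil)
	fmt.Println(skipped, err, s.diffCalls)
}
```

The fallback is safe only because the earlier TryReusingBlob step guarantees that the single-layer entry for the blob digest is already present in the store.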

TODOs

  • discussing the design of "layer information propagation" and "skipping functionality" towards an agreement.
  • discussing the graphdriver implementation
  • adding more tests

@ktock
Contributor Author

ktock commented Jun 17, 2020

Based on the discussion in containers/podman#4739, I rethought the design of the remote snapshot functionality to leverage additional layer stores. What I've done here is to add a "layer discovery" functionality for the store.

This patch leverages the "layer discovery" functionality implemented in containers/storage#644 (see that PR for details), together with the additional layer store, to skip layer downloads during copy.Image.

This patch adds layer discovery in the two places where layer existence is checked:

  • *storageImageCloser.TryReusingBlob: checks for the existence of the blob content of the target layer
  • *storageImageCloser.Commit: checks for the existence of the chained view of the target layer

These discoveries allow the underlying store to search remote stores for the layer contents and add the layer information to its additional layer store, so we can skip downloading (or diffing) these layers.

@rhatdan
Member

rhatdan commented Jul 1, 2020

@mtrmac @giuseppe @nalind @vrothberg PTAL

@ktock
Contributor Author

ktock commented Dec 26, 2020

Closing in favor of containers/podman#8837, containers/storage#795 and #1109.

@ktock ktock closed this Dec 26, 2020