Support remotely-mountable layers for speeding up image distribution #956

ktock · 2020-06-09T08:12:41Z

I recently read through the code and came up with a strategy for enabling remote-snapshotter-like functionality in libpod.

At a low level, the basic idea is that the graphdriver manages "remotely-mountable" layers as well as the contents pulled by libpod. For higher-level components, including containers/image, the key questions are:

how to skip pulling remotely-mountable layers, and
how to pass additional layer information (e.g. the image reference) to graphdrivers to help them search for remote contents.

The following is the big picture of the design. I don't think it's perfect and we might need further discussion, so please tell me if I'm missing anything. It is also based on the stargz requirements, so I would like to get opinions from CVMFS people as well.

For the lower-level part of the design, please refer to containers/storage#644.

Propagating some additional layer information

Remotely mounting layers, including stargz layers, requires some information beyond the layer digest, such as the image reference. My current design passes this through BlobInfo.Annotations, which copy.Image hands to the ImageDestination. copy.Image appends image-related information (e.g. the image reference) to the target layer's BlobInfo.Annotations and passes it to the ImageDestination.TryReusingBlob API to ask whether the layer download can be skipped.
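The propagation step can be sketched roughly as follows. This is only an illustration: the `BlobInfo` struct is a local stand-in that models just the two fields mentioned here, and the annotation key `imageRefKey` and the helper `withImageRef` are hypothetical names that do not come from the patch.

```go
package main

import "fmt"

// BlobInfo is a local stand-in for the layer descriptor discussed above.
// Only the fields this sketch needs are modelled.
type BlobInfo struct {
	Digest      string
	Annotations map[string]string
}

// imageRefKey is a hypothetical annotation key chosen for this sketch only.
const imageRefKey = "example.invalid/image-reference"

// withImageRef returns a copy of info whose annotations also carry the
// image reference. The caller's map is left untouched.
func withImageRef(info BlobInfo, ref string) BlobInfo {
	merged := make(map[string]string, len(info.Annotations)+1)
	for k, v := range info.Annotations {
		merged[k] = v
	}
	merged[imageRefKey] = ref
	info.Annotations = merged
	return info
}

func main() {
	layer := BlobInfo{Digest: "sha256:aaaa"}
	annotated := withImageRef(layer, "registry.example.com/app:latest")
	// The destination can now read the reference from the annotations.
	fmt.Println(annotated.Annotations[imageRefKey])
}
```

Copying the map rather than mutating it keeps the caller's BlobInfo unchanged, which seems safer if the same descriptor is reused elsewhere during the copy.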

Checking if the layer download is skippable by talking with `storage.Store`

We are focusing on remote mountpoint management based on the graphdriver, so the ImageDestination implementation that matters here is storageImageDestination, which is based on containers/storage. storageImageDestination.TryReusingBlob checks whether the target layer is already stored in the backing store. This patch extends that check to remotely-mountable layers using the Store.CreateLayer API. I added a new optional field, storage.LayerOptions.Labels. While checking for the layer, storageImageDestination calls Store.CreateLayer to ask whether the target layer exists ((id, parent) = (target layer digest, "")), passing a Labels option that contains the information received from copy.Image through BlobInfo.Annotations.

A Store implementation can use these labels to search for the target layer. If the layer exists, the store tells storageImageDestination to skip downloading it. In my current implementation, I introduced a typed error, ErrTargetLayerAlreadyExist, for this.
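A minimal sketch of this existence check, under stated assumptions: `layerCreator` is a hypothetical, narrowed interface rather than the real Store, `LayerOptions` models only the Labels field, `fakeStore` and `canSkipDownload` are made up for illustration, and treating every result other than the typed error as "download normally" is my assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTargetLayerAlreadyExist mirrors the typed error named above.
// It is defined locally for this sketch.
var ErrTargetLayerAlreadyExist = errors.New("target layer already exists")

// errNoSuchLayer is a hypothetical error for layers the fake store cannot find.
var errNoSuchLayer = errors.New("layer is not available remotely")

// LayerOptions models only the Labels field discussed above.
type LayerOptions struct {
	Labels map[string]string
}

// layerCreator is a hypothetical, narrowed view of the store.
type layerCreator interface {
	CreateLayer(id, parent string, opts *LayerOptions) error
}

// fakeStore pretends that a fixed set of digests can be mounted remotely.
type fakeStore struct {
	remote map[string]bool
}

func (s fakeStore) CreateLayer(id, parent string, opts *LayerOptions) error {
	if s.remote[id] {
		return ErrTargetLayerAlreadyExist
	}
	return errNoSuchLayer
}

// canSkipDownload asks the store about (id, parent) = (digest, "") and
// reports true only when the store answers with the typed error.
func canSkipDownload(s layerCreator, digest string, labels map[string]string) bool {
	err := s.CreateLayer(digest, "", &LayerOptions{Labels: labels})
	return errors.Is(err, ErrTargetLayerAlreadyExist)
}

func main() {
	store := fakeStore{remote: map[string]bool{"sha256:aaaa": true}}
	labels := map[string]string{"example.invalid/image-reference": "registry.example.com/app:latest"}
	fmt.Println(canSkipDownload(store, "sha256:aaaa", labels))
	fmt.Println(canSkipDownload(store, "sha256:bbbb", labels))
}
```

Matching with errors.Is keeps the check working even if a store wraps the typed error with extra context.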

Committing the layer chain without diffing by talking with `storage.Store`

Calling the Store.Diff API is another thing we want to avoid for remotely-mountable layers, because it can cause the whole blob to be downloaded in the store.

When committing the layer chain in storageImageDestination.Commit, it normally gets the target layer contents by calling Store.Diff on (id, parent) = (target layer digest, ""), which we want to avoid for remotely-mountable layers. This patch lets storageImageDestination omit the diff by relying on the same Store.CreateLayer semantics described above. storageImageDestination calls Store.CreateLayer with (id, parent) = (chain id, the parent layer) and the annotations (labels) used during TryReusingBlob, and expects the backing store to overlay the remotely mounted layers internally. If that succeeds, the store tells storageImageDestination to skip diffing this layer. Even if the overlaying fails, we can fall back to the normal steps (diffing and applying), because TryReusingBlob has already made sure that the chain (id, parent) = (target layer digest, "") exists in the backing store.

TODOs

discussing the design of "layer information propagation" and "skipping functionality" towards an agreement.
discussing the graphdriver implementation
adding more tests

ktock · 2020-06-17T02:30:14Z

Based on the discussion in containers/podman#4739, I rethought the design of the remote snapshot functionality to leverage additional layer stores. What I've done here is to add a "layer discovery" functionality for the store.

This patch leverages the "layer discovery" functionality implemented in containers/storage#644 (see that PR for more details), along with the additional layer store, to effectively skip layer downloads during copy.Image.

This patch adds layer discovery in the two places where layer existence is checked.

* storageImageCloser.TryReusingBlob: checks for the blob content of the target layer
* storageImageCloser.Commit: checks for the chained view of the target layer

These discoveries allow the underlying store to search for the layer contents in remote stores and add the layer information to its additional layer store, so we can skip downloading (or diffing) these layers.

Signed-off-by: Kohei Tokunaga <[email protected]>

rhatdan · 2020-07-01T12:58:46Z

@mtrmac @giuseppe @nalind @vrothberg PTAL

ktock · 2020-12-26T15:59:57Z

Closing in favor of containers/podman#8837, containers/storage#795 and #1109.

This was referenced Jun 9, 2020

Support remotely-mountable layers for speeding up image distribution containers/storage#644

Closed

Remote snapshotter in podman containers/podman#4739

Closed

ktock force-pushed the remote branch 2 times, most recently from 76fa5bf to 04a435f, June 17, 2020 02:22

Skip pulling layers which can be provided from the backing store

2d16660

Signed-off-by: Kohei Tokunaga <[email protected]>

ktock force-pushed the remote branch from 04a435f to 2d16660, June 17, 2020 02:46

ktock closed this Dec 26, 2020

Note from ktock on the Jun 9, 2020 comment (edited): the design description in that comment is stale. See #956 (comment) for the latest design of this patch.
