Support additional layer store (patch for containers/storage) #795
drivers/overlay/overlay.go
Outdated
metadata := make(map[string]string)
found := false
if additionalLayer := d.additionalLayer(id); additionalLayer != "" {
	metadata = make(map[string]string)
This line is not needed, since it happens two lines above.
drivers/overlay/overlay.go
Outdated
}
rootUID = int(st.UID())
rootGID = int(st.GID())
} // TODO: rootUID & rootGID of additional layer?
You are ignoring the error. Should you at least log it?
drivers/overlay/overlay.go
Outdated
for _, p := range d.AdditionalLayerStores() {
	dirPath = path.Join(p, id)
	if _, err := os.Stat(dirPath); err == nil {
		return
I think you should do a return dirPath here, making the code easier to read.
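A minimal sketch of the explicit-return shape suggested in this comment. Only the loop over the stores and the `path.Join(p, id)` call come from the reviewed snippet; the function name `findAdditionalLayer`, the injected `exists` callback, and the example store paths are hypothetical and exist only so the lookup can be shown in isolation.

```go
package main

import (
	"fmt"
	"os"
	"path"
)

// findAdditionalLayer returns the directory of layer id in the first store
// that contains it, or "" when no store has it. Returning the value
// explicitly (rather than a bare return of a named result) makes the
// found and not-found cases visible at a glance.
// exists is a hypothetical hook so the lookup can run without a real
// filesystem.
func findAdditionalLayer(stores []string, id string, exists func(string) bool) string {
	for _, p := range stores {
		dirPath := path.Join(p, id)
		if exists(dirPath) {
			return dirPath
		}
	}
	return ""
}

// statExists reports whether name can be stat'ed, mirroring the os.Stat
// check in the reviewed code.
func statExists(name string) bool {
	_, err := os.Stat(name)
	return err == nil
}

func main() {
	stores := []string{"/var/lib/als-a", "/var/lib/als-b"} // example paths only
	fmt.Printf("%q\n", findAdditionalLayer(stores, "layer-id", statExists))
}
```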
273afd2
to
f8d8659
Compare
Just a drive-by comment; I don’t yet understand how this works at all.
f8d8659
to
dfb7332
Compare
Request for comments (in the above PR description)
|
dfb7332
to
01176fb
Compare
Can we move this forward? |
store.go
Outdated
	return nil, "", ErrLayerUnknown
}
name := base64.StdEncoding.EncodeToString([]byte(k + "=" + v))
if name == "diff" || name == "info" {
name is the base64 encoding of the concatenation of a string + "=" + another string.
How could that ever be equal to "diff" or "info"?
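The reviewer's point can be checked mechanically. The sketch below is an illustration only: `canBeEncodedPair` is a hypothetical helper, not part of the PR. It decodes a candidate name with `base64.StdEncoding` and reports whether the decoded bytes contain `=`, which any encoded `key=value` string must.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
)

// canBeEncodedPair reports whether name could be the standard base64
// encoding of some "key=value" string: it must decode cleanly and the
// decoded bytes must contain '='.
func canBeEncodedPair(name string) bool {
	decoded, err := base64.StdEncoding.DecodeString(name)
	if err != nil {
		return false
	}
	return bytes.Contains(decoded, []byte("="))
}

func main() {
	// A real pair, encoded the same way as in the reviewed code.
	fmt.Println(base64.StdEncoding.EncodeToString([]byte("a=b")))
	for _, name := range []string{"diff", "info"} {
		fmt.Printf("%s: %v\n", name, canBeEncodedPair(name))
	}
}
```

Both reserved names decode to byte sequences without `=`, so the comparison in the reviewed line can never be true.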
cb3527d
to
eed23f2
Compare
@giuseppe Fixed the path design based on #795 (comment). The default layer search path on ALS:
When
Now we can pull and run the image with reusing layer from ALS. The example implementation of the filesystem for ALS is https://github.com/ktock/stargz-snapshotter/tree/als-pool-example/cmd/registry-storage. The remaining limitation is that ALS layers currently cannot be exported (i.e. saved or pushed) because:
The current workaround for this limitation:
I think we can work on this issue on the following PRs but please tell me if we should work on them in this PR as well. |
Can we move this forward? |
I usually experiment with new features touching containers/image and containers/storage in skopeo.
Do you have any example on how the new API is going to be used?
EDIT: we don't need to support any existing file system, it is enough to create the remote directory manually, as long as we have a clear idea of how it can be used
eed23f2
to
83ce465
Compare
I list examples of the current patch with c/storage-related commands.
Manually preparing additional layer stores
We can manually create an additional layer store using something like the following script:
#!/bin/bash
set -euo pipefail
ORG="${1}"
STORE="${2}"
if [ "${1}" == "--ref" ] ; then
ORG="${2}"
STORE="${3}/$(echo -n ${2} | base64)"
fi
OCI="$(mktemp -d)"
skopeo copy docker://${ORG} oci://${OCI}
cat ${OCI}/blobs/sha256/$(cat ${OCI}/index.json | jq -r '.manifests[0].digest' | sed 's/sha256://') \
| jq -r '.layers[].digest' | sed 's/sha256://' | while read DGST ; do
mkdir -p ${STORE}/${DGST}/diff && \
tar -xf ${OCI}/blobs/sha256/${DGST} -C ${STORE}/${DGST}/diff && \
cat <<EOF > ${STORE}/${DGST}/info
{
"compressed-diff-digest": "sha256:${DGST}",
"compressed-size": $(stat --printf="%s" ${OCI}/blobs/sha256/${DGST}),
"diff-digest": "sha256:$(cat ${OCI}/blobs/sha256/${DGST} | gunzip | sha256sum | sed -E 's/([^ ]*).*/\1/g')",
"diff-size": $(cat ${OCI}/blobs/sha256/${DGST} | gunzip | wc -c),
"compression": 2
}
EOF
done
Here we prepare two stores.
config for default layer store:
config for the layer store with "ref" option:
Running c/storage, c/image -based tools with the additional layer stores
For both of the above configurations, we can use these stores with cri-o, podman, skopeo, etc.
CRI-O
Podman
NOTE: Export-related commands (e.g.
Skopeo
NOTE: Export-related commands (e.g.
Mounting and using PoC filesystem
https://github.com/ktock/stargz-snapshotter/tree/als-pool-example/cmd/registry-storage
The following mounts the current PoC filesystem (esgz-based) to
This mounted store can be used in the same ways as manually-created stores.
|
thanks! Do I need any patch for Podman? I've tried the suggestion above but "podman pull" still pulls the entire image from the registry |
No patch is needed to c/podman. It just needs to be compiled with the patched c/image (containers/image#1109) and the patched c/storage (#795), so go.mod and the Makefile need to be modified a bit. |
thanks, so it seems the :ref version doesn't work for me, while the version without :ref works
EDIT: also the version without ref seems to not work sometimes; if I repeat the same command multiple times ("podman system reset && podman pull ...") sometimes the ALS is ignored |
I'll play more with it and update with more details, it is probably a mistake on my side |
sorry for the noise, the issue was on my side. It works fine now. Do you think containers/storage should signal somehow that the layer is not used anymore? A What do you think? |
@giuseppe I agree that the notification can be done using rmdir. However, the layer directory may be shared (symlinked) by multiple storages. So isn't it hard to determine whether it's safe to clean it up based only on a "cleanup" notification? WDYT? |
If the magic backing filesystem can read blobs on-demand, can’t it just read all of it and provide that stream? The lack of |
Most importantly, how does digest verification happen?
(Reminder: I know very little about c/storage .)
layers.go
Outdated
compressedsums := make(map[digest.Digest][]string)
uncompressedsums := make(map[digest.Digest][]string)
names := make(map[string]*Layer)
layer = copyLayer(layerInfo)
layer.ID = id
layer.Parent = parent
layer.Created = time.Now().UTC()
if layer.CompressedDigest != "" {
	compressedsums[layer.CompressedDigest] = append(compressedsums[layer.CompressedDigest], layer.ID)
}
if layer.UncompressedDigest != "" {
	uncompressedsums[layer.UncompressedDigest] = append(uncompressedsums[layer.UncompressedDigest], layer.ID)
}
for _, name := range layer.Names {
	if conflict, ok := names[name]; ok {
		r.removeName(conflict, name)
	}
	names[name] = layer
}
// TODO: check if necessary fields are filled
r.layers = append(r.layers, layer)
r.idindex.Add(id)
r.byid[id] = layer
r.byname = names
r.bycompressedsum = compressedsums
r.byuncompressedsum = uncompressedsums
AFAICS this discards a lot of data about all the other layers. How does that work??!
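To illustrate the concern: the reviewed code builds fresh maps holding only the new layer and then assigns them over the store-wide indices, which drops every other layer's entries. A minimal sketch of the alternative, updating the existing index in place, is below. `layerIndex` and `addLayer` are hypothetical, simplified stand-ins and not the actual c/storage types.

```go
package main

import "fmt"

// layerIndex is a hypothetical, simplified stand-in for the per-digest
// index kept by the layer store (digest -> layer IDs).
type layerIndex struct {
	byCompressedSum map[string][]string
}

// addLayer records id under digest in the existing index. The map is
// allocated only when it is nil, so entries for other layers survive.
func (ix *layerIndex) addLayer(digest, id string) {
	if ix.byCompressedSum == nil {
		ix.byCompressedSum = make(map[string][]string)
	}
	if digest != "" {
		ix.byCompressedSum[digest] = append(ix.byCompressedSum[digest], id)
	}
}

func main() {
	ix := &layerIndex{byCompressedSum: map[string][]string{"sha256:aaa": {"layer-1"}}}
	ix.addLayer("sha256:bbb", "layer-2")
	// The pre-existing entry is still present next to the new one.
	fmt.Println(ix.byCompressedSum)
}
```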
Fixed this.
Looks better now (but needs a review by someone who knows c/storage, i.e. not me).
83ce465
to
781e49b
Compare
How is this going? |
Signed-off-by: Kohei Tokunaga <[email protected]>
d02abea
to
64f0181
Compare
LGTM |
LGTM
@ktock merged! |
Thanks! 🎉 |
Hey @ktock, this is super cool. I have been looking for something like this for a while now. Quick question: Does the demo from above still work? I have been trying to get podman to pull an image from the ALS, but it still pulls from the remote registry even when the ALS is properly configured.
|
@chvish Are you talking about #795 (comment) ? The layout has been updated since then. Could you try the following script instead:
#!/bin/bash
set -euo pipefail
ORG="${1}"
STORE="${2}"
if [ "${1}" == "--ref" ] ; then
ORG="${2}"
STORE="${3}/$(echo -n ${2} | base64)"
fi
OCI="$(mktemp -d)"
skopeo copy docker://${ORG} oci://${OCI}
cat ${OCI}/blobs/sha256/$(cat ${OCI}/index.json | jq -r '.manifests[0].digest' | sed 's/sha256://') \
| jq -r '.layers[].digest' | while read DGST ; do
ENCODED_DGST=$(echo -n ${DGST} | sed 's/sha256://')
mkdir -p ${STORE}/${DGST}/diff
tar -xf ${OCI}/blobs/sha256/${ENCODED_DGST} -C ${STORE}/${DGST}/diff
touch ${STORE}/${DGST}/use
cat <<EOF > ${STORE}/${DGST}/info
{
"compressed-diff-digest": "${DGST}",
"compressed-size": $(stat --printf="%s" ${OCI}/blobs/sha256/${ENCODED_DGST}),
"diff-digest": "sha256:$(cat ${OCI}/blobs/sha256/${ENCODED_DGST} | gunzip | sha256sum | sed -E 's/([^ ]*).*/\1/g')",
"diff-size": $(cat ${OCI}/blobs/sha256/${ENCODED_DGST} | gunzip | wc -c),
"compression": 2
}
EOF
done |
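The fields the script writes to the info file can also be derived programmatically. The following is a minimal Go sketch under the same assumption the script makes (it pipes blobs through gunzip, so layers are taken to be gzip-compressed). The JSON keys and the `"compression": 2` value are copied from the script; `layerInfo`, `buildInfo`, and `gzipBytes` are illustrative names, not c/storage APIs.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"io"
)

// layerInfo mirrors the keys written to the info file by the script.
type layerInfo struct {
	CompressedDiffDigest string `json:"compressed-diff-digest"`
	CompressedSize       int64  `json:"compressed-size"`
	DiffDigest           string `json:"diff-digest"`
	DiffSize             int64  `json:"diff-size"`
	Compression          int    `json:"compression"`
}

// buildInfo derives the info fields from a gzip-compressed layer blob.
func buildInfo(blob []byte) (layerInfo, error) {
	zr, err := gzip.NewReader(bytes.NewReader(blob))
	if err != nil {
		return layerInfo{}, err
	}
	defer zr.Close()
	h := sha256.New()
	n, err := io.Copy(h, zr) // hash and count the uncompressed stream
	if err != nil {
		return layerInfo{}, err
	}
	return layerInfo{
		CompressedDiffDigest: fmt.Sprintf("sha256:%x", sha256.Sum256(blob)),
		CompressedSize:       int64(len(blob)),
		DiffDigest:           fmt.Sprintf("sha256:%x", h.Sum(nil)),
		DiffSize:             n,
		Compression:          2, // value taken verbatim from the script
	}, nil
}

// gzipBytes is a demo helper that gzip-compresses data in memory.
func gzipBytes(data []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(data)
	zw.Close()
	return buf.Bytes()
}

func main() {
	info, err := buildInfo(gzipBytes([]byte("hello")))
	if err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(info, "", "  ")
	fmt.Println(string(out))
}
```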
@ktock Works like a charm thanks. |
Podman 3.2, which is starting its release cycle now, should have this. |
@chvish Thank you for trying. CRI-O will also support this after cri-o/cri-o#4850 is merged, so please try it too if you're interested. The necessary configuration should be the same as for Podman. @rhatdan Thank you for starting the new release cycle! |
I think what we can do is to change the
diff --git a/storage/storage_dest.go b/storage/storage_dest.go
index b15c9c3b..7a56fb55 100644
--- a/storage/storage_dest.go
+++ b/storage/storage_dest.go
@@ -367,7 +367,7 @@ func (s *storageImageDestination) tryReusingBlobAsPending(blobDigest digest.Dige
if options.SrcRef != nil {
// Check if we have the layer in the underlying additional layer store.
- aLayer, err := s.imageRef.transport.store.LookupAdditionalLayer(blobDigest, options.SrcRef.String())
+ aLayer, err := s.imageRef.transport.store.LookupAdditionalLayer(blobDigest, options.SrcRef.String(), options.TOCDigest)
if err != nil && !errors.Is(err, storage.ErrLayerUnknown) {
return false, private.ReusedBlob{}, fmt.Errorf(`looking for compressed layers with digest %q and labels: %w`, blobDigest, err)
} else if err == nil {
and then propagate it down, until we can do something like, e.g.:
$ git diff
diff --git a/drivers/overlay/overlay.go b/drivers/overlay/overlay.go
index b2f0e7a94..95a67bc15 100644
--- a/drivers/overlay/overlay.go
+++ b/drivers/overlay/overlay.go
@@ -2475,6 +2475,9 @@ func (al *additionalLayer) CreateAs(id, parent string) error {
if err := os.WriteFile(path.Join(dir, "additionallayer"), []byte(al.path), 0o644); err != nil {
return err
}
+ if err := os.WriteFile(path.Join(dir, "toc-digest"), []byte(al.tocDigest), 0o644); err != nil {
+ return err
+ }
notifyUseAdditionalLayer(al.path)
return os.Symlink(filepath.Join(al.path, "diff"), diffDir)
}
what do you think? |
AFAICS nothing ever documented that this is related to TOC digests. So, from the c/image side, I’d prefer passing down the full raw value of
And I wonder about compatibility with ALS backend which don’t support this But I’ll leave both design aspects to c/storage experts. |
@giuseppe SGTM, thanks for your suggestion! In addition to the suggested change, the Additional Layer Store implementation needs to be fixed to expose the "actual" TOC digest (acquired from the layer data, etc.) via If either of the above designs is OK with you, I'm willing to implement these changes. |
when would c/storage check for that though? Before using the layer? I think that the FUSE file system must fail at runtime with EIO if the provided TOC is different than what could be validated. |
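As an illustration of the fail-at-runtime behaviour proposed here, a filesystem could compare the TOC digest it was asked for with the one it validated and return `EIO` on mismatch. This is only a sketch of the idea: `checkTOC` is a hypothetical helper, and how a real FUSE implementation obtains the validated digest is not specified in this thread.

```go
package main

import (
	"fmt"
	"syscall"
)

// checkTOC is a hypothetical guard a FUSE-backed store could run before
// serving layer contents: if the TOC digest requested by the caller differs
// from the one the filesystem validated, the operation fails with EIO.
func checkTOC(requested, validated string) error {
	if requested != validated {
		return syscall.EIO
	}
	return nil
}

func main() {
	fmt.Println(checkTOC("sha256:aaa", "sha256:aaa"))
	fmt.Println(checkTOC("sha256:aaa", "sha256:bbb"))
}
```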
How about the following design: Introduce a new FUSE file path
Other changes to |
Thinking about this more… from the c/image + c/storage side, every layer only has one identifier; it’s either the compressed blob digest, or the layer digest. And that identifier is used for deduplication, and it is security-relevant. So, this needs to be a rather more invasive change. Historically, ALS was always identifying / deduplicating the layer using the (compressed) blob digest. If the ALS is actually constructing trust based on the TOC digest, and the compressed blob digest is not relevant / not verified, then the layer must not be identified / deduplicated based on the compressed digest (because an attacker could cause a TOC-based pull with a non-matching compressed digest X, causing future ordinary “victim” pulls of digest X to be deduplicated with the attacker’s layer). So, I think we need the
|
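The deduplication argument above can be made concrete with a small sketch. It assumes, hypothetically, that each layer is recorded under exactly one key and that TOC-verified layers get their own key space; `layerRef` and `dedupKey` are illustrative names and not part of c/image or c/storage.

```go
package main

import "fmt"

// layerRef is a hypothetical description of how a layer was requested.
type layerRef struct {
	CompressedDigest string // digest of the compressed blob
	TOCDigest        string // set only when trust is established via the TOC
}

// dedupKey returns the single identifier under which the layer is recorded
// and reused. A TOC-verified layer is keyed by its TOC digest, so it can
// never satisfy a later lookup made by compressed digest alone.
func dedupKey(l layerRef) string {
	if l.TOCDigest != "" {
		return "toc:" + l.TOCDigest
	}
	return "blob:" + l.CompressedDigest
}

func main() {
	attacker := layerRef{CompressedDigest: "sha256:X", TOCDigest: "sha256:T"}
	victim := layerRef{CompressedDigest: "sha256:X"}
	// Same claimed compressed digest, but the two pulls map to different keys.
	fmt.Println(dedupKey(attacker))
	fmt.Println(dedupKey(victim))
}
```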
containers/podman#4739
Reconsidered the design on 2021/1/12
This enables podman to create containers using layers stored in a specified directory instead of pulling them from the registry. Leveraging this feature with remotely-mountable layers provided by stargz/zstd:chunked or CVMFS, podman can achieve lazy pulling.
Changes in containers/storage: #795
That directory is named "additional layer store" (called ALS in this doc) and has the following structure.
diff directory contains the extracted layer diff contents specified by the key-value pairs.
info file contains a *c/storage.Layer struct that describes this layer's contents (*c/storage.Layer.ID, *c/storage.Layer.Parent and *c/storage.Layer.Created can be empty as they'll be filled in by c/storage.Store).
On each pull, c/storage.Store searches the ALS for the layer diff contents using pre-configured key-value pairs.
Each key-value pair is base64 encoded.
By default, the following key=value pairs can be used as elements of the path in ALS,
reference=<image reference>
layerdigest=<digest of the compressed layer contents>
Additionally, layer annotations (defined in the image manifest) prefixed by containers/image/target. can be used as well.
The prefix containers/image/target. will be trimmed from the key when it's used in the path on ALS.
Overlay driver supports an option to specify which key-value pairs are used, and in which order, when c/storage.Store searches layers in the ALS.
In the above case, on each pull, c/storage.Store searches the following path in ALS,
The underlying filesystem (e.g. stargz/zstd:chunked-based filesystem or CVMFS) should show the exploded view of the target layer diff and its information at these locations.
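The path construction described in this design can be sketched as follows. This is an assumption-laden illustration, not the PR's code: `pair`, `pathElement`, and `layerPaths` are hypothetical names, and the use of `base64.StdEncoding` follows the snippet reviewed earlier in this thread. Note that standard base64 output can contain `/`, which this sketch does not handle.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"path"
	"strings"
)

// targetPrefix is the annotation prefix that is trimmed from keys.
const targetPrefix = "containers/image/target."

// pair is one key=value element of the lookup path.
type pair struct{ Key, Value string }

// pathElement trims the annotation prefix from the key and base64-encodes
// "key=value" to form one path element.
func pathElement(p pair) string {
	key := strings.TrimPrefix(p.Key, targetPrefix)
	return base64.StdEncoding.EncodeToString([]byte(key + "=" + p.Value))
}

// layerPaths joins the encoded pairs, in order, under the ALS root and
// returns the expected locations of the diff directory and the info file.
func layerPaths(root string, pairs []pair) (diff, info string) {
	elems := []string{root}
	for _, p := range pairs {
		elems = append(elems, pathElement(p))
	}
	dir := path.Join(elems...)
	return path.Join(dir, "diff"), path.Join(dir, "info")
}

func main() {
	diff, info := layerPaths("/tmp/storage", []pair{
		{"reference", "ghcr.io/stargz-containers/rethinkdb:2.3.6-esgz"},
		{"layerdigest", "sha256:0000"}, // placeholder digest
	})
	fmt.Println(diff)
	fmt.Println(info)
}
```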
Example filesystem implementation (currently stargz-based) is https://github.com/ktock/stargz-snapshotter/tree/als-pool-example (this must be mounted on )
If the layer content is found in ALS, c/storage.Store creates a layer using <ALS root>/.../info as *c/storage.Layer and using <ALS root>/.../diff as its diff directory.
So c/image's copier doesn't need to pull this layer from the registry.
Changes in containers/image: containers/image#1109
Now c/image's copier leverages this store.
Every time it pulls an image, it first tries to reuse blobs from ALS.
That copier passes each layer's OCI annotations (key-value pairs) + the following key-value to c/storage.Store.
*c/image.storageImageDestination.TryReusingBlob() cannot pass the image reference to c/storage.Store, so this commit adds a new API, *c/image.storageImageDestination.TryReusingBlobWithRef(), to achieve this.
When this copier successfully acquires that layer, it reuses the layer without pulling.
Changes in containers/podman: none
Command example
In the above cases, c/storage.Store looks up /tmp/storage/base64("reference=ghcr.io/stargz-containers/rethinkdb:2.3.6-esgz")/base64(<layer digests>)/{diff, info} in ALS.
The example filesystem implementation (https://github.com/ktock/stargz-snapshotter/tree/als-pool-example), mounted at /tmp/storage, shows the extracted layer at that location.
Then rethinkdb:2.3.6-esgz can run without pulling it from the registry.
Known limitation and request for comments
Some operations (e.g. save) require the correct value to be set in c/storage.Layer.UncompressedSize. This field seems to be the size of the layer without compression (i.e. the size of the tar-archived format of that layer). For a registry-backed ALS, getting this information is difficult because neither the OCI/Docker image format nor the registry API provides a way to get the uncompressed size of layers. We cannot get this information without actually pulling and decompressing the layer, which defeats the lazy pulling that this PR aims for.
I'll check the codebase more deeply to come up with a way to get this information from somewhere, or a way to safely allow c/storage.Layer.UncompressedSize to be unknown during operations. But if someone has a good idea for solving this, please let me know.
cc: @siscia @giuseppe