Oci manifest format of puzzlefs #55

Open
ariel-miculas opened this issue Dec 14, 2022 · 11 comments

@ariel-miculas
Collaborator

ariel-miculas commented Dec 14, 2022

The current puzzlefs manifest format is as follows:

$ target/debug/puzzlefs build ../test-puzzlefs/simple_rootfs /tmp/oci-simple first_try
$ cat /tmp/oci-simple/index.json | jq .
{
  "schemaVersion": -1,
  "manifests": [
    {
      "digest": "sha256:ddf711c6a55e0f90d6b85d487cc0f202a2189cf12ffb15851b27984dda74e414",
      "size": 55,
      "media_type": "application/vnd.puzzlefs.image.rootfs.v1",
      "annotations": {
        "org.opencontainers.image.ref.name": "first_try"
      }
    }
  ],
  "annotations": {}
}
$ file /tmp/oci-simple/blobs/sha256/ddf711c6a55e0f90d6b85d487cc0f202a2189cf12ffb15851b27984dda74e414
/tmp/oci-simple/blobs/sha256/ddf711c6a55e0f90d6b85d487cc0f202a2189cf12ffb15851b27984dda74e414: data
~/work/cisco/puzzlefs expose-add-rootfs-delta*
$ hexdump -C /tmp/oci-simple/blobs/sha256/ddf711c6a55e0f90d6b85d487cc0f202a2189cf12ffb15851b27984dda74e414
00000000  a1 69 6d 65 74 61 64 61  74 61 73 81 58 29 00 00  |.imetadatas.X)..|
00000010  00 00 00 00 00 00 01 00  ea 22 b7 85 11 76 3a 07  |........."...v:.|
00000020  97 70 7b e5 3a 52 f8 69  93 44 b5 02 7a bf 0f 6d  |.p{.:R.i.D..z..m|
00000030  f2 69 b8 0b 6b 44 26                              |.i..kD&|
00000037
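
The 55-byte blob is opaque to `file`, but its first bytes look like a CBOR map (later comments in this thread mention CBOR serialization). As a rough sketch under that assumption, the envelope can be decoded by hand in Python. The decoder below is illustrative only: it handles just the four CBOR item types that occur in this dump and makes no claim about the layout of the 41-byte payload.

```python
# Bytes copied from the hexdump above.
HEXDUMP = """
a1 69 6d 65 74 61 64 61 74 61 73 81 58 29 00 00
00 00 00 00 00 00 01 00 ea 22 b7 85 11 76 3a 07
97 70 7b e5 3a 52 f8 69 93 44 b5 02 7a bf 0f 6d
f2 69 b8 0b 6b 44 26
"""
blob = bytes.fromhex("".join(HEXDUMP.split()))


def decode(buf: bytes, pos: int = 0):
    """Decode one CBOR item at pos; return (value, next_pos)."""
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:          # length stored in the initial byte
        length = info
    elif info == 24:       # length stored in the following byte
        length = buf[pos]
        pos += 1
    else:
        raise ValueError("length encoding not handled by this sketch")
    if major == 2:         # byte string
        return buf[pos:pos + length], pos + length
    if major == 3:         # text string
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if major == 4:         # array
        items = []
        for _ in range(length):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if major == 5:         # map
        result = {}
        for _ in range(length):
            key, pos = decode(buf, pos)
            result[key], pos = decode(buf, pos)
        return result, pos
    raise ValueError("major type not handled by this sketch")


manifest, end = decode(blob)
entries = manifest["metadatas"]
print(list(manifest), len(entries), len(entries[0]))  # prints: ['metadatas'] 1 41
```

Read this way, the blob is a one-entry map whose `metadatas` key holds a list with a single 41-byte reference, which accounts for all 55 bytes reported in `index.json`.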

Whereas for oci v1, the manifest has the following format:

$ cat oci/index.json | jq .
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:38d1071460074ff45300379f7d88d1057071c4348ab9819fa59c6083d159eba1",
      "size": 589,
      "annotations": {
        "org.opencontainers.image.ref.name": "first"
      }
    }
  ]
}

$ file oci/blobs/sha256/38d1071460074ff45300379f7d88d1057071c4348ab9819fa59c6083d159eba1
oci/blobs/sha256/38d1071460074ff45300379f7d88d1057071c4348ab9819fa59c6083d159eba1: JSON text data
$ cat oci/blobs/sha256/38d1071460074ff45300379f7d88d1057071c4348ab9819fa59c6083d159eba1 | jq .
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:c45108df90c1fceb9ce8b0d9b8aa3f09f1e7e34d29ae44928ae26e259c0282ce",
    "size": 1222
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
      "size": 83518086
    }
  ],
  "annotations": {
    "io.stackeroci.stacker.git_version": "v0.30.1-9-g5e775ca",
    "io.stackeroci.stacker.stacker_yaml": "first:\n  from:\n    type: docker\n    url: docker://centos:latest\n"
  }
}

@hallyn
Contributor

hallyn commented Dec 16, 2022

The OCIv1 manifest format is specified at https://github.com/opencontainers/image-spec/blob/main/manifest.md . I think we should stick to something closer to that.

Perhaps:

$ cat oci/index.json | jq "."
{
  "schemaVersion": 3,
  "manifests": [
    {
      "digest": "sha256:6b7980a6390ed4614465ec87388856583313cf0125deab02be0256c23a3cb006",
      "size": 55,
      "media_type": "application/vnd.puzzlefs.image.manifest.v1",
      "annotations": {
        "org.opencontainers.image.ref.name": "firstimage"
      }
    }
  ],
  "annotations": {}
}
$ cat oci/blobs/sha256/6b7980a6390ed4614465ec87388856583313cf0125deab02be0256c23a3cb0 | jq "."
{
  "schemaVersion": 3,
  "mediaType": "application/vnd.puzzlefs.image.manifest.v1",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:c45108df90c1fceb9ce8b0d9b8aa3f09f1e7e34d29ae44928ae26e259c0282ce",
    "size": 1222
  },
  "config": {
    "mediaType": "application/vnd.puzzlefs.image.metadata.v1",
    "digest": "sha256:c45108df90c1fceb9ce8b0d9b8aa3f09f1e7e34d29ae44928ae26e259c0282ce",
    "size": 55
  },
  "files": [
    {
      "mediaType": "application/vnd.puzzlefs.image.filedata.v1",
      "digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
      "size": 403
    },
    {
      "mediaType": "application/vnd.puzzlefs.image.filedata.v1",
      "digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
      "size": 797
    },
    {
      "mediaType": "application/vnd.puzzlefs.image.filedata.v1",
      "digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
      "size": 679
    },
    {
      "mediaType": "application/vnd.puzzlefs.image.filedata.v1",
      "digest": "sha256:a38ced06f7cf1d2b235ffa81f165924cecddac544c0d915d13cffbe47ea29b56",
      "size": 561
    }
  ]
}

Explanation:

  1. Config is the runtime container config. We should still ship that.
  2. The application/vnd.puzzlefs.image.metadata.v1 points to what we
    currently are making our 'manifest'.
  3. The files[] array lists all the blobs so that a higher level tool/script
    can tell easily all the files that are needed out of this oci layout
    in order to copy the image. puzzlefs itself wouldn't need it since
    it can derive that from its own manifest, but that won't help puzzlefs
    if all the needed files/chunks aren't there :)
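
Point 3 can be sketched in Python: a copy tool only has to follow the JSON descriptors to find every blob it needs. This is an illustration, not an existing tool. The `files` key is the field name from the proposal above and is not part of any published spec; `blobs_to_copy` and `put_blob` are hypothetical helpers, and the demo layout uses made-up blob contents.

```python
import hashlib
import json
import tempfile
from pathlib import Path

REF_NAME = "org.opencontainers.image.ref.name"


def blob_path(layout: Path, digest: str) -> Path:
    """Map "sha256:<hex>" to <layout>/blobs/sha256/<hex>."""
    algorithm, hex_digest = digest.split(":", 1)
    return layout / "blobs" / algorithm / hex_digest


def blobs_to_copy(layout: Path, tag: str) -> list[Path]:
    """Return every blob path that a copy of the tagged image would need."""
    index = json.loads((layout / "index.json").read_text())
    for descriptor in index["manifests"]:
        if descriptor.get("annotations", {}).get(REF_NAME) == tag:
            break
    else:
        raise KeyError(f"no manifest tagged {tag!r}")
    manifest_path = blob_path(layout, descriptor["digest"])
    manifest = json.loads(manifest_path.read_text())
    # "layers" is the OCI v1 field; "files" is the proposed (hypothetical) one.
    referenced = [manifest["config"]]
    referenced += manifest.get("layers", []) + manifest.get("files", [])
    return [manifest_path] + [blob_path(layout, d["digest"]) for d in referenced]


def put_blob(layout: Path, data: bytes, media_type: str) -> dict:
    """Demo helper: store data under its sha256 digest, return a descriptor."""
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    path = blob_path(layout, digest)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return {"mediaType": media_type, "digest": digest, "size": len(data)}


# Demo on a throwaway layout.
FILEDATA = "application/vnd.puzzlefs.image.filedata.v1"
layout = Path(tempfile.mkdtemp())
demo_manifest = {
    "schemaVersion": 3,
    "mediaType": "application/vnd.puzzlefs.image.manifest.v1",
    "config": put_blob(layout, b"{}", "application/vnd.oci.image.config.v1+json"),
    "files": [
        put_blob(layout, b"chunk one", FILEDATA),
        put_blob(layout, b"chunk two", FILEDATA),
    ],
}
entry = put_blob(layout, json.dumps(demo_manifest).encode(), demo_manifest["mediaType"])
entry["annotations"] = {REF_NAME: "firstimage"}
(layout / "index.json").write_text(json.dumps({"schemaVersion": 3, "manifests": [entry]}))

needed = blobs_to_copy(layout, "firstimage")
print(len(needed), "blobs needed")
```

The tool never has to parse the PuzzleFS metadata itself, which is the property the `files[]` array is meant to provide.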

@ariel-miculas
Collaborator Author

It does seem a little weird to duplicate the information in both a JSON format and a custom Cap'n Proto format. What's more, the notion of a layer in OCIv1, which is contained in a single file, doesn't map well onto the PuzzleFS concept of having a metadata file and multiple data files for a single layer.
We could abuse the format and make each metadata/data file a single layer. That may work for getting the existing tools to copy the files, but it doesn't seem like a good design decision.

@ariel-miculas
Collaborator Author

@hallyn what do you think?

@hallyn
Contributor

hallyn commented Aug 3, 2023

Well, failing a good idea for an alternative, let's leave it as is for now and re-open if we come up with something.

@ariel-miculas
Collaborator Author

Now that I'm working on the stacker support for building PuzzleFS images, I think it's time to revisit this issue and the delta generation.
Should we stick to the original OCI image manifest specification? We would need new image media types, but I'm wondering whether being close to the OCI spec would make it easier for existing tools to work with PuzzleFS images. For mounting the PuzzleFS image in the kernel, we would still need to have the manifest and layers in capnp format; maybe the two could coexist.
We could add a new media type for the PuzzleFS layer which would point to the PuzzleFS metadata and then add support for parsing this new media type (e.g. making sure we add all the chunks pointed to by the metadata layer to the OCI data store, i.e. blobs/sha256). Not sure how well the existing tools would deal with this, since there would be no references to these chunks/blobs from the usual json content descriptors, all the references would be only stored in the PuzzleFS metadata file, which is in capnp format. This approach is also hinted at by Aleksa Sarai at the end of his blog post.
Or we could try this model proposed by Serge.
Another reason why we would want to stick close to the OCI format is to keep the OCI configuration format, which holds information such as architecture, OS, and environment variables. None of this changes when we generate a PuzzleFS image.
On the other hand, the OCI format is tightly coupled with the notion of layering, which we don't want to do with PuzzleFS. Deduplication is achieved by splitting the filesystem into chunks with a content-defined chunking (CDC) algorithm, and sufficiently similar images should end up sharing most of the chunks. Since PuzzleFS doesn't fit the OCI model, we might as well not care about being compatible with it. This would, however, complicate the addition of other features, such as support for running a PuzzleFS container. Besides, we would need to take care of generating all the relevant OCI (or OCI-inspired) metadata bits and pieces.
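
To illustrate why content-defined chunking lets similar images share chunks, here is a toy chunker in Python. It is not the chunker PuzzleFS uses: the CRC-32 window hash, the 16-byte window, and the roughly 64-byte average chunk size are assumptions picked to keep the sketch short, and the two "images" are random bytes.

```python
import hashlib
import random
import zlib

WINDOW = 16   # bytes of trailing context examined at each position
MASK = 0x3F   # cut when the low 6 bits of the hash are zero (about every 64 bytes)


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data at positions chosen from the content, not from fixed offsets."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        # The cut decision depends only on the WINDOW bytes before `end`,
        # so identical content yields identical cut points wherever it sits.
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def digests(chunks: list[bytes]) -> list[str]:
    """Content-address each chunk, as a blob store would."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


rng = random.Random(0)
image_a = bytes(rng.getrandbits(8) for _ in range(16384))
image_b = b"extra header bytes" + image_a   # same content, shifted by 18 bytes

chunks_a, chunks_b = cdc_chunks(image_a), cdc_chunks(image_b)
shared = set(digests(chunks_a)) & set(digests(chunks_b))
print(f"{len(shared)} of {len(chunks_a)} chunks of image_a also appear in image_b")
```

With fixed-size blocks, the 18-byte shift would change every block. Here only the chunk touching the inserted bytes differs, so every chunk of `image_a` after its first one is stored once and referenced by both images.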

@tych0, @hallyn do you have any thoughts on this?

@tych0
Contributor

tych0 commented Aug 12, 2024

Hey, sorry for the delay.

> Not sure how well the existing tools would deal with this, since there would be no references to these chunks/blobs from the usual json content descriptors, all the references would be only stored in the PuzzleFS metadata file, which is in capnp format.

I ran into this problem a bunch with tools when I did stacker's squashfs support, and filed stuff like opencontainers/image-spec#816 in support of it. I got it all plumbed through, and hopefully did it in a way that future-proofed it for puzzlefs, so I think a new mime type is a good path forward, especially since stuff like storage and hosting (i.e. the distribution spec) makes it so that you don't have to build tooling for those parts.

> Or we could try this #55 (comment).

I think it's reasonable in a vacuum, but you would have to teach other tools (skopeo, dist spec) about this new format, which is kind of annoying.

> On the other hand, the OCI format is tightly coupled with the notion of layering

There are two explicit mentions of layering, descriptors and history.

I think that for History, we'll still have this concept: users will build puzzlefs images by individual mutations to them (apt-get install python3, curl https://sh.rustup.rs | sh, cargo build myapp, etc.), which are still "layers". It's just that the underlying fs representation won't be 1:1 with that any more, because it's more efficient. But this idea of "here's the step that generated this delta" is still reasonable, IMO.

So what's left is Descriptors, which, while called Layers in the manifest, could be "just" a list of BlobRefs. Admittedly they're not layers, but the delta is so small, and the amount of work to generate the rest of the tooling is so great, that I would lean towards just re-using the OCI spec here. Maybe we can send some clarifying PRs that "not all OCI images need be layer based" or something?

Thank you for continuing to push on this, it's awesome!

@ariel-miculas
Collaborator Author

Thanks for your input, @tych0
So what you're saying is we should abuse the OCI image manifest specification so that the existing tools will copy the BlobRefs that we need for PuzzleFS.
It would look something like this:

  "layers": [
    {
      "mediaType": "application/vnd.puzzlefs.image.rootfs.v1",
      "digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
      "size": 83518086
    },
    {
      "mediaType": "application/vnd.puzzlefs.image.inodes.v1",
      "digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
      "size": 83518086
    },
    {
      "mediaType": "application/vnd.puzzlefs.image.filedata.v1",
      "digest": "sha256:a1d0c75327776413fa0db9ed3adcdbadedc95a662eb1d360dad82bb913f8a1d1",
      "size": 83518086
    }
  ],

where

  • application/vnd.puzzlefs.image.rootfs.v1 will point to a PuzzleFS manifest
  • application/vnd.puzzlefs.image.inodes.v1 will point to a PuzzleFS metadata layer
  • application/vnd.puzzlefs.image.filedata.v1 will point to the individual data blobs that comprise the filesystem

When mounting the image, PuzzleFS will parse the list of layers, extract the application/vnd.puzzlefs.image.rootfs.v1 manifest, and then use the information provided there to mount the image. Optionally it could compare the list of BlobRefs from the OCI Image manifest to the list of BlobRefs from the PuzzleFS manifest and metadata layers.
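
The mount-time steps described above can be sketched in Python. This is a hedged illustration, not PuzzleFS code: `split_layers` and `compare_blobrefs` are hypothetical helpers, the digests are made up, and the set of BlobRefs that would really come from parsing the capnp metadata is hard-coded.

```python
ROOTFS_TYPE = "application/vnd.puzzlefs.image.rootfs.v1"
FILEDATA_TYPE = "application/vnd.puzzlefs.image.filedata.v1"


def split_layers(manifest: dict) -> tuple[dict, set[str]]:
    """Return the single rootfs descriptor and the digests of all other layers."""
    rootfs = [d for d in manifest["layers"] if d["mediaType"] == ROOTFS_TYPE]
    if len(rootfs) != 1:
        raise ValueError(f"expected exactly one rootfs layer, found {len(rootfs)}")
    others = {d["digest"] for d in manifest["layers"] if d["mediaType"] != ROOTFS_TYPE}
    return rootfs[0], others


def compare_blobrefs(transported: set[str], referenced: set[str]) -> tuple[set[str], set[str]]:
    """Optional cross-check between the OCI manifest and the PuzzleFS metadata.

    Returns (digests PuzzleFS needs but the OCI manifest does not list,
             digests the OCI manifest lists but PuzzleFS never references).
    """
    return referenced - transported, transported - referenced


# Demo with made-up digests and sizes.
chunk_a = "sha256:" + "22" * 32
chunk_b = "sha256:" + "33" * 32
oci_manifest = {
    "layers": [
        {"mediaType": ROOTFS_TYPE, "digest": "sha256:" + "11" * 32, "size": 55},
        {"mediaType": FILEDATA_TYPE, "digest": chunk_a, "size": 403},
        {"mediaType": FILEDATA_TYPE, "digest": chunk_b, "size": 797},
    ]
}
rootfs, transported = split_layers(oci_manifest)
# A real implementation would read these from the PuzzleFS metadata blob.
puzzlefs_refs = {chunk_a, chunk_b}
missing, unused = compare_blobrefs(transported, puzzlefs_refs)
print("rootfs:", rootfs["digest"], "missing:", len(missing), "unused:", len(unused))
```

A non-empty `missing` set would mean the image cannot be mounted from this layout, which is the case a copy tool that only follows JSON descriptors must never produce.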

The main advantages would be compatibility with existing tools and decoupling the PuzzleFS Merkle tree structure from the OCI Image Manifest.
The disadvantage is that we are duplicating the information in two places and formats: once in the OCI Image manifest, and once in the PuzzleFS manifest and PuzzleFS metadata layers.

Did I get this right?
@mikemccracken @raharper @rchincha any thoughts on this?

@tych0
Contributor

tych0 commented Aug 14, 2024

> Did I get this right?

Heh, I don't think I quite got it right, I had forgotten that you needed mime types for the layers. It seems like a bit of a hack, but yes, that's what I had in mind.

(Is there a reason inodes is not part of rootfs?)

@ariel-miculas
Collaborator Author

I think this was the original design even when we had cbor serialization. And we do have layers in PuzzleFS right now, and that's another thing to consider when designing the OCI format of PuzzleFS.

We could include the entire PuzzleFS metadata in one single capnp file, that way we'll only have application/vnd.puzzlefs.image.rootfs.v1 and application/vnd.puzzlefs.image.filedata.v1.

@tych0
Contributor

tych0 commented Aug 16, 2024

> I think this was the original design even when we had cbor serialization.

Definitely a mistake then :).

> And we do have layers in PuzzleFS right now, and that's another thing to consider when designing the OCI format of PuzzleFS.

Yeah, it's a good point. It's almost as if OCI's "layers" is just transport for bits, and we want to allow images to have more than just the OCI's version of Metadata, Config, and Layers.

I suppose another option is that we could add pointers as Annotations on metadata, but then tools will not automatically transport them. IMO the way you have it above is probably the best because we can use existing tooling, even if it is slightly confusing.

> We could include the entire PuzzleFS metadata in one single capnp file, that way we'll only have application/vnd.puzzlefs.image.rootfs.v1 and application/vnd.puzzlefs.image.filedata.v1.

that sounds reasonable to me.

ariel-miculas added a commit to ariel-miculas/puzzlefs that referenced this issue Sep 6, 2024
This simplifies the PuzzleFS layout by storing all the metadata
information into a single metadata file. The previous layout had one
manifest file which contained references to a list of metadata files,
each stored separately.

Relevant discussions: project-machine#55

Signed-off-by: Ariel Miculas-Trif <[email protected]>
@ariel-miculas ariel-miculas self-assigned this Sep 12, 2024
ariel-miculas added a commit to ariel-miculas/puzzlefs that referenced this issue Sep 15, 2024
Previously, the OCI Image Index contained a list of manifests which were
referencing the PuzzleFS rootfs image, i.e. the metadata of the PuzzleFS
image in Capnproto format. Now the Image Index [1] references an Image
Manifest [2] and the PuzzleFS image (the PuzzleFS rootfs image together
with the file chunks) is embedded into the layers field of the Image
Manifest.

Where PuzzleFS diverges from the Image Manifest spec is in the layers
definition: our layers are not self-contained images and thus they do
not stack. Instead, we have a rootfs layer which stores the PuzzleFS
image rootfs and multiple file chunks which contain the actual data of
the filesystem. No extraction step is performed. Instead, when mounting
a PuzzleFS image, the filesystem is reconstructed from the PuzzleFS
metadata and the file chunks, not unlike how squashfs/erofs archives are
mounted directly.
See the "Inspecting a puzzlefs image" section from the README for more
details about the format.

The image config is an empty descriptor [3] for now, but we don't store
it in blobs/sha256, which causes `skopeo copy` to fail because it
doesn't find the blob referenced by the empty descriptor in the data
store. This will be addressed in a subsequent commit.

See project-machine#55 for more context.

[1] https://github.com/opencontainers/image-spec/blob/main/image-index.md
[2] https://github.com/opencontainers/image-spec/blob/main/manifest.md
[3] https://github.com/opencontainers/image-spec/blob/main/manifest.md#guidance-for-an-empty-descriptor

Signed-off-by: Ariel Miculas-Trif <[email protected]>
@ariel-miculas
Collaborator Author

We should add a skopeo copy integration test and then we can close this issue.
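
Independently of a full skopeo run, the failure mode described in the second commit message above (a descriptor whose blob is absent from blobs/sha256) can be checked directly. The Python sketch below is an assumption-laden illustration, not the planned integration test: `missing_blobs` and the other helpers are hypothetical, and the rootfs and chunk contents are placeholders. The `application/vnd.oci.empty.v1+json` media type and the `{}` payload follow the empty-descriptor guidance in the OCI image manifest spec.

```python
import hashlib
import tempfile
from pathlib import Path

EMPTY_JSON = b"{}"  # payload of the OCI empty descriptor


def describe(data: bytes, media_type: str) -> dict:
    """Build a content descriptor for data without storing it."""
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    return {"mediaType": media_type, "digest": digest, "size": len(data)}


def blob_path(layout: Path, digest: str) -> Path:
    algorithm, hex_digest = digest.split(":", 1)
    return layout / "blobs" / algorithm / hex_digest


def write_blob(layout: Path, data: bytes) -> None:
    """Store data in the layout under its sha256 digest."""
    path = blob_path(layout, "sha256:" + hashlib.sha256(data).hexdigest())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)


def missing_blobs(layout: Path, manifest: dict) -> list[str]:
    """List digests referenced by the manifest that are not in the blob store."""
    descriptors = [manifest["config"], *manifest["layers"]]
    return [d["digest"] for d in descriptors
            if not blob_path(layout, d["digest"]).is_file()]


# Demo: rootfs and chunk are stored, the empty config initially is not.
layout = Path(tempfile.mkdtemp())
rootfs_data, chunk_data = b"placeholder rootfs", b"placeholder chunk"
write_blob(layout, rootfs_data)
write_blob(layout, chunk_data)
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": describe(EMPTY_JSON, "application/vnd.oci.empty.v1+json"),
    "layers": [
        describe(rootfs_data, "application/vnd.puzzlefs.image.rootfs.v1"),
        describe(chunk_data, "application/vnd.puzzlefs.image.filedata.v1"),
    ],
}

before = missing_blobs(layout, manifest)   # the empty config blob is absent
write_blob(layout, EMPTY_JSON)             # store the two-byte "{}" blob
after = missing_blobs(layout, manifest)
print("missing before:", len(before), "missing after:", len(after))
```

A check like this could run ahead of the skopeo copy test to separate layout errors from tool behavior.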
