
split into workspace/multiple crates #32

cgwalters opened this issue Nov 11, 2024 · 16 comments
@cgwalters (Collaborator)

In the original composefs crate I tried to keep it to literally just Rust bindings to the core functionality; there was no concept of a global repository, garbage collection etc.

Say a project like rauc.io (or what ostree/bootc do today, or maybe rpm/dpkg) wants to do things with composefs. I think it makes sense to keep that "core" split, because the APIs and functionality there should be 100% stable. The rest of the things here would be in e.g. a composefs-storage crate or so?

@allisonkarlitskaya (Collaborator)

I think what's missing is to define what the "things" in question are, and what the value-add would be over just shelling out to mkcomposefs, composefs-info, or mount.composefs yourself. I kinda tend to think that it's better when a well-defined CLI is the API, without the need to add a wrapping layer that may or may not be what the person consuming it wants to use.

Aside from those wrappers (which I'd really prefer not to expose as APIs), I'm not sure what the "core" of this repository is, if not the "storage" part.

This is another reason that I'm struggling with the parser side of the dumpfile code from the original crate. I'm not sure why it would make sense to have that unless we're looking to implement our own version of mkcomposefs (which is not planned for the time being). If you have a dumpfile and you want to create a composefs out of it, then you don't need us to help you. That's literally mkcomposefs.
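The "well-defined CLI as the API" approach mentioned above can be sketched in a few lines with `std::process::Command`. This is an illustrative sketch, not code from this repository; it assumes `mkcomposefs`'s documented `--from-file` mode for building an image from a dumpfile.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but don't run) an `mkcomposefs` invocation that turns a dumpfile
/// into a composefs image. This is the whole "wrapper": anything more
/// opinionated is exactly what the comment argues against shipping as API.
fn mkcomposefs_command(dumpfile: &Path, image: &Path) -> Command {
    let mut cmd = Command::new("mkcomposefs");
    cmd.arg("--from-file").arg(dumpfile).arg(image);
    cmd
}

fn main() {
    let cmd = mkcomposefs_command(Path::new("layer.dump"), Path::new("layer.cfs"));
    // In real use you'd call `cmd.status()` and check the exit code; here we
    // only inspect the constructed invocation so the sketch runs anywhere.
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    assert_eq!(cmd.get_program().to_str(), Some("mkcomposefs"));
    assert_eq!(args, ["--from-file", "layer.dump", "layer.cfs"]);
    println!("ok");
}
```

The caller, not the wrapper, decides error handling, logging, and sandboxing, which is the point being made.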

@cgwalters (Collaborator, Author) commented Nov 12, 2024

> This is another reason that I'm struggling with the parser side of the dumpfile code from the original crate. I'm not sure why it would make sense to have that unless we're looking to implement our own version of mkcomposefs

Ah, I think that's rooted in the other disagreement. The primary use case here is parsing the output of composefs-info dump (i.e. vs splitstream as source of truth). The secondary use case is accepting untrusted dump files for a system service to run mkcomposefs + mount.composefs so we can do composefs unprivileged.

@allisonkarlitskaya (Collaborator)

I don't consider splitstream to be a source of truth in any way. It's strictly an implementation detail, and doesn't ever appear "on the wire", and definitely doesn't need to be specified in any way. The source of truth is the layer tarball. Splitstream is just a space-saving way to store it which also happens to have very nice performance characteristics for building images.

@cgwalters (Collaborator, Author)

Yes, we can debate the splitstream one in the other issue. Did the two rationales for having a Rust parser for the dump file format make sense or no?

@allisonkarlitskaya (Collaborator)

Not really. Would we parse the dumpfile just to print it out again when running mkcomposefs?

@cgwalters (Collaborator, Author) commented Nov 12, 2024

For the "mkcomposefs from untrusted input" indeed that's what I'd like to do because it adds a whole layer of defense before passing potentially malicious input to C code that until recently had never seen fuzzing.

But the other case of parsing composefs-info dump into an in-memory filesystem metadata tree, we need a Rust parser right?
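The "layer of defense" idea above amounts to: parse untrusted dumpfile input into typed Rust data, reject anything malformed, and re-serialize a normalized form before handing it to the C tooling. Here is a deliberately minimal sketch of that pattern; the two fields and their layout are simplified assumptions (the real dump format has many more fields and escapes whitespace in paths), so this illustrates the validation pattern, not the actual format.

```rust
/// A toy dumpfile entry: just a path and a size. The real format carries
/// mode, ownership, mtime, xattrs, etc. -- this struct is a placeholder.
#[derive(Debug, PartialEq)]
struct DumpEntry {
    path: String,
    size: u64,
}

/// Parse one untrusted line, rejecting anything suspicious up front.
fn parse_entry(line: &str) -> Result<DumpEntry, String> {
    let mut fields = line.split_whitespace();
    let path = fields.next().ok_or("missing path")?.to_string();
    // Refuse relative paths and traversal components before the C code
    // ever sees them.
    if !path.starts_with('/') || path.split('/').any(|c| c == "..") {
        return Err(format!("suspicious path: {path}"));
    }
    let size: u64 = fields
        .next()
        .ok_or("missing size")?
        .parse()
        .map_err(|e| format!("bad size: {e}"))?;
    Ok(DumpEntry { path, size })
}

/// Re-emit a normalized line: any quirks of the attacker-controlled input
/// (odd whitespace, etc.) are gone, and only validated data goes downstream.
fn unparse_entry(e: &DumpEntry) -> String {
    format!("{} {}", e.path, e.size)
}

fn main() {
    let entry = parse_entry("/usr/bin/true   12345").unwrap();
    assert_eq!(unparse_entry(&entry), "/usr/bin/true 12345");
    assert!(parse_entry("../etc/shadow 1").is_err());
    println!("ok");
}
```

The same parser then serves the second use case: building an in-memory metadata tree from `composefs-info dump` output.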

@jeckersb

(Just inserting myself here to note that I've written a bunch of rust parsers in the past using nom, if that's something you'd be interested in me knocking out for the dump file format let me know)

@cgwalters (Collaborator, Author)

In this case we're talking about the existing parser code in containers/composefs@7432e8c#diff-f8019245093e26a2f17e5b661cb9040f56327a3a29ac9fe79a3b4c5a5eb7050b that was dropped and that I think has use cases; we're not writing new code currently.

@allisonkarlitskaya (Collaborator)

I still don't believe that composefs-info dump makes sense at all here. I'd actually have an easier time accepting tarfile-turned-into-dumpfile itself as an intermediate format and save us the roundtrip through composefs. But even this, I'd still consider to be inferior to the current code which is capable of reproducing the original tar stream bit-for-bit. In fact, I'm not even sure what creating an image would look like in that case, since the first thing we do is to verify that the entire image matches the sha256 checksum we're expecting (which involves checking the sha256 of the layers, which we'd need to be able to reliably reproduce).

@cgwalters (Collaborator, Author)

OK let's try to spend the 30 mins tomorrow going through the splitstream bits.

But: AFAIK rauc.io today is using the composefs blobs as source-of-truth for GC, and it is parsing them via the libcomposefs C code. Remember, not every use case for composefs needs splitstream; they may not have the dual constraints/features imposed by OCI around checksums of tarballs and the desire to be able to re-push content. So for people who (reasonably) want to use composefs blobs as truth, a dump file parser is helpful. (Of course it'd be quite a bit more efficient to directly read the erofs, either by binding the C library or by a rewrite, but that's its own heavy lift...)

So I think it makes sense to have the dump file parser available.

Are you getting hung up on using it in the existing code or merely how it should be structured/placed?

@cgwalters (Collaborator, Author)

> and the desire to be able to re-push content.

(And if they do have that desire, for projects that chose not-tar they may already have a "normal form" that they expect to use anyways; e.g. ostree is very much designed to have such a normal form; though there's different complexities in trying to have a composefs blob as source-of-truth for ostree, but that's tangential)

@allisonkarlitskaya (Collaborator)

> But: AFAIK rauc.io today is using the composefs blobs as source-of-truth for GC and it is parsing them via the libcomposefs C code.

composefs-rs is doing that as well, except via parsing the (very simple) output of composefs-info objects.

@cgwalters (Collaborator, Author)

> composefs-rs is doing that as well, except via parsing the (very simple) output of composefs-info objects.

Yeah, that's a fair point. But then what I'd still say is I think we should have an API like `get_composefs_objects(f: AsFd) -> Result<HashSet<Sha256>>` or so, right? That's "core functionality" as distinct from code that operates on an object directory.

That said, it'd make total sense to me to have a `struct ObjectsDirectory` that is a thin abstraction over `objects/<digest>` etc. and may have a high level function like `fn gc(&self, roots: HashSet<Sha256>)` or so.

That leaves the higher level Repository and OCI type stuff in a distinct composefs-storage crate or so that would have way more dependencies (e.g. I think the core crate probably shouldn't necessarily pull in tokio) and functionality.
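The `ObjectsDirectory` idea from the comment above could look roughly like this. Everything here is a hypothetical sketch of the proposed API shape: `Sha256` is a stand-in type alias, and the directory contents are held in memory so the sketch stays testable without a real `objects/` tree (a real implementation would enumerate `objects/xx/yyyy...` entries via a dirfd).

```rust
use std::collections::HashSet;

/// Stand-in for a real fixed-size digest type.
type Sha256 = String;

/// Thin abstraction over an `objects/<digest>` directory.
struct ObjectsDirectory {
    // Hypothetical: a real version would hold a dirfd, not a set.
    objects: HashSet<Sha256>,
}

impl ObjectsDirectory {
    /// GC as pure set arithmetic: everything not reachable from `roots`
    /// is garbage. Returns what was collected, for logging or dry runs.
    fn gc(&mut self, roots: &HashSet<Sha256>) -> HashSet<Sha256> {
        let garbage: HashSet<Sha256> =
            self.objects.difference(roots).cloned().collect();
        self.objects.retain(|o| roots.contains(o));
        garbage
    }
}

fn main() {
    let mut dir = ObjectsDirectory {
        objects: HashSet::from([
            "aaaa".to_string(),
            "bbbb".to_string(),
            "cccc".to_string(),
        ]),
    };
    let roots = HashSet::from(["aaaa".to_string(), "cccc".to_string()]);
    let garbage = dir.gc(&roots);
    assert_eq!(garbage, HashSet::from(["bbbb".to_string()]));
    assert_eq!(dir.objects.len(), 2);
    println!("ok");
}
```

The split being argued for is visible in the dependencies: this needs nothing beyond std, while the Repository/OCI layer pulls in tokio and friends.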

@cgwalters (Collaborator, Author)

In the very short term it's fine; we can basically publish mostly as-is after some cleanups, since no one that I know of is using the previous crate version. This is more about setting expectations for the medium-term direction.

@allisonkarlitskaya (Collaborator)

The one thing that I think absolutely makes sense to split out is the oci functionality. It's been kept in a separate subdirectory and it could (and should) be its own crate. Then we'd implement cfsctl as a separate crate that depends on composefs and composefs-oci.

The example you give — tokio — is a dependency only of the oci stuff, and only because the containers-image-proxy is async. There are a tonne more examples like that.

@allisonkarlitskaya (Collaborator)

> Yeah, that's a fair point. But then what I'd still say is I think we should have an API like `get_composefs_objects(f: AsFd) -> Result<HashSet<Sha256>>` or so right? That's "core functionality" as distinct from code that operates on an object directory.

I still disagree about the usefulness of these kinds of APIs. This is basically a very thin wrapper on top of composefs-info objects. You can write that for yourself in a dozen or so lines of code, and it might be that the decisions we make about how to format that output (returning a HashSet instead of calling a callback, a parsed Sha256 instead of a hex digest string, or even split-out `xx/yyyy` format strings, etc.) are not the ones that the eventual users of the API would want.

In short: if people want these core "composefs" functionalities, they should just use the composefs CLI as an API. That's what it's there for.
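For scale, the "dozen or so lines" version of such a wrapper might look like this. The sketch assumes `composefs-info objects` prints one object per line; whether that output is plain hex or split `xx/yyyy...` paths is exactly the kind of formatting decision the comment is about, so this version normalizes both by stripping the slash. Hypothetical code, not this repository's API.

```rust
use std::collections::HashSet;

/// Turn line-per-object output (as from `composefs-info objects`) into a
/// set of hex digest strings, accepting either `abcdef...` or `ab/cdef...`.
fn parse_objects(output: &str) -> HashSet<String> {
    output
        .lines()
        .map(|l| l.trim().replace('/', ""))
        .filter(|l| !l.is_empty())
        .collect()
}

fn main() {
    // In real use this string would come from running the CLI, e.g. via
    // `Command::new("composefs-info").arg("objects")...` and capturing stdout.
    let out = "ab/cdef0123\n12/34567890\n";
    let digests = parse_objects(out);
    assert!(digests.contains("abcdef0123"));
    assert_eq!(digests.len(), 2);
    println!("ok");
}
```

Which is the argument in a nutshell: the wrapper is short enough that each consumer can write the variant they actually want.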
