Bindles mix naming and ontology #264
Comments
An invoice was not modeled as a parcel at all. It was modeled as a set of structured metadata with well-defined fields. A parcel is to be thought of as "free form data" about which the system knows very little. In contrast, an invoice is a set of known fields arranged in a particular way. Because these are different things, they can be reasoned about differently.
A parcel is just a blob of data, and any change to that blob should rightly raise our eyebrows. But an invoice is about the semantics of the object, not the syntax. We want to be able to reason about what the invoice means, and to detect any change to what it means. We don't particularly care about a change to the syntax (e.g. whether whitespace has been compressed, or whether it has been formatted in JSON, YAML, TOML, XML, etc.). All we care about is that the semantic content of the invoice is unmodified.
Ideally, then, what we want is a way to establish semantic immutability without caring about syntax -- we want to verify that the meaning is unchanged without requiring the presentation to be identical to what it was before. This is valuable for several reasons, but the easiest one is that we can write documents in a human-readable format and then have the system adapt those documents to whatever the technical requirements of the consuming agents are.
While I am not thrilled with the current state of things, it does achieve this to a limited extent. That is, by recomposing fields in a trivial format, one can regenerate the Merkle tree of the parcels. I'm not opposed to having a canonical representation that we could transform an invoice into and then hash. E.g. an ordered CBOR document would be fine for something like that.
I don't actually feel too strongly about this particular feature of Bindle. It was done largely on pragmatic grounds, and to get away from the ridiculousness of having spurious "mutations" simply because (say) Go's serializer formats things slightly differently than (say) Java's.
If we were to change to hashing the serialized object, we would need to make a few changes: We would probably need to switch signatures to be detached objects, rather than being carried on the invoice. That would essentially allow signing to occur without mutating the invoice in any way. Yanking could likely be done the same way. /cc @thomastaylor312 |
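To make the "canonical representation, then hash" idea above concrete, here is a minimal sketch. It is not Bindle's implementation: the comment suggests ordered CBOR, while this sketch substitutes sorted-key JSON because it needs only the Python standard library. The field names (`name`, `version`) and the helper names `canonical_bytes` and `semantic_digest` are hypothetical.

```python
import hashlib
import json


def canonical_bytes(invoice: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace.

    Assumption: sorted-key JSON stands in for the ordered CBOR form
    mentioned in the thread.
    """
    text = json.dumps(
        invoice, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def semantic_digest(invoice: dict) -> str:
    """SHA-256 over the canonical form, so formatting differences vanish."""
    return hashlib.sha256(canonical_bytes(invoice)).hexdigest()


# Two presentations of the same (hypothetical) invoice fields.
pretty = '{\n  "name": "example/app",\n  "version": "1.0.0"\n}'
compact = '{"version":"1.0.0","name":"example/app"}'

digest_pretty = semantic_digest(json.loads(pretty))
digest_compact = semantic_digest(json.loads(compact))

# A change in meaning (a different version) yields a different digest.
digest_changed = semantic_digest({"name": "example/app", "version": "1.0.1"})
```

Whitespace and key order do not affect the digest, but any change to a field value does. The trade-off is that every implementation has to agree on the canonical form exactly.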
The Enarx project is basically at a crossroads, trying to decide whether to invest a person in Bindle or to build our own. I'd really prefer the former, since Bindle is very close to what we want and I think a lot more people can benefit. Here's what I propose:
I don't buy the argument that the system can adapt the documents to the technical requirements of the consuming agents. This is because the consuming agents need to understand the Bindle protocol anyway. The particular serialization of the invoice is a far smaller requirement than the Bindle protocol as a whole. The other problem is that, while the goal of establishing semantic immutability is a noble one, there is no industry-accepted method for doing this. The only thing we have is byte-for-byte immutability. And this is particularly true if you want a signature on the invoice. The moment you add a signature to an invoice, it becomes byte-for-byte immutable.
If we can agree on an approach, I can dedicate someone to work on this before the end of the month. |
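The detached-signature idea raised earlier in the thread can be sketched as follows. This is an illustration under stated assumptions, not Bindle's signing scheme: HMAC-SHA256 from the standard library stands in for a real public-key signature, and the record layout (`invoice_sha256`, `signature`) and function names are hypothetical. The point is only that the signature lives beside the invoice bytes, so signing never mutates them.

```python
import hashlib
import hmac


def sign_detached(invoice_bytes: bytes, key: bytes) -> dict:
    """Return a signature record that refers to the invoice by digest."""
    return {
        "invoice_sha256": hashlib.sha256(invoice_bytes).hexdigest(),
        # Assumption: HMAC is a stand-in for a public-key signature.
        "signature": hmac.new(key, invoice_bytes, hashlib.sha256).hexdigest(),
    }


def verify_detached(invoice_bytes: bytes, record: dict, key: bytes) -> bool:
    """Check the digest reference and the signature over the exact bytes."""
    if hashlib.sha256(invoice_bytes).hexdigest() != record["invoice_sha256"]:
        return False
    expected = hmac.new(key, invoice_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


key = b"demo-key-not-for-production"
invoice_bytes = b'name = "example/app"\nversion = "1.0.0"\n'

record = sign_detached(invoice_bytes, key)
ok = verify_detached(invoice_bytes, record, key)
# Even a single added space breaks byte-for-byte verification.
tampered_ok = verify_detached(invoice_bytes + b" ", record, key)
```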
I may not be quite following the argument here but how does this work if a server stores invoice data as rows in a relational database (e.g. for ease of lookup in larger systems) rather than as a blob? |
@itowlson That is an internal implementation detail. The API would need to reproduce the invoice byte-for-byte as it was uploaded. But you can store the semantic meaning any way you want internally for things like queries and such. |
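A minimal sketch of that split, assuming a hypothetical `InvoiceStore`: the server keeps the uploaded bytes verbatim for the API and a parsed copy for queries. In-memory dicts stand in for a relational database, and the invoice is JSON only so the example needs nothing beyond the standard library.

```python
import hashlib
import json


class InvoiceStore:
    """Hypothetical server-side store: raw bytes plus a queryable index."""

    def __init__(self) -> None:
        self._raw = {}    # digest -> exact uploaded bytes
        self._index = {}  # digest -> parsed fields (stand-in for DB rows)

    def upload(self, raw: bytes) -> str:
        digest = hashlib.sha256(raw).hexdigest()
        self._raw[digest] = raw                 # served back unchanged
        self._index[digest] = json.loads(raw)   # used only for lookups
        return digest

    def download(self, digest: str) -> bytes:
        return self._raw[digest]

    def find_by_name(self, name: str) -> list:
        return [d for d, f in self._index.items() if f.get("name") == name]


store = InvoiceStore()
raw = b'{ "version": "1.0.0",\n   "name": "example/app" }'
digest = store.upload(raw)
served = store.download(digest)
matches = store.find_by_name("example/app")
```

The odd whitespace in `raw` survives the round trip because the index is never used to regenerate the response.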
Let me try to rephrase the problem another way. Let's say that a Bindle server gets compromised. During the compromised period, workloads were deployed from Bindle. The Bindle server and the workload runner belong to two different parties. The owner of the Bindle server publishes that a compromise occurred during a certain window of time. The workload owners now need to do forensic reconstruction of the logs for all the systems which deployed a workload from Bindle during this period to find out what was compromised. You see in the log that workload On the other hand, if your log contains In order to accomplish this, the invoice needs to be divided into two parts:
A few important properties arise from this:
This is roughly how Docker Hub and OCI container registries work. And these represent real advancement in the state of the art. It would be a shame to lose these properties while trying to build something better. |
A couple of comments: First off, I think having a canonical serialization and a hash of that data is a good idea. I actually think splitting the invoice into two parts is a possibly good solution here (pending me reasoning through it some more). Basically, it would make the signing part of invoices much simpler (rather than needing to reconstruct the entire parcel list). However, with that said, there are a couple of things that should still be requirements:
I am not a fan of having an invoice be a parcel as well because as @technosophos stated, they are fundamentally different things (one expressing relationships and one being arbitrary data). Although with the idea of splitting the invoice into 2 parts, this becomes more of an implementation detail. |
@npmccallum Thanks for the clarification. Your use case clarifies the goal, and seems like a useful thing to have. I agree that storage format should be an implementation detail, but was struggling to understand how to reconcile that with (what I understood as) the proposal to treat invoices as parcels. |
A parcel is defined by its ontology, namely the hash of its contents. If a parcel changes, its hash changes. This is great precisely because immutability is enforced throughout the entirety of the chain. At any point you can validate that the parcel is unmodified.
An invoice is, in reality, just a kind of parcel where the server knows how to introspect its contents. And yet, an invoice is not defined by its ontology but a name. Therefore, it is not possible to track the immutability of the invoice throughout the system.
Naming and ontology are two different layers, and Bindle's current approach mixes them. It appears that the decision to mix these layers was based on the desire to gain isomorphism between invoice encodings (TOML, JSON, CBOR, etc.). But it isn't obvious to me why this is a design goal, or why we have to give up the most important property of a content store (verifiable immutability of all contents) in order to achieve it.
IMHO, Bindle should operate more like the other successful content stores (git, S3, OCI, Docker Hub, etc.), where all objects are immutable and referred to by ontology. Naming is a layer above it, and you can "tag" a name (which includes a version) to a particular object.
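The layering proposed here can be sketched in a few lines. This is an illustration of the general content-store pattern, not the API of Bindle, git, or any OCI registry; the `ContentStore` class, its method names, and the `sha256:` prefix format are assumptions for the example.

```python
import hashlib


class ContentStore:
    """Hypothetical store: immutable objects by digest, names as a layer on top."""

    def __init__(self) -> None:
        self._objects = {}  # digest -> bytes (never overwritten)
        self._tags = {}     # name (including version) -> digest

    @staticmethod
    def digest_of(data: bytes) -> str:
        return "sha256:" + hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        digest = self.digest_of(data)
        self._objects.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        data = self._objects[digest]
        # Anyone holding the digest can re-verify the content.
        if self.digest_of(data) != digest:
            raise ValueError("object does not match its digest")
        return data

    def tag(self, name: str, digest: str) -> None:
        if digest not in self._objects:
            raise KeyError(digest)
        self._tags[name] = digest

    def resolve(self, name: str) -> str:
        return self._tags[name]


store = ContentStore()
invoice_digest = store.put(b"invoice contents for example/app 1.0.0")
store.tag("example/app:1.0.0", invoice_digest)

resolved = store.resolve("example/app:1.0.0")
fetched = store.get(resolved)
```

A client that records `resolved` in its logs can later prove exactly which bytes it deployed, independent of what the name points to afterwards.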