bundle: releases #172

Open
erichanson opened this issue Apr 20, 2019 · 5 comments
Labels
epic (big meta-ticket that is more of an outcome than a specific task), idea, question

Comments

@erichanson
Member

erichanson commented Apr 20, 2019

There are many module systems. Most of them suffer from a common problem -- versioning. A module (in our case, bundle) is a moving target, ever changing over time. When using a bundle as a dependency, a developer would like that bundle to remain static, so that future changes to it won't break something in their app. However, some less-evolved module systems only allow the existence of a module at a specific version, not multiple versions simultaneously. Thus, when two different projects depend on the same dependency at different versions, tears. We would be wise to learn from their folly and side-step the little bits of history repeating.

Thus, releases. Hypothetically. The desired outcome would be such that a bundle commit can be exported as a release, which would assign all rows a new primary key, and update all foreign key references. Not every commit is a release, and we could continue to use the same foreign keys between commits, but when it's time to snapshot something as release-worthy, regenerate all the keys.
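A minimal sketch of what that re-keying step could look like, written in Python rather than SQL for brevity. The row shape (dicts with an `id` field), the `fk_fields` parameter, and the function name `export_release` are all assumptions for illustration, not anything that exists in the bundle module today.

```python
import uuid


def export_release(rows, fk_fields):
    """Return copies of `rows` with fresh primary keys and remapped foreign keys.

    rows: list of dicts, each with an "id" key (hypothetical row shape).
    fk_fields: names of fields that hold the id of another row.
    """
    # Assign every row in the commit a brand-new primary key.
    id_map = {row["id"]: str(uuid.uuid4()) for row in rows}

    released = []
    for row in rows:
        new_row = dict(row)  # copy, so the source commit is left untouched
        new_row["id"] = id_map[row["id"]]
        # Repoint foreign keys at the new ids. References to rows that are
        # not part of this bundle are left exactly as they were.
        for field in fk_fields:
            old = row.get(field)
            if old in id_map:
                new_row[field] = id_map[old]
        released.append(new_row)
    return released
```

Because the source rows are copied rather than modified, development can keep using the original keys between commits while each release gets its own key space.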

There's an additional opportunity here, namely caching. I've been experimenting with nginx caching rules, and there is much levity to be had therein. Bundle history shouldn't ever change, so one could imagine some kind of middleware function that retrieves a row at a particular release version. Like bundle.row(row_id meta.row_id, release_id uuid). The row needn't even exist in the database necessarily, as it could be assembled on the fly from the blobs in the bundle repository. While this might be slow to do at runtime, since bundle history never changes, we could cache anything retrieved from a call to bundle.row() indefinitely.
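To make the caching idea concrete, here is a rough sketch using in-process memoization in Python instead of nginx rules. The `BLOBS` store, `assemble_row`, and `bundle_row` are hypothetical stand-ins for the bundle repository and the proposed `bundle.row()` function; the point is only that an immutable `(row_id, release_id)` pair can be cached with no eviction or invalidation.

```python
from functools import lru_cache
from types import MappingProxyType

# Hypothetical blob store: (release_id, row_id) -> {column: value}.
BLOBS = {
    ("release-1", "row-a"): {"name": "layout", "html": "<div></div>"},
}


def assemble_row(row_id, release_id):
    """Slow path: build the row on the fly from stored blobs."""
    return dict(BLOBS[(release_id, row_id)])


@lru_cache(maxsize=None)  # no eviction: a released row never changes
def bundle_row(row_id, release_id):
    # Return a read-only view so callers cannot mutate the cached value.
    return MappingProxyType(assemble_row(row_id, release_id))
```

The second call for the same pair never reaches `assemble_row`. The same reasoning would apply to an HTTP cache in front of the endpoint, as long as the release id is part of the cache key.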

In the rest of the software world, there is git and then there is ... all these module systems. There's no reason that I can think of why these should be separate. GitHub supports releases, but that doesn't push a new version of a package to, say, npm. Unifying version control and packaging is a win.

@erichanson erichanson added question epic big meta-ticket that is more of an outcome than a specific task idea labels Apr 20, 2019
@HoboPapa

An alternative or addition to a release tag is the branch. A branch can be used for a release, named like R0.2_RC-01 and so on through test activity for the release. Master keeps all the new stuff. Cherry-pick from the release branch back to master as needed to keep master up to date. Then, when the testing is done, the release branch is R0.2-aquameta. The release branch stays forever. Checking out branch R0.2-aquameta always sets HEAD to the last node in the branch, which is by definition the "release". The release procedure in GitHub asks you to select the branch that contains the project you want to release. The release permits additional information to be added to it, but the source control aspect of the release looks like it is managed through the branch.

@erichanson
Member Author

erichanson commented Apr 29, 2019

@atrooper Interesting! I was thinking any commit could be tagged as a release, regardless of its position in the commit tree. I see pros and cons. It's flexible, but... it's flexible. :) Another pro is that if there was a releases branch, one could just "distribute" that branch, and not all the development commits, which would save on space and nicely separate a dev clone from a release clone. Thanks.

@erichanson
Member Author

In the case that two different bundles A and B depend on a third bundle C, but want different versions of it, we need two versions of bundle C at the same time. Releases need to solve this.

This dovetails out into a number of different issues and sticky spots in the architecture:

  1. The endpoint.resource.path issue. Right now, endpoint.resource is just a single global namespace, and whatever bundles are checked out just dump their resources in there, with collisions galore, especially over /. There's also this question of, when making say a test suite for a bundle, where in the path space to put the resource that runs the tests. I've been loosely using the convention of the bundle name slash tests e.g. /org.aquameta.ui.layout/tests, but since a bundle's name is not guaranteed to be unique or immutable, this doesn't really scale.

  2. The widget.dependency_js and widget.dependency_css tables, which were a first attempt at dependencies for widgets, before we even had bundles. Those need to be dropped, and replaced somehow with bundle releases.

  3. Picking a schema name. Bundles that contain tables and views need a place to put them. For core bundles I've just been using a single word like meta or event or endpoint etc. This doesn't scale for third-party bundles, and there's a namespace collision that's going to happen over schema.

  4. Inspecting a bundle currently requires checking it out. This might overwrite existing development, etc. The working copy is a singleton, which is really inflexible.

Tools in the bag:

  1. Generate a unique hash for each commit, some kind of sha256 hash of all the blobs in the commit plus the commit's uuid, author and message. Pretty nice way of signing a commit.

  2. Each checkout of a bundle gets its own schema (in the case that the bundle actually contains any schema), probably named the hash above. When a bundle just contains rows in another bundle's schema (say it contains just some resources and widgets that exist over in the endpoint and widget bundles' schemas), those bundles too are checked out at a particular version, so they would have their own schemas.

  3. The entire endpoint and routing system would need to shift to this very bundle-centric paradigm, likely requiring every request to pass in a bundle release with it, so that the endpoint knows which tables to use.

  4. Though likely not all that useful, one idea is to lean heavily on how the PostgreSQL transaction system works, namely that each transaction gets its own immutable copy of the database (roughly) and can make changes to it, and then possibly roll back those changes instead of committing them. We could do some weird stuff around beginning a transaction by checking out the requested release to the working copy, pulling any data required out of it, and then rolling back instead of committing. An idea, probably not a good one.

  5. We would need to stop referencing schemas by name and instead use a dynamic variable that the user would hopefully never need to include in their code but instead is assigned at runtime based on which bundle release is being referenced.
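Items 1 and 2 above can be sketched roughly as follows, in Python for brevity. The function names, the argument shapes, and the choice to length-prefix each field are assumptions of this sketch, not a settled design. One real constraint worth noting: PostgreSQL truncates identifiers to 63 bytes by default, and a sha256 hex digest is already 64 characters, so a schema named after the hash has to use a shortened form.

```python
import hashlib


def commit_hash(commit_uuid, author, message, blobs):
    """sha256 over the commit's uuid, author, message and blob contents.

    blobs: iterable of bytes. Sorted first, so blob order does not matter.
    """
    h = hashlib.sha256()
    parts = [commit_uuid.encode(), author.encode(), message.encode()]
    parts.extend(sorted(blobs))
    for part in parts:
        # Length-prefix each part so that ("ab", "c") and ("a", "bc")
        # cannot collapse into the same byte stream.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def schema_name(prefix, digest, max_len=63):
    """Derive a per-checkout schema name that fits PostgreSQL's limit."""
    return f"{prefix}_{digest}"[:max_len]
```

Truncating the digest trades a little collision resistance for a legal identifier; storing the full hash in a table and using a shorter generated name for the schema would be another option.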

This is a deep refactor.

See also #88, #177.

@erichanson
Member Author

"Editions" is also a nice word. Ganked from Rust.

@erichanson erichanson added this to the 0.3 release milestone Jan 15, 2021
@erichanson erichanson removed this from the 0.3 release milestone Jan 22, 2023
@micburks
Member

I see two approaches here. The first is tagging commits with some kind of a pointer to a specific commit. In this scenario, there is no change to the distribution of the bundle, and installing a release would mean literally checking out the tagged commit. There can only be one checked-out version at a time. And we'd have to handle requests for other releases (tagged commits in the source tree, but not currently checked out) by rebuilding the source at that time from blobs. Going down that road would mean making a robust (could this be error-prone?) and performant method of rebuilding source. I'm not familiar with that space, but I would be worried about making it reasonably fast, because caching is hard (on the client, in-memory on the server, or in the database?) and having even a dozen users for a bundle with a thousand commits could mean real server load [attribution needed].

The second approach would be that bundle releases are snapshots of HEAD at a given commit. This would require copying every row in the bundle and recreating any foreign keys with updated ids.

> [...] separate a dev clone from a release clone [...]

Snapshot releases would mean the bundle that we use today becomes the source bundle. If you want to work on the bundle, you clone the source bundle. If you want to run an application or depend on a library from a bundle, you clone a release bundle. Although we duplicate the content of the bundle when making a release, we end up only delivering the necessary rows without all the source tree.

> [...] when making say a test suite for a bundle [...]

I believe tests should always be next to the source code. When releasing in this scenario, tests (if we have a good heuristic for what a test is) could be split out into another release that depends on the bundle release. Want to verify that this release isn't broken? Download the test bundle and run it. This would help to colocate tests with the source code while avoiding inflating download sizes of distribution releases. Could be solved other ways, too.

> The widget.dependency_js and widget.dependency_css tables, [...] need to be dropped, and replaced somehow with bundle releases [...]

Quick recap of our offline discussion. Versions do not belong in a field in the dependency_* row. Bumping a version breaks all references to this dependency. Yikes. There still needs to be a table of arbitrary js/css modules, but the version of these should be the bundle release version. We have to come out of this discussion with multiple bundle versions co-existing in some manner. Then releasing a new bundle does not destroy the existing version.

> We would need to stop referencing schemas by name and instead use a dynamic variable that the user would hopefully never need to include in their code [...]

Heck yeah. This would be great, probably needed for releases, and I think it's doable. I think even core schema should have generated names using their version and a release signature of sorts, e.g. endpoint_v0.3-XXX. When we implement bundle dependencies, we could look up the schema name of the dependent bundle for any queries made from Widget (since we will soon always know what bundle a widget is from). There are some edge cases here to iron out, but this is compelling.
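A rough sketch of that lookup, in Python for brevity. The `Release` type, the `dependencies` mapping, and the signature values used with it are all hypothetical; the name format just follows the `endpoint_v0.3-XXX` example. It also shows the case raised earlier in the thread, where two bundles depend on different releases of the same bundle at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    bundle: str
    version: str
    signature: str  # hypothetical release signature

    @property
    def schema(self):
        # Generated schema name, following the endpoint_v0.3-XXX pattern.
        return f"{self.bundle}_v{self.version}-{self.signature}"


def resolve_schema(dependencies, widget_release, logical_name):
    """Map the name a widget writes (e.g. "endpoint") to a concrete schema.

    dependencies: {Release: {logical schema name: Release}} (assumed shape).
    """
    return dependencies[widget_release][logical_name].schema


def qualify(schema, table):
    """Quote both identifiers, since generated names contain '.' and '-'."""
    def quote(ident):
        return '"' + ident.replace('"', '""') + '"'
    return f"{quote(schema)}.{quote(table)}"
```

With this shape, a widget keeps writing the logical name, and the release it belongs to decides which physical schema that name means at runtime.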

> The entire endpoint and routing system would need to shift to this very bundle-centric paradigm [...]

Regarding bundle data in all requests, Widget and Datum are solvable. However, endpoint.resource collisions do pose an issue. I think it would be totally reasonable to require the owner of the instance to decide which resource to activate. With this, we might want to swap out endpoint.resource.active to another table that's under the user's control. Then resources are not on by default and you have to make a conscious decision to work around a collision.
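As a sketch of that opt-in model, in Python for brevity: resources from any number of bundles can claim the same path, but nothing is served until the instance owner picks one. The row shape and the `active` mapping (standing in for the separate, user-controlled table) are assumptions of this example.

```python
from collections import defaultdict


def find_collisions(resources):
    """Return {path: [resource ids]} for every path claimed more than once."""
    by_path = defaultdict(list)
    for res in resources:
        by_path[res["path"]].append(res["id"])
    return {path: ids for path, ids in by_path.items() if len(ids) > 1}


def route(path, resources, active):
    """Serve only the resource the instance owner activated for `path`.

    active: {path: resource id}, owned by the instance, not by any bundle.
    Returns None when nothing has been activated for the path.
    """
    chosen = active.get(path)
    if chosen is None:
        return None  # resources are off by default
    for res in resources:
        if res["id"] == chosen and res["path"] == path:
            return res
    return None  # activation points at a resource that is not installed
```

`find_collisions` is what an admin UI could show the owner before they decide; `route` never has to guess between two bundles.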

> This is a deep refactor.

Probably. But we're already rewriting widget/datum/endpoint for other reasons. I think we can start down this road now if we agree on something.
