
Repo-level transaction ID for aggregate loading #9

bruth opened this issue Jul 24, 2017 · 7 comments

bruth commented Jul 24, 2017

This is mostly a question of whether this is in scope and/or whether you have thought about this use case.

The API supports loading an aggregate at a particular version of itself, but if I wanted to get the state of two aggregates at some point in time, my understanding is that there is no way to do this, since the version is local to each aggregate. Since Repository.Save takes multiple events, which may span multiple aggregates, they are saved together transactionally (at least conceptually) and thus could/should represent an atomic change in the state of the repo.

If there were a repo-level transaction ID generated (monotonically increasing) on each call and added to each record, then an aggregate could be loaded relative to a transaction ID, which means the state of all aggregates in the repo could be loaded with respect to some point in time. At that point you could get a copy of the repo "as of" some transaction (or time).

// Initialize.
r := eventsource.New(...)

// Perform 10 calls to r.Save() so tx = 10 by the end

// Get the repo as-of transaction 5.
r5 := r.Asof(5)

// Compare bob's state at tx 5 with his current state.
bobAt5, _ := r5.Load(ctx, "bob")
bob, _ := r.Load(ctx, "bob")
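
Under the hood, each record written by a single Save call would carry the same transaction ID alongside its aggregate-local version. A minimal sketch, with hypothetical field names:

// Sketch only; field names are made up.
type Record struct {
    Version int    // aggregate-local version, as today
    Data    []byte // serialized event
    Tx      int64  // repo-wide, monotonically increasing; shared by all records in one Save call
}

Asof(5) would then simply load the records with Tx <= 5 and replay them as usual.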

A timestamp could be associated with each transaction ID, so that either Asof could take a real time.Time rather than the transaction ID, or there could be both: AsofT for the transaction ID and Asof for the time.
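
A minimal sketch of that two-method variant (signatures hypothetical):

// AsofT selects a view by transaction ID; Asof selects by wall-clock time.
type TimeTravel interface {
    AsofT(tx int64) *Repository
    Asof(t time.Time) *Repository
}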

savaki (Contributor) commented Jul 29, 2017

Those are some interesting points.

For the first one, Repository.Save is intended to take events from only a single aggregate, in increasing version order. I'll clarify the docs and the verification so the intent is clear. With that guarantee, it's possible for DynamoDB, MySQL, and PostgreSQL to offer the same guarantees of atomicity.
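
Roughly, the verification would look something like this (just a sketch, assuming an Event interface that exposes AggregateID() and EventVersion(); not the actual implementation):

type Event interface {
    AggregateID() string
    EventVersion() int
}

// verifySave rejects slices that span aggregates or are out of version order.
func verifySave(events []Event) error {
    for i, e := range events {
        if e.AggregateID() != events[0].AggregateID() {
            return errors.New("all events must belong to a single aggregate")
        }
        if i > 0 && e.EventVersion() <= events[i-1].EventVersion() {
            return errors.New("events must be in increasing version order")
        }
    }
    return nil
}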

Hmm, quite honestly we had never considered loading all the events in the repo up to a point in time. Our thinking has been to treat each aggregate independently.

Could you describe some use cases where you'd want to rebuild the entire repo up to a certain point in time rather than just a specific aggregate?

I do like the idea of rebuilding the aggregate as of a given time.

bruth commented Jul 29, 2017

The use case I had in mind was actually to support alternate event timelines, i.e. branches that (in my case) would get merged/committed back into some main repo I declare. So a very basic version of the git model, but for structured data. The only caveat is that the events in my case would likely need to be the commands or intents in order for conflicts to be detected, but that is a separate issue.

One strategy could be to maintain separate repos, say, one per branch, and manage the merging process. If the repo supported querying based on time or a transaction ID (i.e., each set of events that was committed), then the branches would simply act as extensions of the root repo. State could be rebuilt by reading the main repo up to some time T and then applying whatever events are in the branched repo to finalize the current state.
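
A rough sketch of that rebuild, assuming events expose an EventAt() time and some apply function that folds an event into state (the helpers and the two-slice shape are hypothetical):

type Event interface{ EventAt() time.Time }
type State interface{} // whatever aggregate/projection state is being rebuilt

// Replay main-repo events up to time t, then layer the branch's events on top.
func rebuild(state State, apply func(State, Event) State, main, branch []Event, t time.Time) State {
    for _, e := range main {
        if e.EventAt().After(t) {
            break // main events assumed ordered by time
        }
        state = apply(state, e)
    }
    for _, e := range branch {
        state = apply(state, e) // the branch acts as an extension of the root repo
    }
    return state
}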

delaneyj commented

@bruth it may sound crazy, but why not 'just' use git? If you are doing branches and merges it seems to fit the bill better.

bruth commented Jul 29, 2017

@delaneyj I have thought about that and agree it would be ideal if it could work. I guess my concern is that I want to handle the semantics of merging and dealing with conflicts if they arise. I presume that if there is a conflict, I could just read the bytes and present the conflict in a structured way.

bruth commented Jul 29, 2017

To get more concrete... I am modeling computational research workflows (think machine learning or statistical pipelines), which may involve a team of people. A workflow is often planned at a high level up front, but evolves over time. During that process, team members may work on and change certain parts of the workflow as they learn more about it. Ideally, the changes made are captured and can be viewed discretely by others working on the project.

Since this is research, it is quite common to try various things (branches) only to discover one that may be good or applicable to the research goal. In practice there is a low chance that conflicts will actually emerge, since most research teams are small and work on separate parts; however, the ability to have separate, temporary lineages of a project is ideal for this type of work.

There is a screenshot showing a clip of a workflow here: https://rdm.academy/

bruth commented Jul 31, 2017

Hm. Well, I believe I was thinking about aggregates incorrectly. The workflow should be the aggregate, which ultimately references other entities internally (but that is an implementation detail).

I opened PR #10 just to get the time idea across. It adds LoadVersion and LoadTime methods to Repository. The one problem with the LoadTime implementation is that it requires loading all of the aggregate's events and checking whether the max event time has been reached. This could be optimized with a lower-level method that limits events by time.
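
The check looks roughly like this (illustrative only; assumes events expose an EventAt() time.Time):

// eventsAsOf keeps only the events at or before t; the aggregate is then
// rebuilt by replaying the filtered slice. Loading the full history first
// is the inefficiency mentioned above.
func eventsAsOf(history []Event, t time.Time) []Event {
    out := make([]Event, 0, len(history))
    for _, e := range history {
        if !e.EventAt().After(t) {
            out = append(out, e)
        }
    }
    return out
}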

bruth commented Jul 31, 2017

To make time first-class for querying, the Record type would need a time field so that the querying could be pushed down to the underlying store.
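
For example, the record might gain a time field that stores can index and filter on directly (field name hypothetical):

type Record struct {
    Version int       // aggregate-local version
    Data    []byte    // serialized event
    At      time.Time // event time, so a store can apply a range condition server-side
}

A store could then satisfy LoadTime with a range query on At rather than fetching the full history.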
