
Generalize arrange to change type of records #300

Draft: wants to merge 1 commit into master


frankmcsherry
Member

This PR introduces a generalization of the arrange method that lets the user interpose logic between forming a batch and accepting that batch into the output trace. In particular, the input and output trace formats may differ, which lets the user perform non-standard translations: for example, playing a state machine forward and recording the transitions in the output.
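
To make the state-machine use case concrete, here is a minimal sketch in plain Rust. It is a toy model, not the differential-dataflow API: `arrange_general_toy`, `State`, `Event`, `step`, and `turnstile_transitions` are hypothetical names, and real batches carry timestamps and diffs that the toy omits. Input batches hold `(key, event)` records, while the output "trace" holds `(key, old_state, new_state)` transitions, so the two record types differ.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Locked,
    Unlocked,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Event {
    Coin,
    Push,
}

/// Toy turnstile transition function: a coin unlocks, a push re-locks.
fn step(state: State, event: Event) -> State {
    match (state, event) {
        (State::Locked, Event::Coin) => State::Unlocked,
        (State::Unlocked, Event::Push) => State::Locked,
        (unchanged, _) => unchanged,
    }
}

/// Toy model (hypothetical, not the real API) of a generalized arrange: each input
/// batch of `(K, V)` records is "formed" (grouped by key), then handed to `logic`,
/// whose output batch may use a different record type `O`.
/// The returned output batches play the role of the output trace.
fn arrange_general_toy<K: Ord, V, O>(
    input_batches: Vec<Vec<(K, V)>>,
    mut logic: impl FnMut(Vec<(K, V)>) -> Vec<O>,
) -> Vec<Vec<O>> {
    let mut output_trace = Vec::new();
    for mut batch in input_batches {
        // Stable sort by key, so events for the same key keep their arrival order.
        batch.sort_by(|a, b| a.0.cmp(&b.0));
        output_trace.push(logic(batch));
    }
    output_trace
}

/// Plays each turnstile's state machine forward and records only the transitions.
fn turnstile_transitions(input_batches: Vec<Vec<(u32, Event)>>) -> Vec<Vec<(u32, State, State)>> {
    // Per-key state lives in the logic closure and persists across batches.
    let mut states: HashMap<u32, State> = HashMap::new();
    arrange_general_toy(input_batches, move |batch| {
        let mut transitions = Vec::new();
        for (key, event) in batch {
            let old = states.get(&key).copied().unwrap_or(State::Locked);
            let new = step(old, event);
            if new != old {
                states.insert(key, new);
                transitions.push((key, old, new));
            }
        }
        transitions
    })
}

fn main() {
    let input_batches = vec![
        vec![(1, Event::Coin), (2, Event::Push)],
        vec![(1, Event::Push), (1, Event::Push), (2, Event::Coin)],
    ];
    for (i, batch) in turnstile_transitions(input_batches).iter().enumerate() {
        println!("batch {i}: {batch:?}");
    }
}
```

Running it prints `batch 0: [(1, Locked, Unlocked)]` and `batch 1: [(1, Unlocked, Locked), (2, Locked, Unlocked)]`. Keeping the per-key state inside the logic closure is what lets the translation depend on earlier batches, which a plain pass-through arrange never needs.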

The vanilla arrange_core operator is now implemented using "logic" that just passes through the input batch. It looks like so:

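        // Outer closure: receives a capability and a trace agent, and returns the per-batch logic.
        self.arrange_general::<P, Tr, Tr, _, _>(pact, name, |_capability, _trace_agent| {
            // Per-batch logic: input and output trace types are both `Tr`, so each formed
            // batch is forwarded unchanged, paired with an empty `Vec`.
            |batch, _capability| (batch, Vec::new())
        })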
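
The pass-through logic has two levels: an outer closure that receives a capability and a trace agent and returns the per-batch logic, and an inner closure applied to each batch. Below is a toy model of that shape in plain Rust, assuming the outer closure runs once before any batch arrives. `Capability`, `TraceAgent`, `arrange_with_constructor`, and `passthrough` are hypothetical stand-ins rather than timely or differential-dataflow types, and since the PR does not describe what the second tuple element carries, the toy simply collects it.

```rust
/// Hypothetical stand-in for a timely capability (here just a round number).
#[derive(Clone, Copy)]
struct Capability(u64);

/// Hypothetical stand-in for a handle to the output trace; the toy never reads it.
#[derive(Clone, Copy)]
struct TraceAgent;

/// Toy two-level interface: `constructor` is called once, up front, and returns the
/// per-batch `logic`, which maps each input batch to an output batch plus a `Vec`.
fn arrange_with_constructor<B, O, A, F, L>(batches: Vec<B>, constructor: F) -> (Vec<O>, Vec<A>)
where
    F: FnOnce(Capability, TraceAgent) -> L,
    L: FnMut(B, Capability) -> (O, Vec<A>),
{
    let mut logic = constructor(Capability(0), TraceAgent);
    let mut output_trace = Vec::new();
    let mut extras = Vec::new();
    for (round, batch) in batches.into_iter().enumerate() {
        let (output_batch, mut extra) = logic(batch, Capability(round as u64));
        output_trace.push(output_batch);
        extras.append(&mut extra);
    }
    (output_trace, extras)
}

/// Pass-through logic in the same shape as the snippet above: input and output batch
/// types are the same, each batch is forwarded unchanged, and the `Vec` is always empty.
fn passthrough(batches: Vec<Vec<(String, i64)>>) -> (Vec<Vec<(String, i64)>>, Vec<()>) {
    arrange_with_constructor(batches, |_capability, _trace_agent| {
        |batch, _capability| (batch, Vec::new())
    })
}

fn main() {
    let input = vec![vec![("a".to_string(), 1)], vec![("b".to_string(), -1)]];
    let (output_trace, extras) = passthrough(input.clone());
    // With identity logic, the output trace holds exactly the input batches.
    assert_eq!(output_trace, input);
    assert!(extras.is_empty());
    println!("{output_trace:?}");
}
```

Splitting construction from per-batch logic mirrors timely's operator-constructor style: setup work (such as capturing the trace agent) happens once, and the returned closure can keep whatever state it needs between batches.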

Other operators like upsert and reduce could in principle be ported to this framework, but I wanted to float this first before doing a massive re-write.

cc: @ruchirK @petrosagg

Contributor

ruchirK left a comment


LGTM

/// Arranges a differential dataflow collection with custom user logic.
///
/// This method generalizes `arrange` in that the output type may differ
/// from the input type, and the user is allow to perform logic as the
Contributor


nit: is allowed

@frankmcsherry
Member Author

Brief thoughts on reduce, at least: this probably isn't a great place to port that to without some more thinking, as one of its goals is to accept pre-arranged input, and this operator is doing the arranging. But with a bit more thinking I could imagine finding a way to blend the two: you get a stream of arranged data and the input trace, which instances of this operator would immediately drop (just because that is what it does) but which others (like reduce) could hold on to.

It's a bit weird, because this operator would then generalize to "something that takes arrangements as input", which... well, at the moment it is the thing that makes arrangements, so clearly there is some unpicking to do there.

The goal, though, is to avoid having so many copy/paste instances of "operator that forms batches and maintains traces" as we have in arrange, reduce, and upsert.
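
One way to picture the blend described above is the following plain-Rust sketch, using hypothetical names (`batch_operator`, `arrange_like`, `counts_like`) that are not part of differential dataflow. A single skeleton forms batches and maintains the input trace, and each instance's logic receives both the new batch and that trace: an arrange-like instance ignores the trace, while a reduce-like instance reads it. The toy recomputes counts from the whole trace on every batch, which a real operator would avoid; it only illustrates the shared structure.

```rust
use std::collections::BTreeMap;

/// Hypothetical shared skeleton: forms one batch per round of `(key, diff)` updates,
/// appends it to the input trace, and passes the logic both the new batch and the trace.
fn batch_operator<K: Ord, O>(
    rounds: Vec<Vec<(K, i64)>>,
    mut logic: impl FnMut(&[(K, i64)], &[Vec<(K, i64)>]) -> O,
) -> Vec<O> {
    let mut input_trace: Vec<Vec<(K, i64)>> = Vec::new();
    let mut outputs = Vec::new();
    for round in rounds {
        // Toy batch formation: sum diffs per key (sorted by key) and drop zero totals.
        let mut consolidated: BTreeMap<K, i64> = BTreeMap::new();
        for (key, diff) in round {
            *consolidated.entry(key).or_insert(0) += diff;
        }
        let batch: Vec<(K, i64)> = consolidated.into_iter().filter(|(_, d)| *d != 0).collect();
        input_trace.push(batch);
        // The newest batch is the last entry of the input trace.
        let newest = input_trace.last().unwrap().as_slice();
        outputs.push(logic(newest, input_trace.as_slice()));
    }
    outputs
}

/// Arrange-like instance: emits the new batch and ignores the input trace.
fn arrange_like(rounds: Vec<Vec<(&'static str, i64)>>) -> Vec<Vec<(&'static str, i64)>> {
    batch_operator(rounds, |batch, _input_trace| batch.to_vec())
}

/// Reduce-like instance: for each key in the new batch, reads the whole input
/// trace to report that key's accumulated count.
fn counts_like(rounds: Vec<Vec<(&'static str, i64)>>) -> Vec<Vec<(&'static str, i64)>> {
    batch_operator(rounds, |batch, input_trace| {
        batch
            .iter()
            .map(|(key, _)| {
                let total: i64 = input_trace
                    .iter()
                    .flatten()
                    .filter(|(k, _)| k == key)
                    .map(|(_, d)| d)
                    .sum();
                (*key, total)
            })
            .collect::<Vec<_>>()
    })
}

fn main() {
    let rounds = vec![
        vec![("a", 1), ("b", 1), ("a", 1)],
        vec![("a", -1), ("c", 1)],
        vec![("b", 1), ("b", -1)],
    ];
    println!("arrange-like: {:?}", arrange_like(rounds.clone()));
    println!("counts-like:  {:?}", counts_like(rounds));
}
```

Both instances share the batch-forming and trace-maintaining loop; only the logic differs. Here the arrange-like output is `[[("a", 2), ("b", 1)], [("a", -1), ("c", 1)], []]`, while the counts-like output is `[[("a", 2), ("b", 1)], [("a", 1), ("c", 1)], []]`, because the second instance holds on to the input trace.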
