Add support for CI (TimelyDataflow#373)
antiguru authored Jul 10, 2023
1 parent 14417e6 commit 99fa67d
Showing 25 changed files with 315 additions and 134 deletions.
18 changes: 18 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,18 @@
name: deploy

on:
  push:
    branches:
      - master

jobs:
  deploy:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - run: cargo install mdbook --version 0.4.31
      - run: cd mdbook && mdbook build
      - uses: JamesIves/github-pages-deploy-action@v4
        with:
          branch: gh-pages
          folder: mdbook/book
18 changes: 18 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,18 @@
name: test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - run: rustup update 1.70 --no-self-update && rustup default 1.70
      - run: cargo build
      - name: test mdBook
        # rustdoc doesn't build dependencies, so it needs to run after `cargo build`,
        # but its dependency search gets confused if there are multiple copies of any
        # dependency in target/debug/deps, so it needs to run before `cargo test` et al.
        # clutter target/debug/deps with multiple copies of things.
        run: for file in $(find mdbook -name '*.md' | sort); do rustdoc --test $file -L ./target/debug/deps; done
      - run: cargo test
6 changes: 3 additions & 3 deletions mdbook/src/chapter_0/chapter_0.md
@@ -2,9 +2,9 @@

Differential dataflow programs are structured as two easy steps:

1. Write a program.
2. Change its input.
1. Write a program.
2. Change its input.

We will work through an example program, and then interact with it by changing its inputs. Our goal is foremost to show you what a program looks like, and to give you a sense for what interactions look like.

Once we've done this, in the next chapter we will jazz things up a bit with an increased scale of data, computation, and interaction!
Once we've done this, in the next chapter we will jazz things up a bit with an increased scale of data, computation, and interaction!
35 changes: 20 additions & 15 deletions mdbook/src/chapter_0/chapter_0_0.md
@@ -4,34 +4,39 @@ The first thing you will need to do, if you want to follow along with the exampl

With Rust in hand, crack open a shell and make a new project using Rust build manager `cargo`.

Echidnatron% cargo new my_project
```shell
cargo new my_project
```

This should create a new folder called `my_project`, and you can wander in there and type

Echidnatron% cargo run
```shell
cargo run
```

This will do something reassuring but pointless, like print `Hello, world!`, because we haven't gotten differential dataflow involved yet. I mean, it's Rust and you could learn that, but you probably want to read a different web page in that case.

Instead, edit your `Cargo.toml` file, which tells Rust about your dependencies, to look like this:

Echidnatron% cat Cargo.toml
[package]
name = "my_project"
version = "0.1.0"
authors = ["Your Name <[email protected]>"]
```toml
[package]
name = "my_project"
version = "0.1.0"
authors = ["Your Name <[email protected]>"]

[dependencies]
timely = "0.11.1"
differential-dataflow = "0.11.0"
Echidnatron%
[dependencies]
timely = "0.11.1"
differential-dataflow = "0.11.0"
```

You should only need to add those last two lines there, which bring in dependencies on both [timely dataflow](https://github.com/TimelyDataflow/timely-dataflow) and [differential dataflow](https://github.com/TimelyDataflow/differential-dataflow). We will be using both of those.

If you would like to point at the most current code release, hosted on github, you can replace the dependencies with:

[dependencies]
timely = { git = "https://github.com/TimelyDataflow/timely-dataflow" }
differential-dataflow = { git = "https://github.com/TimelyDataflow/differential-dataflow" }

```toml
[dependencies]
timely = { git = "https://github.com/TimelyDataflow/timely-dataflow" }
differential-dataflow = { git = "https://github.com/TimelyDataflow/differential-dataflow" }
```

You should now be ready to go. Code examples should mostly work, and you should complain (or [file an issue](https://github.com/TimelyDataflow/differential-dataflow/issues)) if they do not!
91 changes: 47 additions & 44 deletions mdbook/src/chapter_0/chapter_0_1.md
@@ -6,75 +6,78 @@ Let's write a program with one input: a collection `manages` of pairs `(manager,

If you are following along at home, put this in your `src/main.rs` file.

```rust,no_run
extern crate timely;
extern crate differential_dataflow;
```rust
extern crate timely;
extern crate differential_dataflow;

use differential_dataflow::input::InputSession;
use differential_dataflow::operators::Join;
use differential_dataflow::input::InputSession;
use differential_dataflow::operators::Join;

fn main() {
fn main() {

// define a new timely dataflow computation.
timely::execute_from_args(std::env::args(), move |worker| {

// create an input collection of data.
let mut input = InputSession::new();
// create an input collection of data.
let mut input = InputSession::new();

// define a new computation.
worker.dataflow(|scope| {
// define a new computation.
worker.dataflow(|scope| {

// create a new collection from our input.
let manages = input.to_collection(scope);
// create a new collection from our input.
let manages = input.to_collection(scope);

// if (m2, m1) and (m1, p), then output (m1, (m2, p))
manages
.map(|(m2, m1)| (m1, m2))
.join(&manages)
.inspect(|x| println!("{:?}", x));
});
// if (m2, m1) and (m1, p), then output (m1, (m2, p))
manages
.map(|(m2, m1)| (m1, m2))
.join(&manages)
.inspect(|x| println!("{:?}", x));
});

// Read a size for our organization from the arguments.
let size = std::env::args().nth(1).unwrap().parse().unwrap();
// Set an arbitrary size for our organization.
let size = 100;

// Load input (a binary tree).
input.advance_to(0);
for person in 0 .. size {
input.insert((person/2, person));
}
// Load input (a binary tree).
input.advance_to(0);
for person in 0 .. size {
input.insert((person/2, person));
}

}).expect("Computation terminated abnormally");
}
}).expect("Computation terminated abnormally");
}
```

This program has a bit of boilerplate, but at its heart it defines a new input `manages` and then joins it with itself, once the fields have been re-ordered. The intent is as stated in the comment:

```rust,no_run
```rust
// if (m2, m1) and (m1, p), then output (m1, (m2, p))
```

We want to report each pair `(m2, p)`, and we happen to also produce as evidence the `m1` connecting them.

When we execute this program we get to see the skip-level reports for the small binary tree we loaded as input:

Echidnatron% cargo run -- 10
Running `target/debug/my_project`
((0, (0, 0)), 0, 1)
((0, (0, 1)), 0, 1)
((1, (0, 2)), 0, 1)
((1, (0, 3)), 0, 1)
((2, (1, 4)), 0, 1)
((2, (1, 5)), 0, 1)
((3, (1, 6)), 0, 1)
((3, (1, 7)), 0, 1)
((4, (2, 8)), 0, 1)
((4, (2, 9)), 0, 1)
Echidnatron%
```ignore
Echidnatron% cargo run -- 10
Running `target/debug/my_project`
((0, (0, 0)), 0, 1)
((0, (0, 1)), 0, 1)
((1, (0, 2)), 0, 1)
((1, (0, 3)), 0, 1)
((2, (1, 4)), 0, 1)
((2, (1, 5)), 0, 1)
((3, (1, 6)), 0, 1)
((3, (1, 7)), 0, 1)
((4, (2, 8)), 0, 1)
((4, (2, 9)), 0, 1)
Echidnatron%
```

This is a bit crazy, but what we are seeing is many triples of the form

(data, time, diff)

```ignore
(data, time, diff)
```
describing how the data have *changed*. That's right; our input is actually a *change* from the initially empty input. The output is showing us that at time `(Root, 0)` several tuples have had their frequency incremented by one. That is a fancy way of saying they are the output.
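
One way to read these triples: the collection at a given time is what you get by summing the `diff` values of every update at or before that time. The sketch below illustrates this with standard-library Rust only. `accumulate` is a hypothetical helper written for this illustration, not a differential dataflow API, and the retraction at time `4` is borrowed from the output shown in the next section.

```rust
use std::collections::BTreeMap;

// One output record of the join: (m1, (m2, p)).
type Data = (u32, (u32, u32));

/// Hypothetical helper (not a differential dataflow API): sum the diffs of
/// all updates whose time is at or before `time`, and drop records whose
/// count sums to zero.
fn accumulate(updates: &[(Data, u64, i64)], time: u64) -> BTreeMap<Data, i64> {
    let mut counts = BTreeMap::new();
    for &(data, t, diff) in updates {
        if t <= time {
            *counts.entry(data).or_insert(0) += diff;
        }
    }
    counts.retain(|_, count| *count != 0);
    counts
}

fn main() {
    let updates: Vec<(Data, u64, i64)> = vec![
        ((4, (2, 8)), 0, 1),  // introduced at time 0
        ((4, (2, 9)), 0, 1),  // introduced at time 0
        ((4, (2, 9)), 4, -1), // retracted at time 4
    ];

    // At time 0 both records are present; by time 4 one has been retracted.
    println!("at time 0: {:?}", accumulate(&updates, 0));
    println!("at time 4: {:?}", accumulate(&updates, 4));
}
```

Seen this way, the initial output is simply the set of records whose counts went from zero to one at time `0`.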

This may make more sense in just a moment, when we want to *change* the input.
This may make more sense in just a moment, when we want to *change* the input.
14 changes: 10 additions & 4 deletions mdbook/src/chapter_0/chapter_0_2.md
@@ -6,7 +6,7 @@ Our organization has gone from one where each manager has at most two reports, t

The only change we'll make is to add the following just after we load up our initial org chart:

```rust,no_run
```rust,ignore
for person in 1 .. size {
input.advance_to(person);
input.remove((person/2, person));
@@ -16,15 +16,15 @@ The only change we'll make is to add the following just after we load up our ini

This moves us through new times, indicated by the line

```rust,no_run
```rust,ignore
input.advance_to(person);
```

which advances the state of the `input` collection up to a timestamp `person`, which just happens to be integers that are conveniently just larger than the time `0` we used to load the data.

Once we've advanced the time, we make some changes.

```rust,no_run
```rust,ignore
input.remove((person/2, person));
input.insert((person/3, person));
```
@@ -33,6 +33,7 @@ This removes the prior management relation, and introduces a new one where the p

We do this for each of the non-boss employees and get to see a bunch of outputs.

```ignore
Echidnatron% cargo run -- 10
Running `target/debug/my_project`
((0, (0, 0)), 0, 1)
@@ -68,6 +69,7 @@ We do this for each of the non-boss employees and get to see a bunch of outputs.
((4, (2, 9)), 0, 1)
((4, (2, 9)), 4, -1)
Echidnatron%
```

Gaaaaaaah! What in the !#$!?

@@ -81,20 +83,24 @@ It turns out our input changes result in output changes. Let's try and break thi

Let's look at the entries for time `4`.

```ignore
((1, (0, 4)), 4, 1)
((2, (0, 4)), 4, -1)
((4, (1, 8)), 4, 1)
((4, (1, 9)), 4, 1)
((4, (2, 8)), 4, -1)
((4, (2, 9)), 4, -1)
```

There is a bit going on here. Four's manager changed from two to one, and while their skip-level manager remained zero the explanation changed. The first two lines record this change. The next four lines record the change in the skip-level manager of four's reports, eight and nine.
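
If you would like to check the time `4` entries by hand, the following sketch recomputes the join from scratch before and after that change and diffs the two results. It uses only the Rust standard library and assumes the organization of size `10` from the run above. The names `manages_at`, `skip_level`, and `changes_at` are made up for this illustration, and this brute-force recomputation is not how differential dataflow produces its output.

```rust
use std::collections::BTreeSet;

type Edge = (u32, u32);           // (manager, person)
type Output = (u32, (u32, u32));  // (m1, (m2, p))

/// The `manages` relation once all changes up to and including `time` apply:
/// persons 1..=time report to person/3, everyone else still to person/2.
fn manages_at(time: u32, size: u32) -> BTreeSet<Edge> {
    (0..size)
        .map(|person| {
            if (1..=time).contains(&person) {
                (person / 3, person)
            } else {
                (person / 2, person)
            }
        })
        .collect()
}

/// Brute-force join: if (m2, m1) and (m1, p), then output (m1, (m2, p)).
fn skip_level(manages: &BTreeSet<Edge>) -> BTreeSet<Output> {
    let mut out = BTreeSet::new();
    for &(m2, m1) in manages {
        for &(m, p) in manages {
            if m == m1 {
                out.insert((m1, (m2, p)));
            }
        }
    }
    out
}

/// Output changes at `time` as sorted (data, time, diff) triples.
/// Assumes `time >= 1`, since it compares against the state at `time - 1`.
fn changes_at(time: u32, size: u32) -> Vec<(Output, u32, i32)> {
    let before = skip_level(&manages_at(time - 1, size));
    let after = skip_level(&manages_at(time, size));
    let mut changes = Vec::new();
    for &data in after.difference(&before) {
        changes.push((data, time, 1));
    }
    for &data in before.difference(&after) {
        changes.push((data, time, -1));
    }
    changes.sort();
    changes
}

fn main() {
    for change in changes_at(4, 10) {
        println!("{:?}", change);
    }
}
```

Running this should print the same six triples listed above for time `4`. Recomputing everything at each time is fine for ten employees, but it does work proportional to the whole input for every change, which is exactly the cost that reporting only the changes avoids.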

At the end, time `9`, things are a bit simpler because we have reached the employees with no reports, and so the only changes are their skip-level manager, without any implications for other people.

```ignore
((3, (1, 9)), 9, 1)
((4, (1, 9)), 9, -1)
```

Oof. Well, we probably *could* have figured these things out by hand, right?

Let's check out some ways this gets more interesting.
Let's check out some ways this gets more interesting.