Add support for CI
This runs tests on the code and the mdbook for each push and pull request,
and publishes the built mdbook to the gh-pages branch on pushes to master.
Most of the changes bring the mdbook into a passing state, although more
work is needed to properly compile all code snippets. Some examples are
compiled, while many are still ignored.

Signed-off-by: Moritz Hoffmann <[email protected]>
antiguru committed Jul 10, 2023
1 parent 14417e6 commit 2231de1
Showing 25 changed files with 315 additions and 133 deletions.
18 changes: 18 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,18 @@
+name: deploy
+
+on:
+  push:
+    branches:
+      - master
+
+jobs:
+  deploy:
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v3
+      - run: cargo install mdbook --version 0.4.20
+      - run: cd mdbook && mdbook build
+      - uses: JamesIves/github-pages-deploy-action@v4
+        with:
+          branch: gh-pages
+          folder: mdbook/book
18 changes: 18 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,18 @@
+name: test
+
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v3
+      - run: rustup update 1.60 --no-self-update && rustup default 1.60
+      - run: cargo build
+      - name: test mdBook
+        # rustdoc doesn't build dependencies, so it needs to run after `cargo build`,
+        # but its dependency search gets confused if there are multiple copies of any
+        # dependency in target/debug/deps, so it needs to run before `cargo test` et al.,
+        # which clutter target/debug/deps with multiple copies of things.
+        run: for file in $(find mdbook -name '*.md'); do rustdoc --test $file -L ./target/debug/deps; done
+      - run: cargo test
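The `test mdBook` step hands every Markdown file to `rustdoc --test`, so the info string on each fence decides whether a snippet is compiled, run, skipped, or not treated as Rust at all. Most of the mdbook changes in this commit adjust exactly that: shell sessions and sample output move into `ignore` fences, setup snippets are labeled `shell` or `toml`, and the main example drops `no_run` so it is actually executed (presumably also why it no longer reads `size` from the command line). The sketch below is a simplified model of that rule, for illustration only; `DoctestMode` and `classify` are hypothetical names, and rustdoc itself recognizes more attributes (for example `should_panic` and `compile_fail`) and has edge cases this model ignores.

```rust
/// What `rustdoc --test` does with one fenced block, in this simplified model.
#[derive(Debug, PartialEq)]
enum DoctestMode {
    CompileAndRun, // info string "rust", or no info string at all
    CompileOnly,   // "rust,no_run"
    Ignored,       // "ignore" or "rust,ignore"
    NotRust,       // "shell", "toml", ...: not collected as a doctest
}

/// Hypothetical helper: classify a fence info string such as "rust,no_run".
/// Only three rustdoc tokens are modelled; anything else counts as another language.
fn classify(info: &str) -> DoctestMode {
    let tokens: Vec<&str> = info
        .split(|c: char| c == ',' || c.is_whitespace())
        .filter(|t| !t.is_empty())
        .collect();

    // An unrecognised token (e.g. "shell") means the block is not Rust.
    let known = ["rust", "ignore", "no_run"];
    if tokens.iter().any(|t| !known.contains(t)) {
        return DoctestMode::NotRust;
    }
    if tokens.contains(&"ignore") {
        DoctestMode::Ignored
    } else if tokens.contains(&"no_run") {
        DoctestMode::CompileOnly
    } else {
        DoctestMode::CompileAndRun
    }
}

fn main() {
    // Info strings that appear in this commit's mdbook changes.
    for info in ["rust", "rust,no_run", "rust,ignore", "ignore", "shell", "toml", ""] {
        println!("{:?} -> {:?}", info, classify(info));
    }
}
```

In this model only plain `rust` and unlabeled fences are executed, which matches the commit message: some examples now compile and run, while the blocks marked `ignore` are skipped.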
6 changes: 3 additions & 3 deletions mdbook/src/chapter_0/chapter_0.md
@@ -2,9 +2,9 @@
 
 Differential dataflow programs are structured as two easy steps:
 
-1. Write a program.
-2. Change its input.
+1. Write a program.
+2. Change its input.
 
 We will work through an example program, and then interact with it by changing its inputs. Our goal is foremost to show you what a program looks like, and to give you a sense for what interactions look like.
 
-Once we've done this, in the next chapter we will jazz things up a bit with an increased scale of data, computation, and interaction!
+Once we've done this, in the next chapter we will jazz things up a bit with an increased scale of data, computation, and interaction!
35 changes: 20 additions & 15 deletions mdbook/src/chapter_0/chapter_0_0.md
@@ -4,34 +4,39 @@ The first thing you will need to do, if you want to follow along with the exampl
 
 With Rust in hand, crack open a shell and make a new project using Rust build manager `cargo`.
 
-Echidnatron% cargo new my_project
+```shell
+cargo new my_project
+```
 
 This should create a new folder called `my_project`, and you can wander in there and type
 
-Echidnatron% cargo run
+```shell
+cargo run
+```
 
 This will do something reassuring but pointless, like print `Hello, world!`, because we haven't gotten differential dataflow involved yet. I mean, it's Rust and you could learn that, but you probably want to read a different web page in that case.
 
 Instead, edit your `Cargo.toml` file, which tells Rust about your dependencies, to look like this:
 
-Echidnatron% cat Cargo.toml
-[package]
-name = "my_project"
-version = "0.1.0"
-authors = ["Your Name <[email protected]>"]
+```toml
+[package]
+name = "my_project"
+version = "0.1.0"
+authors = ["Your Name <[email protected]>"]
 
-[dependencies]
-timely = "0.11.1"
-differential-dataflow = "0.11.0"
-Echidnatron%
+[dependencies]
+timely = "0.11.1"
+differential-dataflow = "0.11.0"
+```
 
 You should only need to add those last two lines there, which bring in dependencies on both [timely dataflow](https://github.com/TimelyDataflow/timely-dataflow) and [differential dataflow](https://github.com/TimelyDataflow/differential-dataflow). We will be using both of those.
 
 If you would like to point at the most current code release, hosted on github, you can replace the dependencies with:
 
-[dependencies]
-timely = { git = "https://github.com/TimelyDataflow/timely-dataflow" }
-differential-dataflow = { git = "https://github.com/TimelyDataflow/differential-dataflow" }
-
+```toml
+[dependencies]
+timely = { git = "https://github.com/TimelyDataflow/timely-dataflow" }
+differential-dataflow = { git = "https://github.com/TimelyDataflow/differential-dataflow" }
+```
 
 You should now be ready to go. Code examples should mostly work, and you should complain (or [file an issue](https://github.com/TimelyDataflow/differential-dataflow/issues)) if they do not!
91 changes: 47 additions & 44 deletions mdbook/src/chapter_0/chapter_0_1.md
@@ -6,75 +6,78 @@ Let's write a program with one input: a collection `manages` of pairs `(manager,
 
 If you are following along at home, put this in your `src/main.rs` file.
 
-```rust,no_run
-extern crate timely;
-extern crate differential_dataflow;
+```rust
+extern crate timely;
+extern crate differential_dataflow;
 
-use differential_dataflow::input::InputSession;
-use differential_dataflow::operators::Join;
+use differential_dataflow::input::InputSession;
+use differential_dataflow::operators::Join;
 
-fn main() {
+fn main() {
 
 // define a new timely dataflow computation.
 timely::execute_from_args(std::env::args(), move |worker| {
 
-// create an input collection of data.
-let mut input = InputSession::new();
+// create an input collection of data.
+let mut input = InputSession::new();
 
-// define a new computation.
-worker.dataflow(|scope| {
+// define a new computation.
+worker.dataflow(|scope| {
 
-// create a new collection from our input.
-let manages = input.to_collection(scope);
+// create a new collection from our input.
+let manages = input.to_collection(scope);
 
-// if (m2, m1) and (m1, p), then output (m1, (m2, p))
-manages
-.map(|(m2, m1)| (m1, m2))
-.join(&manages)
-.inspect(|x| println!("{:?}", x));
-});
+// if (m2, m1) and (m1, p), then output (m1, (m2, p))
+manages
+.map(|(m2, m1)| (m1, m2))
+.join(&manages)
+.inspect(|x| println!("{:?}", x));
+});
 
-// Read a size for our organization from the arguments.
-let size = std::env::args().nth(1).unwrap().parse().unwrap();
+// Set an arbitrary size for our organization.
+let size = 100;
 
-// Load input (a binary tree).
-input.advance_to(0);
-for person in 0 .. size {
-input.insert((person/2, person));
-}
+// Load input (a binary tree).
+input.advance_to(0);
+for person in 0 .. size {
+input.insert((person/2, person));
+}
 
-}).expect("Computation terminated abnormally");
-}
+}).expect("Computation terminated abnormally");
+}
 ```
 
 This program has a bit of boilerplate, but at its heart it defines a new input `manages` and then joins it with itself, once the fields have been re-ordered. The intent is as stated in the comment:
 
-```rust,no_run
+```rust
 // if (m2, m1) and (m1, p), then output (m1, (m2, p))
 ```
 
 We want to report each pair `(m2, p)`, and we happen to also produce as evidence the `m1` connecting them.
 
 When we execute this program we get to see the skip-level reports for the small binary tree we loaded as input:
 
-Echidnatron% cargo run -- 10
-Running `target/debug/my_project`
-((0, (0, 0)), 0, 1)
-((0, (0, 1)), 0, 1)
-((1, (0, 2)), 0, 1)
-((1, (0, 3)), 0, 1)
-((2, (1, 4)), 0, 1)
-((2, (1, 5)), 0, 1)
-((3, (1, 6)), 0, 1)
-((3, (1, 7)), 0, 1)
-((4, (2, 8)), 0, 1)
-((4, (2, 9)), 0, 1)
-Echidnatron%
+```ignore
+Echidnatron% cargo run -- 10
+Running `target/debug/my_project`
+((0, (0, 0)), 0, 1)
+((0, (0, 1)), 0, 1)
+((1, (0, 2)), 0, 1)
+((1, (0, 3)), 0, 1)
+((2, (1, 4)), 0, 1)
+((2, (1, 5)), 0, 1)
+((3, (1, 6)), 0, 1)
+((3, (1, 7)), 0, 1)
+((4, (2, 8)), 0, 1)
+((4, (2, 9)), 0, 1)
+Echidnatron%
+```
 
 This is a bit crazy, but what we are seeing is many triples of the form
 
-(data, time, diff)
-
+```ignore
+(data, time, diff)
+```
 describing how the data have *changed*. That's right; our input is actually a *change* from the initially empty input. The output is showing us that at time `(Root, 0)` several tuples have had their frequency incremented by one. That is a fancy way of saying they are the output.
 
-This may make more sense in just a moment, when we want to *change* the input.
+This may make more sense in just a moment, when we want to *change* the input.
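The skip-level join in `chapter_0_1.md` can be reproduced without timely or differential dataflow, which is handy for checking the transcript. The std-only sketch below is a batch stand-in, not differential's incremental `join`; `org_chart` and `skip_levels` are hypothetical helpers. It builds the same binary tree, re-keys each `(m2, m1)` by `m1`, looks up `m1`'s reports in a `HashMap` index, and prints each result as a `(data, time, diff)` triple with time `0` and diff `1`. Because the updated example fixes `size` at 100, the sketch takes the size as a parameter so that the ten-person transcript for `cargo run -- 10` can still be regenerated.

```rust
use std::collections::HashMap;

/// Hypothetical helper: the binary-tree org chart from the example, as
/// `(manager, person)` pairs where `person / 2` manages `person`.
fn org_chart(size: u32) -> Vec<(u32, u32)> {
    (0..size).map(|person| (person / 2, person)).collect()
}

/// Hypothetical helper: for every (m2, m1) and (m1, p) in `manages`,
/// produce (m1, (m2, p)), i.e. `p`'s skip-level manager `m2` and the link `m1`.
fn skip_levels(manages: &[(u32, u32)]) -> Vec<(u32, (u32, u32))> {
    // Index `manages` by manager, playing the role of the right-hand join input.
    let mut reports: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(m1, p) in manages {
        reports.entry(m1).or_default().push(p);
    }
    let mut out = Vec::new();
    // Re-key each (m2, m1) by m1, then pair it with every report of m1.
    for &(m2, m1) in manages {
        if let Some(ps) = reports.get(&m1) {
            for &p in ps {
                out.push((m1, (m2, p)));
            }
        }
    }
    out.sort();
    out
}

fn main() {
    // Everything is inserted at time 0 with multiplicity +1.
    for data in skip_levels(&org_chart(10)) {
        println!("{:?}", (data, 0, 1));
    }
}
```

With `org_chart(10)` the printed lines match the ten transcript lines, in the same order. Each one is an insertion into an initially empty collection, so summing the diffs per `data` gives the collection's contents at time `0`.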
14 changes: 10 additions & 4 deletions mdbook/src/chapter_0/chapter_0_2.md
@@ -6,7 +6,7 @@ Our organization has gone from one where each manager has at most two reports, t
 
 The only change we'll make is to add the following just after we load up our initial org chart:
 
-```rust,no_run
+```rust,ignore
 for person in 1 .. size {
 input.advance_to(person);
 input.remove((person/2, person));
@@ -16,15 +16,15 @@ The only change we'll make is to add the following just after we load up our ini
 
 This moves us through new times, indicated by the line
 
-```rust,no_run
+```rust,ignore
 input.advance_to(person);
 ```
 
 which advances the state of the `input` collection up to a timestamp `person`, which just happens to be integers that are conveniently just larger than the time `0` we used to load the data.
 
 Once we've advanced the time, we make some changes.
 
-```rust,no_run
+```rust,ignore
 input.remove((person/2, person));
 input.insert((person/3, person));
 ```
@@ -33,6 +33,7 @@ This removes the prior management relation, and introduces a new one where the p
 
 We do this for each of the non-boss employees and get to see a bunch of outputs.
 
+```ignore
 Echidnatron% cargo run -- 10
 Running `target/debug/my_project`
 ((0, (0, 0)), 0, 1)
@@ -68,6 +69,7 @@ We do this for each of the non-boss employees and get to see a bunch of outputs.
 ((4, (2, 9)), 0, 1)
 ((4, (2, 9)), 4, -1)
 Echidnatron%
+```
 
 Gaaaaaaah! What in the !#$!?
 
@@ -81,20 +83,24 @@ It turns out our input changes result in output changes. Let's try and break thi
 
 Let's look at the entries for time `4`.
 
+```ignore
 ((1, (0, 4)), 4, 1)
 ((2, (0, 4)), 4, -1)
 ((4, (1, 8)), 4, 1)
 ((4, (1, 9)), 4, 1)
 ((4, (2, 8)), 4, -1)
 ((4, (2, 9)), 4, -1)
+```
 
 There is a bit going on here. Four's manager changed from two to one, and while their skip-level manager remained zero the explanation changed. The first two lines record this change. The next four lines record the change in the skip-level manager of four's reports, eight and nine.
 
 At the end, time `9`, things are a bit simpler because we have reached the employees with no reports, and so the only changes are their skip-level manager, without any implications for other people.
 
+```ignore
 ((3, (1, 9)), 9, 1)
 ((4, (1, 9)), 9, -1)
+```
 
 Oof. Well, we probably *could* have figured these things out by hand, right?
 
-Let's check out some ways this gets more interesting.
+Let's check out some ways this gets more interesting.
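The output changes in `chapter_0_2.md` can be checked by brute force: rebuild the org chart as it stands at two consecutive times, compute the skip-level pairs for each, and subtract the counts. The std-only sketch below does that; `manages_at`, `skip_level_counts`, and `changes` are hypothetical names. Differential dataflow produces the same updates incrementally rather than by recomputing everything at every time, so treat this as a way to verify the output, not as a description of how the library works.

```rust
use std::collections::{BTreeMap, HashMap};

/// (m1, (m2, p)): `p`'s skip-level manager is `m2`, linked through `m1`.
type Skip = (u32, (u32, u32));

/// Hypothetical helper: the `(manager, person)` pairs after `time` rounds of the
/// loop in the text. Persons 1..=time report to `person / 3`; the rest keep `person / 2`.
fn manages_at(time: u32, size: u32) -> Vec<(u32, u32)> {
    (0..size)
        .map(|p| if (1..=time).contains(&p) { (p / 3, p) } else { (p / 2, p) })
        .collect()
}

/// Hypothetical helper: the skip-level join, with a count for each output tuple.
fn skip_level_counts(manages: &[(u32, u32)]) -> BTreeMap<Skip, i64> {
    let mut reports: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(m1, p) in manages {
        reports.entry(m1).or_default().push(p);
    }
    let mut counts: BTreeMap<Skip, i64> = BTreeMap::new();
    for &(m2, m1) in manages {
        for &p in reports.get(&m1).into_iter().flatten() {
            *counts.entry((m1, (m2, p))).or_insert(0) += 1;
        }
    }
    counts
}

/// Hypothetical helper: the (data, time, diff) updates that turn the output at
/// `time - 1` into the output at `time`, found by diffing two full recomputations.
fn changes(time: u32, size: u32) -> Vec<(Skip, u32, i64)> {
    assert!(time >= 1, "the first change happens at time 1");
    let before = skip_level_counts(&manages_at(time - 1, size));
    let after = skip_level_counts(&manages_at(time, size));

    let mut diff: BTreeMap<Skip, i64> = BTreeMap::new();
    for (data, count) in after {
        *diff.entry(data).or_insert(0) += count;
    }
    for (data, count) in before {
        *diff.entry(data).or_insert(0) -= count;
    }
    // Drop tuples whose count did not change; the BTreeMap keeps the rest sorted.
    diff.into_iter()
        .filter(|&(_, d)| d != 0)
        .map(|(data, d)| (data, time, d))
        .collect()
}

fn main() {
    for update in changes(4, 10) {
        println!("{:?}", update);
    }
}
```

`changes(4, 10)` yields the six time-`4` updates quoted in the text, in the same order, and `changes(9, 10)` yields the two time-`9` updates.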