Commit
Deploying to gh-pages from @ 8388802 🚀
antiguru committed Sep 30, 2024
1 parent eed1bec commit d46b884
Showing 27 changed files with 228 additions and 231 deletions.
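Every changed line in the hunks shown below makes the same edit. Inside HTML text content, `&quot;` becomes a literal `"`, while `&amp;` and `&gt;` are left as they were. The two forms render identically. In text content, HTML only requires `&` and `<` to be escaped, and escaping `>` is conventional. A double quote needs escaping only inside a double-quoted attribute value.

Below is a minimal sketch of a text-content escaper that is consistent with the post-change lines. `escape_text` is a hypothetical helper. It is not the code of the tool that generated these pages.

```rust
/// Escapes a string for use as HTML *text content* (not as an attribute value).
/// `&`, `<` and `>` become character references; quotes pass through unchanged.
fn escape_text(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    // Source lines taken from the hunks in this commit.
    let lines = [
        r#".inspect(|x| println!("seen: {:?}", x));"#,
        "scope.input_from(&mut input)",
        "if *x > 1 && (2 .. limit + 1).all(|i| x % i > 0) {",
    ];
    for line in lines {
        // Quotes stay literal; `&` and `>` are escaped, as in the post-change lines.
        println!("{}", escape_text(line));
    }
}
```

The third input produces exactly the `if *x &gt; 1 &amp;&amp; ...` line that appears unchanged in the `chapter_1_1.html` hunks. This suggests that only the handling of quotes differs between the old and new output.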
4 changes: 2 additions & 2 deletions chapter_0/chapter_0_0.html
@@ -182,7 +182,7 @@ <h2 id="a-simplest-example"><a class="header" href="#a-simplest-example">A simpl
fn main() {
timely::example(|scope| {
(0..10).to_stream(scope)
-.inspect(|x| println!(&quot;seen: {:?}&quot;, x));
+.inspect(|x| println!("seen: {:?}", x));
});
}</code></pre></pre>
<p>This program gives us a bit of a flavor for what a timely dataflow program might look like, including a bit of what Rust looks like, without getting too bogged down in weird stream processing details. Not to worry; we will do that in just a moment!</p>
@@ -204,7 +204,7 @@ <h2 id="a-simplest-example"><a class="header" href="#a-simplest-example">A simpl
</code></pre>
<p>This isn't very different from a Rust program that would do this much more simply, namely the program</p>
<pre><pre class="playground"><code class="language-rust">fn main() {
-(0..10).for_each(|x| println!(&quot;seen: {:?}&quot;, x));
+(0..10).for_each(|x| println!("seen: {:?}", x));
}</code></pre></pre>
<p>Why would we want to make our life so complicated? The main reason is that we can make our program <em>reactive</em>, so that we can run it without knowing ahead of time the data we will use, and it will respond as we produce new data.</p>

2 changes: 1 addition & 1 deletion chapter_0/chapter_0_1.html
@@ -192,7 +192,7 @@ <h2 id="an-example"><a class="header" href="#an-example">An example</a></h2>
let probe = worker.dataflow(|scope|
scope.input_from(&amp;mut input)
.exchange(|x| *x)
-.inspect(move |x| println!(&quot;worker {}:\thello {}&quot;, index, x))
+.inspect(move |x| println!("worker {}:\thello {}", index, x))
.probe()
);

4 changes: 2 additions & 2 deletions chapter_0/chapter_0_2.html
@@ -177,7 +177,7 @@ <h1 id="when-to-use-timely-dataflow"><a class="header" href="#when-to-use-timely
<p>Timely dataflow may be a different programming model than you are used to, but if you can adapt your program to it there are several benefits.</p>
<ul>
<li>
-<p><strong>Data Parallelism</strong>: The operators in timely dataflow are largely &quot;data-parallel&quot;, meaning they can operate on independent parts of the data concurrently. This allows the underlying system to distribute timely dataflow computations across multiple parallel workers. These can be threads on your computer, or even threads across computers in a cluster you have access to. This distribution typically improves the throughput of the system, and lets you scale to larger problems with access to more resources (computation, communication, and memory).</p>
+<p><strong>Data Parallelism</strong>: The operators in timely dataflow are largely "data-parallel", meaning they can operate on independent parts of the data concurrently. This allows the underlying system to distribute timely dataflow computations across multiple parallel workers. These can be threads on your computer, or even threads across computers in a cluster you have access to. This distribution typically improves the throughput of the system, and lets you scale to larger problems with access to more resources (computation, communication, and memory).</p>
</li>
<li>
<p><strong>Streaming Data</strong>: The core data type in timely dataflow is a <em>stream</em> of data, an unbounded collection of data not all of which is available right now, but which instead arrives as the computation proceeds. Streams are a helpful generalization of static data sets, which are assumed available at the start of the computation. By expressing your program as a computation on streams, you've explained both how it should respond to static input data sets (feed all the data in at once) but also how it should react to new data that might arrive later on.</p>
@@ -191,7 +191,7 @@ <h1 id="when-to-use-timely-dataflow"><a class="header" href="#when-to-use-timely
<h2 id="generality"><a class="header" href="#generality">Generality</a></h2>
<p>Is timely dataflow always applicable? The intent of this research project is to remove layers of abstraction fat that prevent you from expressing anything your computer can do efficiently in parallel.</p>
<p>Under the covers, your computer (the one on which you are reading this text) is a dataflow processor. When your computer <em>reads memory</em> it doesn't actually wander off to find the memory, it introduces a read request into your memory controller, an independent component that will eventually return with the associated cache line. Your computer then gets back to work on whatever it was doing, hoping the responses from the controller return in a timely fashion.</p>
-<p>Academically, I treat &quot;my computer can do this, but timely dataflow cannot&quot; as a bug. There are degrees, of course, and timely dataflow isn't on par with the processor's custom hardware designed to handle low level requests efficiently, but <em>algorithmically</em>, the goal is that anything you can do efficiently with a computer you should be able to express in timely dataflow.</p>
+<p>Academically, I treat "my computer can do this, but timely dataflow cannot" as a bug. There are degrees, of course, and timely dataflow isn't on par with the processor's custom hardware designed to handle low level requests efficiently, but <em>algorithmically</em>, the goal is that anything you can do efficiently with a computer you should be able to express in timely dataflow.</p>

</main>

4 changes: 2 additions & 2 deletions chapter_0/chapter_0_3.html
@@ -182,9 +182,9 @@ <h1 id="when-not-to-use-timely-dataflow"><a class="header" href="#when-not-to-us
<p>One could re-imagine the sorting process as moving data around, and indeed this is what happens when large clusters need to be brought to bear on such a task, but that doesn't help you at all if what you needed was to sort your single allocation. A library like <a href="https://github.com/nikomatsakis/rayon">Rayon</a> would almost surely be better suited to the task.</p>
<hr />
<p>Dataflow systems are also fundamentally about breaking apart the execution of your program into independently operating parts. However, many programs are correct only because some things happen <em>before</em> or <em>after</em> other things. A classic example is <a href="https://en.wikipedia.org/wiki/Depth-first_search">depth-first search</a> in a graph: although there is lots of work to do on small bits of data, it is crucial that the exploration of nodes reachable along a graph edge complete before the exploration of nodes reachable along the next graph edge.</p>
-<p>Although there is plenty of active research on transforming algorithms from sequential to parallel, if you aren't clear on how to express your program as a dataflow program then timely dataflow may not be a great fit. At the very least, the first step would be &quot;fundamentally re-imagine your program&quot;, which can be a fine thing to do, but is perhaps not something you would have to do with your traditional program.</p>
+<p>Although there is plenty of active research on transforming algorithms from sequential to parallel, if you aren't clear on how to express your program as a dataflow program then timely dataflow may not be a great fit. At the very least, the first step would be "fundamentally re-imagine your program", which can be a fine thing to do, but is perhaps not something you would have to do with your traditional program.</p>
<hr />
-<p>Timely dataflow is in a bit of a weird space between language library and runtime system. This means that it doesn't quite have the stability guarantees a library might have (when you call <code>data.sort()</code> you don't think about &quot;what if it fails?&quot;), nor does it have the surrounding infrastructure of a <a href="https://www.microsoft.com/en-us/research/project/dryadlinq/">DryadLINQ</a> or <a href="https://spark.apache.org">Spark</a> style of experience. Part of this burden is simply passed to you, and this may be intolerable depending on your goals for your program.</p>
+<p>Timely dataflow is in a bit of a weird space between language library and runtime system. This means that it doesn't quite have the stability guarantees a library might have (when you call <code>data.sort()</code> you don't think about "what if it fails?"), nor does it have the surrounding infrastructure of a <a href="https://www.microsoft.com/en-us/research/project/dryadlinq/">DryadLINQ</a> or <a href="https://spark.apache.org">Spark</a> style of experience. Part of this burden is simply passed to you, and this may be intolerable depending on your goals for your program.</p>

</main>

4 changes: 2 additions & 2 deletions chapter_1/chapter_1.html
@@ -180,12 +180,12 @@ <h2 id="dataflow"><a class="header" href="#dataflow">Dataflow</a></h2>
<p>The most important part of dataflow programming is the <em>independence</em> of the components. When you write a dataflow program, you provide the computer with flexibility in how it executes your program. Rather than insisting on a specific sequence of instructions the computer should follow, the computer can work on each of the components as it sees fit, perhaps even sharing the work with other computers.</p>
<h2 id="timestamps"><a class="header" href="#timestamps">Timestamps</a></h2>
<p>While we want to enjoy the benefits of dataflow programming, we still need to understand whether and how our computation progresses. In traditional imperative programming we could reason that because instructions happen in some order, then once we reach a certain point all work (of a certain type) must be done. Instead, we will tag the data that move through our dataflow with <em>timestamps</em>, indicating (roughly) when they would have happened in a sequential execution.</p>
-<p>Timestamps play at least two roles in timely dataflow: they allow dataflow components to make sense of the otherwise unordered inputs they see (&quot;ah, I received the data in <em>this</em> order, but I should behave as if it arrived in <em>this</em> order&quot;), and they allow the user (and others) to reason about whether they have seen all of the data with a certain timestamp.</p>
+<p>Timestamps play at least two roles in timely dataflow: they allow dataflow components to make sense of the otherwise unordered inputs they see ("ah, I received the data in <em>this</em> order, but I should behave as if it arrived in <em>this</em> order"), and they allow the user (and others) to reason about whether they have seen all of the data with a certain timestamp.</p>
<p>Timestamps allow us to introduce sequential structure into our program, without requiring actual sequential execution.</p>
<h2 id="progress"><a class="header" href="#progress">Progress</a></h2>
<p>In a traditional imperative program, if we want to return the maximum of a set of numbers, we just scan all the numbers and return the maximum. We don't have to worry about whether we've considered <em>all</em> of the numbers yet, because the program makes sure not to provide an answer until it has consulted each number.</p>
<p>This simple task is much harder in a dataflow setting, where numbers arrive as input to a component that is tracking the maximum. Before releasing a number as output, the component must know if it has seen everything, as one more value could change its answer. But strictly speaking, nothing we've said so far about dataflow or timestamps provide any information about whether more data might arrive.</p>
-<p>If we combine dataflow program structure with timestamped data in such a way that as data move along the dataflow their timestamps only increase, we are able to reason about the <em>progress</em> of our computation. More specifically, at any component in the dataflow, we can reason about which timestamps we may yet see in the future. Timestamps that are no longer possible are considered &quot;passed&quot;, and components can react to this information as they see fit.</p>
+<p>If we combine dataflow program structure with timestamped data in such a way that as data move along the dataflow their timestamps only increase, we are able to reason about the <em>progress</em> of our computation. More specifically, at any component in the dataflow, we can reason about which timestamps we may yet see in the future. Timestamps that are no longer possible are considered "passed", and components can react to this information as they see fit.</p>
<p>Continual information about the progress of a computation is the only basis of coordination in timely dataflow, and is the lightest touch we could think of.</p>

</main>
6 changes: 3 additions & 3 deletions chapter_1/chapter_1_1.html
@@ -195,7 +195,7 @@ <h2 id="an-example"><a class="header" href="#an-example">An example</a></h2>
let probe = worker.dataflow(|scope|
scope.input_from(&amp;mut input)
.exchange(|x| *x)
-.inspect(move |x| println!(&quot;worker {}:\thello {}&quot;, index, x))
+.inspect(move |x| println!("worker {}:\thello {}", index, x))
.probe()
);

@@ -233,7 +233,7 @@ <h2 id="an-example"><a class="header" href="#an-example">An example</a></h2>
// we only need to test factors up to sqrt(x)
let limit = (*x as f64).sqrt() as u64;
if *x &gt; 1 &amp;&amp; (2 .. limit + 1).all(|i| x % i &gt; 0) {
-println!(&quot;{} is prime&quot;, x);
+println!("{} is prime", x);
}
})</code></pre>
<p>We don't really care that much about the order (we just want the results), and we have written such a simple primality test that we are going to be thrilled if we can distribute the work across multiple cores.</p>
@@ -272,7 +272,7 @@ <h2 id="an-example"><a class="header" href="#an-example">An example</a></h2>
<p>This is also a fine time to point out that dataflow programming is not religion. There is an important part of our program up above that is imperative:</p>
<pre><code class="language-ignore rust"> let limit = (*x as f64).sqrt() as u64;
if *x &gt; 1 &amp;&amp; (2 .. limit + 1).all(|i| x % i &gt; 0) {
-println!(&quot;{} is prime&quot;, x);
+println!("{} is prime", x);
}
</code></pre>
<p>This is an imperative fragment telling the <code>inspect</code> operator what to do. We <em>could</em> write this as a dataflow fragment if we wanted, but it is frustrating to do so, and less efficient. The control flow fragment lets us do something important, something that dataflow is bad at: the <code>all</code> method above <em>stops</em> as soon as it sees a factor of <code>x</code>.</p>
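The primality check appears as context in the last two `chapter_1_1.html` hunks, and it can be run on its own outside any dataflow. Below is a minimal standalone sketch in plain Rust with no timely dependency. It reuses that fragment's trial-division logic. `is_prime` and `is_prime_counting` are hypothetical helper names. The counting variant makes observable the early exit of `all` that the hunk's final paragraph describes.

```rust
/// Trial-division primality test, same logic as the `inspect` fragment.
fn is_prime(x: u64) -> bool {
    // we only need to test factors up to sqrt(x)
    let limit = (x as f64).sqrt() as u64;
    x > 1 && (2..limit + 1).all(|i| x % i > 0)
}

/// Same test, but also reports how many candidate factors `all` examined.
fn is_prime_counting(x: u64) -> (bool, u64) {
    let limit = (x as f64).sqrt() as u64;
    let mut checked = 0;
    let prime = x > 1
        && (2..limit + 1).all(|i| {
            checked += 1;
            x % i > 0
        });
    (prime, checked)
}

fn main() {
    for x in 0..20 {
        if is_prime(x) {
            println!("{} is prime", x);
        }
    }
    // 1_000_000 is even, so `all` stops at the first candidate (2),
    // even though the range would otherwise run up to 1000.
    assert_eq!(is_prime_counting(1_000_000), (false, 1));
    // 997 is prime, so every candidate in 2..=31 is examined.
    assert_eq!(is_prime_counting(997), (true, 30));
}
```

In a dataflow formulation, each candidate factor could instead become a separate record. All of those records would then be produced even after a factor had already been found.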