This work should be completed before the exercise on Friday 24th March.
For instructions on how to do and submit the assignment, please see the assignments section of the course instructions.
Study the following course literature:
- Read the following from the Fundamentals of Concurrent Programming
Take a look at the program matching.go. Explain what happens and why it happens if you make the following changes. Try first to reason about it, and then test your hypothesis by changing and running the program.
- What happens if you remove the `go` keyword from the `Seek` call in the `main` function?
- What happens if you switch the declaration `wg := new(sync.WaitGroup)` to `var wg sync.WaitGroup` and the parameter `wg *sync.WaitGroup` to `wg sync.WaitGroup`?
- What happens if you remove the buffer on the channel `match`?
- What happens if you remove the `default` case from the `select` statement in the `main` function?
Hint: Think about the order of the instructions and what happens with arrays of different lengths.
The file julia.go contains a program that creates Julia Set fractal images and writes them to file. You can watch a quick explainer on the Julia Set and its relationship to the Mandelbrot Set here, to get an idea of the type of images that are produced.
To start with, read through the code to get an idea of how it works. Some things to note:
- `var Funcs []ComplexFunc = []ComplexFunc{ ... }` on line 18 is a slice of functions. Note how Go lets us iterate through this slice and pass each function as an argument to the `CreatePng` function in the `main` function (see: https://en.wikipedia.org/wiki/First-class_function).
- In the `CreatePng` function it is the `Julia` function that creates the image, and it does this by iterating through each pixel of the image to paint the final image. The actual calculation of whether to paint a pixel occurs in the `Iterate` function, in the blink-and-you-miss-it line 73: `z = f(z)`, using the function from the `Funcs` slice that has been passed down through several functions.
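The pattern of storing functions in a slice and applying them repeatedly can be sketched as follows (a standalone illustration with made-up function literals, not the actual contents of julia.go):

```go
package main

import "fmt"

// ComplexFunc mirrors the function type used in julia.go.
type ComplexFunc func(complex128) complex128

// funcs is a slice of function values; functions are first-class in Go,
// so they can be stored, ranged over, and passed as arguments.
var funcs = []ComplexFunc{
	func(z complex128) complex128 { return z*z - 0.4 + 0.6i },
	func(z complex128) complex128 { return z*z + 0.28 + 0.008i },
}

// iterate applies f repeatedly, mirroring the z = f(z) step on line 73.
func iterate(f ComplexFunc, z complex128, n int) complex128 {
	for i := 0; i < n; i++ {
		z = f(z)
	}
	return z
}

func main() {
	for i, f := range funcs {
		fmt.Printf("func %d after 2 iterations: %v\n", i, iterate(f, 0, 2))
	}
}
```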
The program works, but is pretty slow. To get a sense of how long the program takes, you can time it with a stopwatch, or more conveniently with the `time` keyword in Bash:

```
$ time go run julia.go
go run julia.go  11.10s user 0.60s system 97% cpu 11.974 total
```
Your own system will have its own runtime, but in the example above the run took 11.974 s of wall-clock time, of which 11.10 s was user CPU time, with the CPU busy 97% of the time.
Your assignment is to find different ways to divide the computations so that they run in parallel on all available CPUs. Use the ideas from the example in the efficient parallel computation section of the course literature.
Please note that simply making everything concurrent with more and more goroutines does not necessarily get you faster performance after a certain point (see https://en.wikipedia.org/wiki/Amdahl%27s_law), so think carefully about how you split the problem up (hint: each individual pixel does not need its own goroutine).
Assistant's note: In more recent versions of Go (since 1.5), the runtime defaults to using as many operating system threads as there are available CPU cores. To see differences in behaviour, refer to the `runtime.GOMAXPROCS` function (or the `GOMAXPROCS` environment variable) and vary the value.
Make sure you add a comment in julia.go with your original runtime and your improved runtime.
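One way to split the work, sketched below with made-up names (this is not julia.go's code, just an illustration of the banding idea): give each goroutine a contiguous band of rows instead of spawning one goroutine per pixel, and synchronise with a `sync.WaitGroup`:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// computeRow fills one row; x*y stands in for the per-pixel Iterate call.
func computeRow(y, width int, out [][]int) {
	for x := 0; x < width; x++ {
		out[y][x] = x * y
	}
}

// render divides the rows into `workers` bands and computes each band
// in its own goroutine.
func render(width, height, workers int) [][]int {
	img := make([][]int, height)
	for y := range img {
		img[y] = make([]int, width)
	}
	var wg sync.WaitGroup
	band := (height + workers - 1) / workers // rows per goroutine, rounded up
	for w := 0; w < workers; w++ {
		y0, y1 := w*band, (w+1)*band
		if y1 > height {
			y1 = height
		}
		wg.Add(1)
		go func(y0, y1 int) {
			defer wg.Done()
			for y := y0; y < y1; y++ {
				computeRow(y, width, img)
			}
		}(y0, y1)
	}
	wg.Wait()
	return img
}

func main() {
	img := render(8, 8, runtime.NumCPU())
	fmt.Println("corner:", img[7][7]) // prints "corner: 49"
}
```

Writing to disjoint rows from different goroutines needs no locking, since no two goroutines touch the same memory.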
In the final task, you will be applying the MapReduce model for improving a word frequency program.
Assistant's note: We actually struggled to find a nice introductory explanation beyond our own example below, as the full model has several more layers that we are skipping. If you are curious, check out the original Google research article and see how you get on! https://ai.google/research/pubs/pub62
A word frequency analysis of a document will return a summary of the word counts for all unique words in the document. Whilst this can be solved efficiently using a map data structure in a sequential program, the performance can be improved by parallelising the program.
Assistant's note: Typically we ignore case when counting the frequency so 'Hello', 'HELLO' and 'hello' are all counted as 'hello', so remember to convert all strings to lower case.
By splitting the document into sub-documents and conducting a partial count in parallel (Map task), we can arrive at the solution by combining all the partial results into a final result (Reduce task).
Read the code in singleworker/words.go and complete the missing parts:
- Implement the `WordCount` function
- Read a text file into a string in the `main` function
- Check that the unit test passes
- Log the runtime performance in the table below
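As a starting point, the sequential count might look something like this (a sketch only; the actual signature expected by singleworker/words.go may differ, so adapt it to the skeleton and its unit test):

```go
package main

import (
	"fmt"
	"strings"
)

// wordCount lower-cases the text, splits it on whitespace, and
// tallies each word into a map.
func wordCount(text string) map[string]int {
	freq := make(map[string]int)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		freq[w]++
	}
	return freq
}

func main() {
	fmt.Println(wordCount("Hello hello HELLO world")["hello"]) // prints 3
}
```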
Once you are satisfied with the single worker version, move on to mapreduce/words.go and parallelise the program in order to improve the performance.
- Update the `WordCount` function with the Map and Reduce tasks, using goroutines to parallelise and a channel to gather partial results
- Check that the unit test passes
- Find the optimal number of goroutines before you encounter diminishing returns in performance improvements
- Log the runtime performance in the table below
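The map/reduce pattern the steps above describe can be sketched as follows (names and chunking strategy are assumptions, not the required solution): each goroutine counts one chunk of the words (Map), the partial maps are gathered on a channel, and a final loop merges them (Reduce):

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countChunk is the Map task: a partial count over one slice of words.
func countChunk(words []string) map[string]int {
	freq := make(map[string]int)
	for _, w := range words {
		freq[w]++
	}
	return freq
}

// wordCountParallel splits the words among `workers` goroutines and
// merges their partial results.
func wordCountParallel(text string, workers int) map[string]int {
	words := strings.Fields(strings.ToLower(text))
	parts := make(chan map[string]int, workers) // buffered: at most `workers` partial maps
	var wg sync.WaitGroup
	chunk := (len(words) + workers - 1) / workers
	for i := 0; i < len(words); i += chunk {
		end := i + chunk
		if end > len(words) {
			end = len(words)
		}
		wg.Add(1)
		go func(ws []string) { // Map task
			defer wg.Done()
			parts <- countChunk(ws)
		}(words[i:end])
	}
	wg.Wait()
	close(parts)
	total := make(map[string]int) // Reduce task: merge the partial maps
	for part := range parts {
		for w, n := range part {
			total[w] += n
		}
	}
	return total
}

func main() {
	fmt.Println(wordCountParallel("a b a c a b", 2)["a"]) // prints 3
}
```

Note that only the merging goroutine touches `total`, so no mutex is needed; the channel is the only point of synchronisation.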
| Variant      | Runtime (ms) |
| ------------ | ------------ |
| singleworker | 713          |
| mapreduce    | 225          |
And with that, you are on your way to Google-scale problems ;-)