compute/lab: Replace D code with C in race-condition
For better understanding

Signed-off-by: cristian-vijelie <[email protected]>
cristian-vijelie authored and teodutu committed Apr 20, 2023
1 parent 245bbf0 commit d0ee552
Showing 14 changed files with 247 additions and 195 deletions.
2 changes: 1 addition & 1 deletion content/chapters/compute/lab/content/arena.md
@@ -220,7 +220,7 @@ For example, we cannot place it before a `mov` instruction, as the action of a `
Instead, we can place it in front of an `inc` instruction if its operand is memory.

Look at the code in `support/race-condition/asm/race_condition_lock.S`.
It's an Assembly equivalent of the code you've already seen many times so far (such as `support/race-condition/d/race_condition.d`).
It's an Assembly equivalent of the code you've already seen many times so far (such as `support/race-condition/c/race_condition.c`).
Assemble and run it a few times.
Notice the different results you get.

71 changes: 24 additions & 47 deletions content/chapters/compute/lab/content/synchronization.md
@@ -31,7 +31,7 @@ But this is up to the scheduler and is non-deterministic.
Such undefined behaviours can cripple the execution of a program if `var` is some critical variable.

Let's see this bug in action.
Go to `support/race-condition/d/race_condition.d`, compile and run the code a few times.
Go to `support/race-condition/c/race_condition.c`, compile and run the code a few times.
It spawns two threads that do exactly what we've talked about so far: one thread increments `var` 10 million times, while the other decrements it 10 million times.

As you can see from running the program, the differences between subsequent runs can be substantial.
@@ -41,26 +41,26 @@ A critical section is a piece of code that can only be executed by **one thread*
So we need some sort of _mutual exclusion mechanism_ so that when one thread runs the critical section, the other has to **wait** before entering it.
This mechanism is called a **mutex**, whose name comes from "mutual exclusion".

Go to `support/race-condition/d/race_condition_mutex.d` and notice the differences between this code and the buggy one.
We now use a `Mutex` variable which we `lock()` at the beginning of a critical section and we `unlock()` at the end.
Generally speaking `lock()`-ing a mutex makes a thread enter a critical section, while calling `unlock()` makes the thread leave said critical section.
Go to `support/race-condition/c/race_condition_mutex.c` and notice the differences between this code and the buggy one.
We now use a `pthread_mutex_t` variable which we `lock` at the beginning of a critical section and we `unlock` at the end.
Generally speaking, locking a mutex with `pthread_mutex_lock()` makes a thread enter a critical section, while calling `pthread_mutex_unlock()` makes the thread leave said critical section.
Therefore, as we said previously, the critical sections in our code are `var--` and `var++`.
Run the code multiple times to convince yourself that in the end, the value of `var` will always be 0.

Mutexes contain an internal variable which can be either 1 (locked) or 0 (unlocked).
When a thread calls `lock()`, it attempts to set that variable to 1.
When a thread calls `pthread_mutex_lock()`, it attempts to set that variable to 1.
If it was 0, the thread sets it to 1 and proceeds to execute the critical section.
Otherwise, it **suspends its execution** and waits until that variable is set to 0 again.

When calling `unlock()`, the internal variable is set to 0 and all waiting threads are woken up to try to acquire the mutex again.
**Be careful:** It is generally considered unsafe and [in many cases undefined behaviour](https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_lock.html) to call `unlock()` from a different thread than the one that acquired the lock.
When calling `pthread_mutex_unlock()`, the internal variable is set to 0 and all waiting threads are woken up to try to acquire the mutex again.
**Be careful:** It is generally considered unsafe and [in many cases undefined behaviour](https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_lock.html) to call `pthread_mutex_unlock()` from a different thread than the one that acquired the lock.
So the general workflow should look something like this:

```text
within a single thread:
mutex.lock()
pthread_mutex_lock(&mutex)
// do atomic stuff
mutex.unlock()
pthread_mutex_unlock(&mutex)
```
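
For reference, here is a minimal sketch (not taken from the lab's `support/` code) of how a `pthread_mutex_t` is typically initialised before entering this workflow, either statically or with `pthread_mutex_init()`:

```c
#include <pthread.h>

/* Statically allocated mutexes can be initialised at compile time. */
static pthread_mutex_t static_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Mutexes initialised at runtime use pthread_mutex_init(). */
static pthread_mutex_t runtime_mutex;

int main(void)
{
	pthread_mutex_init(&runtime_mutex, NULL); /* NULL = default attributes */

	pthread_mutex_lock(&static_mutex);
	/* ... critical section ... */
	pthread_mutex_unlock(&static_mutex);

	pthread_mutex_lock(&runtime_mutex);
	/* ... critical section ... */
	pthread_mutex_unlock(&runtime_mutex);

	pthread_mutex_destroy(&runtime_mutex); /* only after no thread is using it */

	return 0;
}
```

Link such programs with `-lpthread`, as the lab's Makefile does via `LDLIBS`.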

### Synchronization - Overhead
@@ -74,11 +74,11 @@ Notice that those of `race_condition_mutex` are larger than those of `race_condi

The cause of this is that now when one thread is executing the critical section, the other has to wait and do nothing.
Waiting means changing its state from RUNNING to WAITING, which brings further overhead from the scheduler.
This latter overhead comes from the **context switch**s that is necessary for a thread to switch its state from RUNNING to WAITING and back.
This latter overhead comes from the **context switch** that is necessary for a thread to switch its state from RUNNING to WAITING and back.

### Practice: Wrap the Whole `for` Statements in Critical Sections

Move the calls to `lock()` and `unlock()` outside the `for` statements so that the critical sections become the entire statement.
Move the calls to `pthread_mutex_lock()` and `pthread_mutex_unlock()` outside the `for` statements so that each critical section becomes an entire `for` statement.
Measure the new time spent by the code and compare it with the execution times recorded when the critical sections were made up of only `var--` and `var++`.
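
For instance, a coarse-grained version of `increment_var()` from `support/race-condition/c/race_condition_mutex.c` might look like the sketch below (it assumes the `mutex`, `val` and `NUM_ITER` definitions from that file; `decrement_var()` changes symmetrically):

```c
void *increment_var(void *arg)
{
	(void)arg;

	/* The whole `for` statement is now a single critical section:
	 * the other thread cannot perform even one decrement until this loop ends.
	 */
	pthread_mutex_lock(&mutex);
	for (size_t i = 0; i < NUM_ITER; i++)
		val++;
	pthread_mutex_unlock(&mutex);

	return NULL;
}
```

A simple way to compare the two variants is `time ./race_condition_mutex`, run before and after the change.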

[Quiz](../quiz/coarse-vs-granular-critical-section.md)
@@ -87,7 +87,7 @@ Measure the new time spent by the code and compare it with the execution times r

So now we know how to use mutexes.
And we know that mutexes work by using an internal variable that can be either 1 (locked) or 0 (unlocked).
But how does `lock()` actually set that variable to 1?
But how does `pthread_mutex_lock()` actually set that variable to 1?
How does it avoid a race condition in case another thread also wants to set it to 1?

We need a guarantee that anyone "touching" that variable does so "within its own critical section".
Expand All @@ -98,7 +98,7 @@ Modern processors are capable of _atomically_ accessing data, either for reads o
An atomic action is an indivisible sequence of operations that a thread runs without interference from others.
Concretely, before initiating an atomic transfer on one of its data buses, the CPU first makes sure all other transfers have ended, then **locks** the data bus by stalling all cores attempting to transfer data on it.
This way, one thread obtains **exclusive** access to the data bus while accessing data.
As a side note, the critical sections in `support/race-condition/race_condition_mutex.d` are also atomic once they are wrapped between calls to `lock()` and `unlock()`.
As a side note, the critical sections in `support/race-condition/c/race_condition_mutex.c` are also atomic once they are wrapped between calls to `pthread_mutex_lock()` and `pthread_mutex_unlock()`.

As with every hardware feature, the `x86` ISA exposes an instruction for atomic operations.
In particular, this instruction is a **prefix**, called `lock`.
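
As a quick, hypothetical illustration (x86-specific, not part of the lab's support code), GCC inline assembly can emit such a `lock`-prefixed instruction directly:

```c
#include <stdio.h>

static int var;

/* Atomically increment `var` in memory using the `lock` prefix on `inc`. */
static void atomic_inc(void)
{
	__asm__ volatile("lock incl %0" : "+m"(var));
}

int main(void)
{
	atomic_inc();
	printf("var = %d\n", var);

	return 0;
}
```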
@@ -113,13 +113,18 @@ Compilers provide support for such hardware-level atomic operations.
GCC exposes [builtins](https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html) such as `__atomic_load()`, `__atomic_store()`, `__atomic_compare_exchange()` and many others.
All of them rely on the mechanism described above.
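
A small, hypothetical sketch of what these builtins look like in practice (not part of the lab's support code):

```c
#include <stdio.h>

static int counter;

int main(void)
{
	/* Atomic read-modify-write; on x86 this compiles to a `lock`-prefixed add. */
	__atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);

	/* Atomic store and load, with sequentially consistent ordering. */
	__atomic_store_n(&counter, 10, __ATOMIC_SEQ_CST);
	int value = __atomic_load_n(&counter, __ATOMIC_SEQ_CST);

	printf("counter = %d\n", value);

	return 0;
}
```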

In D, this functionality is implemented in the `core.atomic` module.
Go to `support/race-condition/d/race_condition_atomic.d` and complete the function `decrementVar()`.
Go to `support/race-condition/c/race_condition_atomic.c` and complete the function `decrement_var()`.
Compile and run the code.
Now measure its running time against the mutex implementations.
It should be somewhere between `race_condition.d` and `race_condition_mutex.d`.
It should be somewhere between `race_condition.c` and `race_condition_mutex.c`.

So using the hardware support is more efficient, but it can only be leveraged for simple, individual instructions, such as loads and stores.
The C standard library also provides atomic data types.
Individual accesses to these variables are atomic, i.e. they are performed by only one thread at a time.
Go to `support/race-condition/c/race_condition_atomic2.c`, compile and run the code.
Now measure its running time against the other implementations.
Notice that the time is similar to `race_condition_atomic`.
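
In other words, plain operators applied to an `atomic_int` are themselves atomic read-modify-write operations; the hypothetical snippet below (not part of the lab's code) shows the equivalence with the explicit `<stdatomic.h>` functions:

```c
#include <stdio.h>
#include <stdatomic.h>

static atomic_int val;

int main(void)
{
	val++;                      /* atomic increment, sequentially consistent */
	atomic_fetch_add(&val, 1);  /* the explicit equivalent */
	atomic_fetch_sub(&val, 2);  /* atomic decrement by 2 */

	printf("val = %d\n", atomic_load(&val));

	return 0;
}
```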

So using the hardware support is more efficient, but it is usually leveraged only for simple, individual instructions, such as loads and stores.
And the fact that high-level languages also expose an API for atomic operations shows how useful these operations are for developers.

## Semaphores
@@ -245,38 +250,10 @@ As its name implies, this is a type of storage that is "owned" by individual thr
**Do not confuse it with copy-on-write**.
TLS pages are always duplicated when creating a new thread and their contents are re-initialised.

### Practice: D - TLS by Default

Take a look again at `support/race-condition/d/race_condition.d`, specifically at how `var` is declared:

```d
__gshared int var;
```

Have you wondered what the `__gshared` keyword does?
Well, for memory safety reasons, in D, all variables are by default **not shared** between threads.
We need to specifically ask the language to let us share a variable between threads.
We can do this using either the `__gshared` or `shared` keywords.
You've seed `shared` in `support/race-condition/d/race_condition_atomic.d`.

The difference between them is that `shared` only allows programmers read-modify-write the variable atomically, as we do in `support/race-condition/d/race_condition_atomic.d`.
Modify the `incrementVar()` function and increment `var` like you would any variable: `var++`.
Try to compile the code.
It fails.
The compiler is smart and tells you what to do instead:

```console
Error: read-modify-write operations are not allowed for `shared` variables
Use `core.atomic.atomicOp!"+="(var, 1)` instead
```

`__gshared` is a rawer version of `shared`.
It doesn't forbid anything.

### Practice: C - TLS on Demand

The perspective of C towards TLS is opposed to that of D: in C/C++ everything is shared by default.
This makes multithreading easier and more lightweight to implement than in D, because synchronization is left entirely up to the developer, at the cost of potential unsafety.
C's perspective on TLS is the following: everything is shared between threads by default.
This makes multithreading easier and more lightweight to implement than in other languages, such as D, because synchronization is left entirely up to the developer, at the cost of potential unsafety.

Of course, we can specify that some data belongs to the TLS by preceding the declaration of a variable with the `__thread` keyword.
First, compile and run the code in `support/race-condition/c/race_condition_tls.c` a few times.
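
If you want a standalone illustration of the keyword first (a sketch, not the lab's `race_condition_tls.c`), note how each thread below gets its own copy of `tls_var`, so the unsynchronised increments no longer race:

```c
#include <stdio.h>
#include <pthread.h>

#define NUM_ITER 10000000

/* Each thread gets its own, freshly initialised copy of this variable. */
static __thread int tls_var;

static void *work(void *arg)
{
	(void)arg;

	for (size_t i = 0; i < NUM_ITER; i++)
		tls_var++;

	/* Always prints NUM_ITER: no other thread touches this thread's copy. */
	printf("tls_var = %d\n", tls_var);

	return NULL;
}

int main(void)
{
	pthread_t tids[2];

	pthread_create(&tids[0], NULL, work, NULL);
	pthread_create(&tids[1], NULL, work, NULL);

	pthread_join(tids[0], NULL);
	pthread_join(tids[1], NULL);

	return 0;
}
```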
@@ -1 +1,5 @@
race_condition
race_condition_mutex
race_condition_atomic
race_condition_atomic2
race_condition_tls
@@ -1,3 +1,3 @@
BINARY = race_condition_tls
BINARIES = race_condition race_condition_tls race_condition_mutex race_condition_atomic race_condition_atomic2
LDLIBS = -lpthread
include ../../../../../../common/makefile/single.mk
include ../../../../../../common/makefile/multiple.mk
@@ -0,0 +1,50 @@
// SPDX-License-Identifier: BSD-3-Clause

#include <stdio.h>
#include <pthread.h>

#include "utils/utils.h"

#define NUM_ITER 10000000

static int val;

void *increment_var(void *arg)
{
(void)arg;

for (size_t i = 0; i < NUM_ITER; i++)
val++;

return NULL;
}

void *decrement_var(void *arg)
{
(void)arg;

for (size_t i = 0; i < NUM_ITER; i++)
val--;

return NULL;
}

int main(void)
{
int rc;
pthread_t tids[2];

rc = pthread_create(tids, NULL, increment_var, NULL);
DIE(rc < 0, "pthread_create");
rc = pthread_create(tids + 1, NULL, decrement_var, NULL);
DIE(rc < 0, "pthread_create");

rc = pthread_join(tids[0], NULL);
DIE(rc < 0, "pthread_join");
rc = pthread_join(tids[1], NULL);
DIE(rc < 0, "pthread_join");

printf("var = %d\n", val);

return 0;
}
@@ -0,0 +1,52 @@
// SPDX-License-Identifier: BSD-3-Clause

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

#include "utils/utils.h"

#define NUM_ITER 10000000

static int val;

void *increment_var(void *arg)
{
(void)arg;

for (size_t i = 0; i < NUM_ITER; i++)
atomic_fetch_add(&val, 1);

return NULL;
}

void *decrement_var(void *arg)
{
(void)arg;

for (size_t i = 0; i < NUM_ITER; i++)
// TODO: Use `atomic_fetch_sub` to implement `var -= 1` atomically.
;

return NULL;
}

int main(void)
{
int rc;
pthread_t tids[2];

rc = pthread_create(tids, NULL, increment_var, NULL);
DIE(rc < 0, "pthread_create");
rc = pthread_create(tids + 1, NULL, decrement_var, NULL);
DIE(rc < 0, "pthread_create");

rc = pthread_join(tids[0], NULL);
DIE(rc < 0, "pthread_join");
rc = pthread_join(tids[1], NULL);
DIE(rc < 0, "pthread_join");

printf("var = %d\n", val);

return 0;
}
@@ -0,0 +1,51 @@
// SPDX-License-Identifier: BSD-3-Clause

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

#include "utils/utils.h"

#define NUM_ITER 10000000

static atomic_int val;

void *increment_var(void *arg)
{
(void)arg;

for (size_t i = 0; i < NUM_ITER; i++)
val++;

return NULL;
}

void *decrement_var(void *arg)
{
(void)arg;

for (size_t i = 0; i < NUM_ITER; i++)
val--;

return NULL;
}

int main(void)
{
int rc;
pthread_t tids[2];

rc = pthread_create(tids, NULL, increment_var, NULL);
DIE(rc < 0, "pthread_create");
rc = pthread_create(tids + 1, NULL, decrement_var, NULL);
DIE(rc < 0, "pthread_create");

rc = pthread_join(tids[0], NULL);
DIE(rc < 0, "pthread_join");
rc = pthread_join(tids[1], NULL);
DIE(rc < 0, "pthread_join");

printf("var = %d\n", val);

return 0;
}
@@ -0,0 +1,62 @@
// SPDX-License-Identifier: BSD-3-Clause

#include <stdio.h>
#include <pthread.h>

#include "utils/utils.h"

#define NUM_ITER 10000000

pthread_mutex_t mutex;

static int val;

void *increment_var(void *arg)
{
(void)arg;

// TODO: wrap the whole `for` statement in the critical section and measure
// the running times.
for (size_t i = 0; i < NUM_ITER; i++) {
pthread_mutex_lock(&mutex);
val++;
pthread_mutex_unlock(&mutex);
}

return NULL;
}

void *decrement_var(void *arg)
{
(void)arg;

// TODO: wrap the whole `for` statement in the critical section and measure
// the running times.
for (size_t i = 0; i < NUM_ITER; i++) {
pthread_mutex_lock(&mutex);
val--;
pthread_mutex_unlock(&mutex);
}

return NULL;
}

int main(void)
{
int rc;
pthread_t tids[2];

rc = pthread_create(tids, NULL, increment_var, NULL);
DIE(rc < 0, "pthread_create");
rc = pthread_create(tids + 1, NULL, decrement_var, NULL);
DIE(rc < 0, "pthread_create");

rc = pthread_join(tids[0], NULL);
DIE(rc < 0, "pthread_join");
rc = pthread_join(tids[1], NULL);
DIE(rc < 0, "pthread_join");

printf("var = %d\n", val);

return 0;
}
@@ -1,4 +1,4 @@
/* SPDX-License-Identifier: BSD-3-Clause */
// SPDX-License-Identifier: BSD-3-Clause

#include <stddef.h>
#include <stdio.h>

This file was deleted.

