
Incorporated old dev-branch changes #27

Merged 26 commits, Aug 17, 2022

Commits
c6f0437
Trace folder moved to Git LFS
cvetkovic Jul 1, 2022
0f725df
Resolve branch conflicts
HongyuHe Jul 21, 2022
c5f1d93
Apply suggestions
HongyuHe Jul 24, 2022
396e4cf
Fixed runtime and memory specification generator
cvetkovic Jul 29, 2022
daa7a0a
Token permissions in README
cvetkovic Jul 29, 2022
23bcfda
IAT distribution choice - not hardcoded equidistant anymore
cvetkovic Aug 2, 2022
2d3979c
traceload.go - IAT tick overflow bugfix
cvetkovic Aug 3, 2022
532beea
Force control plane placement on master
cvetkovic Aug 8, 2022
e9030f5
Trace plotter deployment fix
cvetkovic Aug 8, 2022
7539d24
Taint/untaint stdin bugfix
cvetkovic Aug 8, 2022
3a747f9
Put taint on master node on setup completion
cvetkovic Aug 8, 2022
a366316
Disable trace plotter setup on startup
cvetkovic Aug 8, 2022
da4f48d
Added gRPC instrumentation to trace-func
cvetkovic Aug 10, 2022
5f4ca1f
Propagate withTracing to invoker
cvetkovic Aug 10, 2022
8ecaa21
Updated yaml file
cvetkovic Aug 10, 2022
0821edf
Fixed YAML
cvetkovic Aug 10, 2022
c738794
Updated README
cvetkovic Aug 10, 2022
da1bbd0
Addressed Dmitrii's comments
cvetkovic Aug 10, 2022
68b3071
Fixed indentation
cvetkovic Aug 10, 2022
e43bd27
Renaming script
cvetkovic Aug 10, 2022
872c797
Add more descriptions and cover equality specs (again)
HongyuHe Aug 14, 2022
d852313
Apply the 2nd round suggestions
HongyuHe Aug 14, 2022
d5be7b9
add data race test for GenerateExecutionSpecs
sk1tter Aug 12, 2022
427549f
add mutex to protect against concurrent access to rand.Rand
sk1tter Aug 12, 2022
ce16e13
Fix CI tests
HongyuHe Aug 16, 2022
41d32c0
Apply 3rd-round suggestions
HongyuHe Aug 17, 2022
10 changes: 10 additions & 0 deletions .github/configs/wordlist.txt
@@ -657,3 +657,13 @@ tpA
CDF
UI
elasticsearch
faas
Xeon
Cgroups
uVMs
CIDR
timing
cgroups
noop
YAMLs
cgo
14 changes: 14 additions & 0 deletions .github/issue_template.md
@@ -0,0 +1,14 @@

## Version & Branch

## Expected Behaviour

## Actual Behaviour

## Steps to Reproduce the Problem

1.
2.
3.

## Additional Info
10 changes: 8 additions & 2 deletions .github/workflows/code-quality.yml
@@ -5,6 +5,7 @@ on:
pull_request:
branches: [main]


env:
GOOS: linux
GO111MODULE: on
@@ -26,17 +27,22 @@ jobs:
needs: resolve-modules
runs-on: ubuntu-20.04
strategy:
matrix: ${{ fromJson(needs.resolve-modules.outputs.matrix) }}
matrix: { dir: ['cmd', 'pkg'] }
fail-fast: false
steps:
- name: Setup Go 1.18
uses: actions/setup-go@v3
with:
go-version: 1.18

- name: Install dependencies for cgo
run: sudo apt update && sudo apt install libsnmp-dev

- name: Checkout code into go module directory
uses: actions/checkout@v3

- name: Lint with golangci-lint
uses: golangci/[email protected]
with:
working-directory: ${{ matrix.workdir }}
working-directory: ${{ matrix.dir }}
args: --timeout 5m
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,6 +3,8 @@ analysis
.vscode/
.idea
tmp
data/out
data/traces

### CMake ###
CMakeLists.txt.user
136 changes: 118 additions & 18 deletions README.md
@@ -1,71 +1,171 @@
# Loader

A load generator for rigorous scientific research on serverless computing based upon [faas-load-generator](https://github.com/eth-easl/faas-load-generator) and the example code of [vHive](https://github.com/ease-lab/vhive).
A load generator for benchmarking serverless systems.

## Create an cluster
## Pre-requisites

The experiments require a 2-socket server-grade node, running Linux (tested on Ubuntu 20, Intel Xeon). On CloudLab, one can choose the APT cluster `d430` node.

### Multi-node cluster

The master node should have at least two sockets: although the loader is isolated with cgroups, the isolation provided by our setup scripts works best when the loader runs on one socket, separate from the rest of the components running on the master.

### Single-node cluster

This mode is only for debugging purposes, and there are no guarantees of isolation between the loader and the master-node components.
## Create a cluster

First, change the parameters (e.g., `GITHUB_TOKEN`) in `script/setup.cfg` as necessary.
The GitHub token needs the `repo` and `admin:public_key` permissions.
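
Below is a minimal sketch of the relevant entries; only `GITHUB_TOKEN` is documented here, and the exact set of keys in `setup.cfg` may differ:

```bash
# script/setup.cfg (excerpt, illustrative)
# Personal access token with the `repo` and `admin:public_key` scopes.
GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"
```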

* For creating a multi-node K8s cluster (pure containers) with a maximum of 500 pods per node, run the following.

```bash
$ bash ./scripts/setup/create_multinode_container_large.sh <master_node@IP> <worker_node@IP> ...
```

* For creating a multi-node K8s cluster (pure containers) with a maximum of 200 pods per node, run the following.

```bash
$ bash ./scripts/setup/create_multinode_container.sh <master_node@IP> <worker_node@IP> ...
```

* For creating a multi-node vHive cluster (firecracker uVMs), run the following.

```bash
$ bash ./scripts/setup/create_multinode_firecracker.sh <master_node@IP> <worker_node@IP> ...
```

* Run the following for a single-node setup. The capacity of this node is below 100 pods.

```bash
$ bash ./scripts/setup/create_singlenode_stock_k8s.sh <master_node@IP>
```

### Check cluster health (on the master node)

Once the setup scripts have finished, check that they completed their jobs fully, because, e.g., race conditions may leave a few nodes unavailable.

* First, log the status of the control-plane components and the monitoring deployments on the master node by running the following command:

```bash
$ bash ./scripts/util/log_kn_status.sh
```

* If everything is `Running`, check whether the cluster is stretched to the desired capacity by running the following script:

```bash
$ bash ./scripts/util/check_node_capacity.sh
```

If you want to check the [pod CIDR range](https://www.ibm.com/docs/en/cloud-private/3.1.2?topic=networking-kubernetes-network-model), run the following:

```bash
$ bash ./scripts/util/get_pod_cidr.sh
```

* Next, try deploying a function (`myfunc`) with the desired `scale` (i.e., the number of instances that must be active
at all times).

```bash
$ bash ./scripts/util/set_function_scale.sh <scale>
```

* One should verify that the system was able to start the requested number of function instances by using the following command.

```bash
$ kubectl -n default get podautoscalers
```

## Tune the timing for the benchmark function

Before starting any experiments, the timing of the benchmark function should be tuned so that it consumes the required service time more precisely.

First, run the following command to deploy the timing benchmark, which yields the number of execution units* the function needs to run for a given service time.

* The execution unit is approximately 100 `SQRTSD` x86 instructions.

```bash
$ kubectl apply -f server/benchmark/timing.yaml
```

Then, monitor and collect the `cold_iter_per_1ms` and `warm_iter_per_1ms` values from the job logs as follows:

```bash
$ watch kubectl logs timing
```

Finally, set `COLD_ITER_PER_1MS` and `WARM_ITER_PER_1MS` in the function template `workloads/container/trace_func_go.yaml` based on `cold_iter_per_1ms` and `warm_iter_per_1ms`, respectively.

To explain this further, `cold_iter_per_1ms` is for short executions (<1s), and `warm_iter_per_1ms` is for the longer
ones (>=1s).

To account for differences in CPU performance, set `COLD_ITER_PER_1MS=102` and `WARM_ITER_PER_1MS=115` if you are using CloudLab XL170 machines. (Date of measurement: 10-Aug-2022)
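
As a rough sketch, one might substitute the measured values into the template before deploying. The field layout assumed by the `sed` expressions below may not match the actual YAML; the values are the XL170 ones quoted above:

```bash
# Pull the latest measurements from the timing job logs.
kubectl logs timing | grep -E 'cold_iter_per_1ms|warm_iter_per_1ms' | tail -n 2

# Illustrative only: substitute the measured values into the function template.
# The field names and formatting in the YAML are assumptions.
sed -i 's/COLD_ITER_PER_1MS:.*/COLD_ITER_PER_1MS: "102"/' workloads/container/trace_func_go.yaml
sed -i 's/WARM_ITER_PER_1MS:.*/WARM_ITER_PER_1MS: "115"/' workloads/container/trace_func_go.yaml
```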

## Single execution

In the Trace mode, the loader replays the Azure trace.

For Trace mode, run the following command

```bash
cgexec -g cpuset,memory:loader-cg \
make ARGS='-sample <sample_trace_size> -duration <minutes[1,1440]> -cluster <num_workers> -server <trace|busy|sleep> -tracePath <path_to_trace> -iatDistribution <poisson|uniform|equidistant> -warmup' run
```
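
For instance, a hypothetical invocation with illustrative argument values:

```bash
# All argument values below are illustrative.
cgexec -g cpuset,memory:loader-cg \
    make ARGS='-sample 200 -duration 30 -cluster 10 -server trace -tracePath data/traces -iatDistribution poisson -warmup' run
```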

In the RPS mode, the loader sweeps through a range of fixed invocation rates (requests per second).

When using the RPS mode, run the following command

```bash
cgexec -g cpuset,memory:loader-cg \
make ARGS="-mode stress -start <initial_rps> -end <stop_rps> -step <rps_step> -slot <rps_step_in_seconds> -server <trace|busy|sleep> -totalFunctions <num_functions>" run 2>&1 | tee stress.log
```

NB: The cgroups are used to isolate the loader on the master node from the control-plane components.
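
If the `loader-cg` cgroup is not already present on the master node (the setup scripts normally create it), a rough sketch of creating it by hand with the `cgroup-tools` utilities; the CPU set and memory limit below are placeholders, not recommended values:

```bash
# Illustrative only; the setup scripts may configure this differently.
sudo apt install cgroup-tools
sudo cgcreate -g cpuset,memory:loader-cg
# Pin the loader to one socket and cap its memory (placeholder values).
sudo cgset -r cpuset.cpus=0-15 loader-cg
sudo cgset -r cpuset.mems=0 loader-cg
sudo cgset -r memory.limit_in_bytes=64G loader-cg
```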

## Experiment

For running experiments, use the wrapper scripts in the `scripts/experiments` directory.

```bash
#* Trace mode
$ bash scripts/experiments/run_trace_mode.sh \
    <duration_in_minutes> <num_workers> <trace_path>

#* RPS mode
$ bash scripts/experiments/run_rps_mode.sh \
    <start> <stop> <step> <duration_in_sec> \
    <num_func> <wimpy|trace> <func_runtime> <func_mem> \
    <print-option: debug | info | all>
```
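
For example, a hypothetical Trace-mode run with illustrative arguments (60 minutes, 10 workers, traces under `data/traces`):

```bash
$ bash scripts/experiments/run_trace_mode.sh 60 10 data/traces
```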

## Build the image for a synthetic function

A synthetic function runs for a given time and allocates the required amount of memory.

* The `trace-func` mode counts iterations to fulfil the service time.
* The `busy-wait` mode uses a timer-based spin-lock to consume the service time.
* The `sleep` mode is a noop that does nothing but idle-wait.

```bash
$ make build <trace-func|busy-wait|sleep>
```
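
For example, to build the image for the trace-driven mode:

```bash
$ make build trace-func
```
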
### Update gRPC protocol

```sh
$ make proto
```

## Clean up between runs

```bash
$ make clean
```

---

For more options, please see the `Makefile`.
