
Commit

Merge pull request #13 from ParsaLab/readme-fix
README files are fixed.
neo-apz committed Jan 16, 2016
2 parents b5cf49f + 4541679 commit 6d51597
Showing 10 changed files with 133 additions and 124 deletions.
37 changes: 18 additions & 19 deletions benchmarks/data-analytics/README.md
@@ -12,46 +12,46 @@ Supported tags and their respective `Dockerfile` links:

- [`Base`][basedocker]: This image contains the hadoop base which is needed for both master and slave images.
- [`Master`][masterdocker]: This image contains the main benchmark (hadoop master node, and mahout).
- [`Slave`][slavedocker]: This image contains the hadoop slave image.
- [`Data`][datasetdocker]: This image contains the dataset used by the benchmark.

-These images are automatically built using the mentioned Dockerfiles available on `cloudsuite/benchmarks/data-analytics/` [GitHub repo][repo].
+These images are automatically built using the mentioned Dockerfiles available on `ParsaLab/cloudsuite` [GitHub repo][repo].

## Starting the volume image ##
This benchmark uses a Wikipedia dataset of ~30GB. We prepared a dataset image containing the training dataset, so that you download it once and can reuse it for every run of the benchmark. You can pull this image from Docker Hub.

-$ docker pull cloudsuite/dataanalytics/dataset
+$ docker pull cloudsuite/data-analytics:dataset

The following command will start the volume image, making the data available to other containers on the host:

-$ docker create --name data cloudsuite/dataanalytics/dataset
+$ docker create --name data cloudsuite/data-analytics:dataset

## Starting the Master ##
To start the master you first have to `pull` the master image.

-$ docker pull cloudsuite/dataanalytics/master
+$ docker pull cloudsuite/data-analytics:master

Then, run the benchmark with the following command:

-$ docker run -d -t --dns 127.0.0.1 -P --name master -h master.cloudsuite.com --volumes-from data cloudsuite/dataanalytics/master
+$ docker run -d -t --dns 127.0.0.1 -P --name master -h master.cloudsuite.com --volumes-from data cloudsuite/data-analytics:master


## Starting the Slaves ##
If you want to have a single-node cluster, please skip this step.

To have more than one node, you need to start the slave containers. In order to do that, you first need to `pull` the slave image.

-$ docker pull cloudsuite/dataanalytics/slave
+$ docker pull cloudsuite/data-analytics:slave

To connect the slave containers to the master, you need the master IP.

$ FIRST_IP=$(docker inspect --format="{{.NetworkSettings.IPAddress}}" master)

Then, run as many slave containers as you want:

-$ docker run -d -t --dns 127.0.0.1 -P --name slave$i -h slave$i.cloudsuite.com -e JOIN_IP=$FIRST_IP cloudsuite/dataanalytics/slave
+$ docker run -d -t --dns 127.0.0.1 -P --name slave$i -h slave$i.cloudsuite.com -e JOIN_IP=$FIRST_IP cloudsuite/data-analytics:slave

Here `$i` is the slave number, starting from 1 (i.e., slave1 with hostname slave1.cloudsuite.com, slave2 with hostname slave2.cloudsuite.com, and so on).
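
For illustration (a sketch added for clarity, not part of the original README), three slaves can be launched with a single shell loop:

    $ for i in 1 2 3; do docker run -d -t --dns 127.0.0.1 -P --name slave$i -h slave$i.cloudsuite.com -e JOIN_IP=$FIRST_IP cloudsuite/data-analytics:slave; done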


## Running the benchmark ##
@@ -64,16 +64,15 @@ Then, run the benchmark with the following command:

$ ./run.sh

It asks you to enter the number of slaves; if you have a single-node cluster, enter 0.
After entering the number of slaves, it prepares Hadoop, downloads the dataset (this download takes a long time), and runs the benchmark. After the benchmark finishes, the model will be available in HDFS, under the wikipediamodel directory.
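
Once it completes, the output can be checked from the master container; this is an illustrative command, with the HDFS path assumed from the directory name above:

    $ hadoop fs -ls wikipediamodel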

-[basedocker]: https://github.com/CloudSuite-EPFL/DataAnalytics/blob/master/Dockerfile "Base Dockerfile"
-[masterdocker]: https://github.com/CloudSuite-EPFL/DataAnalytics/blob/master/Dockerfile "Master Dockerfile"
-[slavedocker]: https://github.com/CloudSuite-EPFL/DataAnalytics/blob/master/Dockerfile "Slave Dockerfile"
-[datasetdocker]: https://github.com/CloudSuite-EPFL/DataAnalytics/blob/master/dataset/Dockerfile "Dataset Dockerfile"
-[repo]: https://github.com/ParsaLab/cloudsuite/tree/master/benchmarks/data-analytics "GitHub Repo"
-[dhrepo]: https://hub.docker.com/r/cloudsuite/dataanalytics/ "DockerHub Page"
-[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/dataanalytics.svg "Go to DockerHub Page"
-[dhstars]: https://img.shields.io/docker/stars/cloudsuite/dataanalytics.svg "Go to DockerHub Page"
+[basedocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-analytics/master/Dockerfile "Base Dockerfile"
+[masterdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-analytics/master/Dockerfile "Master Dockerfile"
+[slavedocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-analytics/slave/Dockerfile "Slave Dockerfile"
+[datasetdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-analytics/dataset/Dockerfile "Dataset Dockerfile"
+[repo]: https://github.com/ParsaLab/cloudsuite "GitHub Repo"
+[dhrepo]: https://hub.docker.com/r/cloudsuite/data-analytics/ "DockerHub Page"
+[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-analytics.svg "Go to DockerHub Page"
+[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-analytics.svg "Go to DockerHub Page"
30 changes: 15 additions & 15 deletions benchmarks/data-caching/README.md
@@ -18,7 +18,7 @@ Supported tags and their respective `Dockerfile` links:
- [`server`][serverdocker]: This represents the Memcached server running as a daemon.
- [`client`][clientdocker]: This represents the client, which requests access to the server's data.

-These images are automatically built using the mentioned Dockerfiles available on `CloudSuite-EPFL/DataCaching` [GitHub repo][repo].
+These images are automatically built using the mentioned Dockerfiles available on `ParsaLab/cloudsuite` [GitHub repo][repo].

### Preparing a network between the server(s) and the client

@@ -31,32 +31,32 @@ We will attach the launched containers to this newly created docker network.
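
The creation command itself sits in the collapsed part of this diff; a minimal sketch, assuming the network name `caching_network` used by the commands below:

    $ docker network create caching_network
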
### Starting the Server ###
To start the server you have to first `pull` the server image and then run it. To `pull` the server image use the following command:

-$ docker pull cloudsuite/datacaching:server
+$ docker pull cloudsuite/data-caching:server

It takes some time to download the image, but this is only required the first time.
The following command will start the server with four threads, 4096MB of dedicated memory, and a minimal object size of 550 bytes, listening on the default port 11211:

-$ docker run --name dc-server --net caching_network -d cloudsuite/datacaching:server -t 4 -m 4096 -n 550
+$ docker run --name dc-server --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550

We assigned a name to this server to facilitate linking it with the client. We also used the `--net` option to attach the container to our prepared network.
As mentioned before, you can have multiple instances of the Memcached server; just remember to give each of them a unique name. For example, the following commands create four Memcached server instances:

-$ docker run --name dc-server1 --net caching_network -d cloudsuite/datacaching:server -t 4 -m 4096 -n 550
-$ docker run --name dc-server2 --net caching_network -d cloudsuite/datacaching:server -t 4 -m 4096 -n 550
-$ docker run --name dc-server3 --net caching_network -d cloudsuite/datacaching:server -t 4 -m 4096 -n 550
-$ docker run --name dc-server4 --net caching_network -d cloudsuite/datacaching:server -t 4 -m 4096 -n 550
+$ docker run --name dc-server1 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
+$ docker run --name dc-server2 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
+$ docker run --name dc-server3 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
+$ docker run --name dc-server4 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550

### Starting the Client ###

To start the client you have to first `pull` the client image and then run it. To `pull` the client image use the following command:

-$ docker pull cloudsuite/datacaching:client
+$ docker pull cloudsuite/data-caching:client

It takes some time to download the image, but this is only required the first time.

To start the client container use the following command:

-$ docker run -it --name dc-client --net caching_network cloudsuite/datacaching:client bash
+$ docker run -it --name dc-client --net caching_network cloudsuite/data-caching:client bash

This boots up the client container and you'll be logged in as the `memcache` user. Note that by using the `--net` option, you can easily make these containers visible to each other.
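
The commands that actually drive the load are collapsed in this diff. From the data-caching client they look roughly like the sketch below; treat the file name, dataset path, and flags as assumptions rather than exact values:

    $ echo "dc-server, 11211" > servers.txt
    $ ./loader -a ../twitter_dataset/twitter_dataset_unscaled -s servers.txt -g 0.8 -T 1 -c 200 -w 8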

@@ -133,11 +133,11 @@ and the client on different sockets of the same machine

[memcachedWeb]: http://memcached.org/ "Memcached Website"

-[serverdocker]: https://github.com/CloudSuite-EPFL/DataCaching/blob/master/server/Dockerfile "Server Dockerfile"
+[serverdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-caching/server/Dockerfile "Server Dockerfile"

-[clientdocker]: https://github.com/CloudSuite-EPFL/DataCaching/blob/master/client/Dockerfile "Client Dockerfile"
+[clientdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-caching/client/Dockerfile "Client Dockerfile"

-[repo]: https://github.com/CloudSuite-EPFL/DataCaching "GitHub Repo"
-[dhrepo]: https://hub.docker.com/r/cloudsuite/datacaching/ "DockerHub Page"
-[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/datacaching.svg "Go to DockerHub Page"
-[dhstars]: https://img.shields.io/docker/stars/cloudsuite/datacaching.svg "Go to DockerHub Page"
+[repo]: https://github.com/ParsaLab/cloudsuite "GitHub Repo"
+[dhrepo]: https://hub.docker.com/r/cloudsuite/data-caching/ "DockerHub Page"
+[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-caching.svg "Go to DockerHub Page"
+[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-caching.svg "Go to DockerHub Page"
4 changes: 2 additions & 2 deletions benchmarks/data-caching/client/README.md
@@ -4,5 +4,5 @@ This `Dockerfile` creates an ubuntu (latest) image representing the Memcached cl

Example:

-$ docker pull cloudsuite/datacaching:client
-$ docker run -it --name dc-client --link=dc-server cloudsuite/datacaching:client bash
+$ docker pull cloudsuite/data-caching:client
+$ docker run -it --name dc-client --net caching_network cloudsuite/data-caching:client bash
4 changes: 2 additions & 2 deletions benchmarks/data-caching/server/README.md
@@ -4,5 +4,5 @@ This `Dockerfile` creates an ubuntu (latest) image containing the latest version
Memcached will be started as a daemon with the passed parameters.
Example:

-$ docker pull cloudsuite/datacaching:server
-$ docker run --name dc-server -d cloudsuite/datacaching:server -t 4 -m 4096 -n 550
+$ docker pull cloudsuite/data-caching:server
+$ docker run --name dc-server --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550
29 changes: 14 additions & 15 deletions benchmarks/data-serving/README.md
@@ -17,10 +17,10 @@ To facilitate the communication between the client and the server(s), we build a

We will attach the launched containers to this newly created docker network.
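
As in the data-caching README, the creation command is collapsed here; a minimal sketch, assuming the network name `serving_network` used by the commands below:

    $ docker network create serving_network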

-###Server Container
+### Server Container
Start the server container:

-$ docker run -it --name cassandra-server --net serving_network cloudsuite/dataserving:server bash
+$ docker run -it --name cassandra-server --net serving_network cloudsuite/data-serving:server bash

In order to create a keyspace and a column family, you can use the following commands after connecting to the server with the cassandra-cli under the directory in which Cassandra is unpacked. (A link to a basic tutorial with cassandra-cli: http://wiki.apache.org/cassandra/CassandraCli)
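
The commands themselves are collapsed in this diff. The canonical YCSB schema for cassandra-cli looks roughly as follows; this is a sketch, and only the keyspace name `usertable` is confirmed by the *drop* example below:

    create keyspace usertable with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}];
    use usertable;
    create column family data with comparator = UTF8Type;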

@@ -49,30 +49,30 @@ You can use other commands in the cassandra-cli to verify the correctness of the
If you make a mistake, you can use the *drop* command and try again:

$ drop keyspace usertable;
-###Multiple Server Containers
+
+### Multiple Server Containers

For a cluster setup with multiple servers, we need to instantiate a seed server:

```
-$ docker run -it --name cassandra-server-seed --net serving_network cloudsuite/dataserving:server bash
+$ docker run -it --name cassandra-server-seed --net serving_network cloudsuite/data-serving:server bash
```

Then we prepare the server as before.

The other server containers are instantiated as follows:

```
-$ docker run -it --name cassandra-server(id) --net serving_network -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite/dataserving:server bash
+$ docker run -it --name cassandra-server(id) --net serving_network -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite/data-serving:server bash
```
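
For example, with `(id)` replaced by a concrete number:

```
$ docker run -it --name cassandra-server1 --net serving_network -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite/data-serving:server bash
```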

You can find more details at the websites: http://wiki.apache.org/cassandra/GettingStarted and https://hub.docker.com/_/cassandra/.

-###Client Container(s)
+### Client Container(s)
After successfully creating the aforementioned schema, you are ready to benchmark with YCSB.
Start the client container:

-$ docker run -it --name cassandra-client --link cassandra-server:server cloudsuite/dataserving:client bash
+$ docker run -it --name cassandra-client --link cassandra-server:server cloudsuite/data-serving:client bash

Change to the ycsb directory:
```
$ cd /ycsb
```
@@ -86,7 +86,7 @@ or, for a "one seed - one normal server" setup:
```
$ export HOSTS="cassandra-server-seed,cassandra-server1"
```
Load dataset on ycsb:
```
$ ./bin/ycsb load cassandra-10 -p hosts=$HOSTS -P workloads/workloada
```
@@ -113,15 +113,14 @@ Running the benchmark
---------------------
After you have installed and run the server, installed the YCSB framework files, and populated Cassandra, you are one step away from running the benchmark. To specify the runtime parameters for the client, a good practice is to create a settings file. You can keep the important parameters (e.g., *target*, *threadcount*, *hosts*, *operationcount*, *recordcount*) in this file.

The *settings.dat* file defines the IP address(es) of the node(s) running Cassandra, in addition to the *recordcount* parameter (which should be less than or equal to the number specified in the data generation step to avoid potential errors).

The *operationcount* parameter sets the number of operations to be executed on the data store.
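
An illustrative *settings.dat* is sketched below; the values are placeholders, and only the parameter names come from the list above:

    hosts=cassandra-server
    recordcount=1000
    operationcount=1000
    threadcount=4
    target=100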

The *run.command* file takes the *settings.dat* file as an input and runs the following command:

$ /ycsb/bin/ycsb run cassandra-10 -p hosts=server -P /ycsb/workloads/workloada

-[dhrepo]: https://hub.docker.com/r/cloudsuite/dataserving/ "DockerHub Page"
-[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/dataserving.svg "Go to DockerHub Page"
-[dhstars]: https://img.shields.io/docker/stars/cloudsuite/dataserving.svg "Go to DockerHub Page"
+[dhrepo]: https://hub.docker.com/r/cloudsuite/data-serving/ "DockerHub Page"
+[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-serving.svg "Go to DockerHub Page"
+[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-serving.svg "Go to DockerHub Page"
33 changes: 18 additions & 15 deletions benchmarks/graph-analytics/README.md
@@ -19,19 +19,19 @@ Supported tags and their respective `Dockerfile` links:
- [`spark-worker`][sparkworkerdocker] This builds an image for the Spark worker node. You may spawn several workers.
- [`spark-client`][sparkclientdocker] This builds an image with the Spark client node. The client is used to start the benchmark.

-These images are automatically built using the mentioned Dockerfiles available on [`CloudSuite-EPFL/GraphAnalytics`][repo] and [`CloudSuite-EPFL/spark-base`][sparkrepo].
+These images are automatically built using the mentioned Dockerfiles available on [`ParsaLab/cloudsuite`][repo].

### Starting the volume images ###

The first step is to create the volume images that contain the binaries and the dataset of the Graph Analytics benchmark. First `pull` the volume images, using the following commands:

-$ docker pull cloudsuite/GraphAnalytics:data
-$ docker pull cloudsuite/GraphAnalytics:benchmark
+$ docker pull cloudsuite/graph-analytics:data
+$ docker pull cloudsuite/graph-analytics:benchmark

The following commands will start the volume images, making both the data and the binaries available to other containers on the host:

-$ docker create --name data cloudsuite/GraphAnalytics:data
-$ docker create --name bench cloudsuite/GraphAnalytics:benchmark
+$ docker create --name data cloudsuite/graph-analytics:data
+$ docker create --name bench cloudsuite/graph-analytics:benchmark
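
Other containers can then attach these volumes with `--volumes-from`, following the pattern used in the data-analytics README. For example (a sketch; the client image tag is assumed):

    $ docker run -it --volumes-from data --volumes-from bench cloudsuite/graph-analytics:spark-client bash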

### Starting the master node ###

@@ -71,13 +71,15 @@ To run the benchmark from the interactive container, use the following command:

$ bash /benchmark/graph_analytics/run_benchmark.sh

-[benchmarkdocker]: https://github.com/CloudSuite-EPFL/GraphAnalytics/blob/master/benchmarks/Dockerfile "Benchmark volume Dockerfile"
-[datadocker]: https://github.com/CloudSuite-EPFL/GraphAnalytics/blob/master/data/Dockerfile "Data volume Dockerfile"
-[sparkmasterdocker]: https://github.com/CloudSuite-EPFL/spark-base/blob/master/spark-master/Dockerfile "Spark Master Node Dockerfile"
-[sparkworkerdocker]: https://github.com/CloudSuite-EPFL/spark-base/blob/master/spark-worker/Dockerfile "Spark Worker Dockerfile"
-[sparkclientdocker]: https://github.com/CloudSuite-EPFL/spark-base/blob/master/spark-client/Dockerfile "Spark Client Dockerfile"
-[repo]: https://github.com/CloudSuite-EPFL/GraphAnalytics "Graph Analytics GitHub Repo"
-[sparkrepo]: https://github.com/CloudSuite-EPFL/spark-base "Spark Base GitHub Repo"
-[dhrepo]: https://hub.docker.com/r/cloudsuite/graphanalytics/ "DockerHub Page"
-[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/graphanalytics.svg "Go to DockerHub Page"
-[dhstars]: https://img.shields.io/docker/stars/cloudsuite/graphanalytics.svg "Go to DockerHub Page"
+[benchmarkdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/graph-analytics/benchmark/Dockerfile "Benchmark volume Dockerfile"
+[datadocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/graph-analytics/data/Dockerfile "Data volume Dockerfile"
+[sparkmasterdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/spark-base/spark-master/Dockerfile "Spark Master Node Dockerfile"
+[sparkworkerdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/spark-base/spark-worker/Dockerfile "Spark Worker Dockerfile"
+[sparkclientdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/spark-base/spark-client/Dockerfile "Spark Client Dockerfile"
+[repo]: https://github.com/ParsaLab/cloudsuite "GitHub Repo"
+[dhrepo]: https://hub.docker.com/r/cloudsuite/graph-analytics/ "DockerHub Page"
+[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/graph-analytics.svg "Go to DockerHub Page"
+[dhstars]: https://img.shields.io/docker/stars/cloudsuite/graph-analytics.svg "Go to DockerHub Page"

+[serverdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-caching/server/Dockerfile "Server Dockerfile"
+
+[clientdocker]: https://github.com/ParsaLab/cloudsuite/blob/master/benchmarks/data-caching/client/Dockerfile "Client Dockerfile"