From eb8224b423ca8fba0f7798af09efadf4ad8e2158 Mon Sep 17 00:00:00 2001 From: Cyan Lin Date: Sun, 1 Aug 2021 11:14:30 +0200 Subject: [PATCH] Update all files for the migration of CloudSuite 3. (#319) --- README.md | 4 +++ docs/benchmarks/data-analytics.md | 16 +++++----- docs/benchmarks/data-caching.md | 26 +++++++-------- docs/benchmarks/data-serving.md | 16 +++++----- docs/benchmarks/graph-analytics.md | 30 +++++++++--------- docs/benchmarks/in-memory-analytics.md | 43 +++++++++++++------------ docs/benchmarks/media-streaming.md | 26 +++++++-------- docs/benchmarks/web-search.md | 24 +++++++------- docs/benchmarks/web-serving.md | 36 ++++++++++----------- docs/commons/hadoop.md | 10 +++--- docs/commons/spark.md | 44 +++++++++++++------------- docs/datasets/movielens-dataset.md | 12 +++---- 12 files changed, 147 insertions(+), 140 deletions(-) diff --git a/README.md b/README.md index 715dd28b7..66b57b927 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,7 @@ # CloudSuite 3.0 # +**This branch is an archive where all CloudSuite 3.0 benchmarks are stored. All prebuilt images are available at [cloudsuite3][old] at dockerhub. If you're searching for CloudSuite 4.0, please checkout [master][master] branch.** + [CloudSuite][csp] is a benchmark suite for cloud services. The third release consists of eight applications that have been selected based on their popularity in today's datacenters. The benchmarks are based on real-world software stacks and represent real-world setups. 
@@ -26,3 +28,5 @@ We encourage CloudSuite users to use GitHub issues for requests for enhancements [csl]: http://cloudsuite.ch/pages/license/ "CloudSuite License" [csb]: http://cloudsuite.ch/#download "CloudSuite Benchmarks" [pkb]: https://github.com/GoogleCloudPlatform/PerfKitBenchmarker "Google's PerfKit Benchmarker" +[old]: https://hub.docker.com/orgs/cloudsuite3/repositories "CloudSuite3 on Dockerhub" +[master]: https://github.com/parsa-epfl/cloudsuite "CloudSuite Master" diff --git a/docs/benchmarks/data-analytics.md b/docs/benchmarks/data-analytics.md index a732e49b6..c648a3723 100644 --- a/docs/benchmarks/data-analytics.md +++ b/docs/benchmarks/data-analytics.md @@ -12,8 +12,8 @@ The benchmark consists of running a Naive Bayes classifier on a Wikimedia datase To obtain the images: ```bash -$ docker pull cloudsuite/hadoop -$ docker pull cloudsuite/data-analytics +$ docker pull cloudsuite3/hadoop +$ docker pull cloudsuite3/data-analytics ``` ## Running the benchmark ## @@ -30,16 +30,16 @@ Start the master with: ```bash $ docker run -d --net hadoop-net --name master --hostname master \ - cloudsuite/data-analytics master + cloudsuite3/data-analytics master ``` Start a number of slaves with: ```bash $ docker run -d --net hadoop-net --name slave01 --hostname slave01 \ - cloudsuite/hadoop slave + cloudsuite3/hadoop slave $ docker run -d --net hadoop-net --name slave02 --hostname slave02 \ - cloudsuite/hadoop slave + cloudsuite3/hadoop slave ... 
``` @@ -51,6 +51,6 @@ Run the benchmark with: $ docker exec master benchmark ``` -[dhrepo]: https://hub.docker.com/r/cloudsuite/data-analytics/ "DockerHub Page" -[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-analytics.svg "Go to DockerHub Page" -[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-analytics.svg "Go to DockerHub Page" +[dhrepo]: https://hub.docker.com/r/cloudsuite3/data-analytics/ "DockerHub Page" +[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/data-analytics.svg "Go to DockerHub Page" +[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/data-analytics.svg "Go to DockerHub Page" diff --git a/docs/benchmarks/data-caching.md b/docs/benchmarks/data-caching.md index 5ccc1c9f2..9d165ad2e 100644 --- a/docs/benchmarks/data-caching.md +++ b/docs/benchmarks/data-caching.md @@ -31,32 +31,32 @@ We will attach the launched containers to this newly created docker network. ### Starting the Server #### To start the server you have to first `pull` the server image and then run it. To `pull` the server image use the following command: - $ docker pull cloudsuite/data-caching:server + $ docker pull cloudsuite3/data-caching:server It takes some time to download the image, but this is only required the first time. The following command will start the server with four threads and 4096MB of dedicated memory, with a minimal object size of 550 bytes listening on port 11211 as default: - $ docker run --name dc-server --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550 + $ docker run --name dc-server --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550 We assigned a name to this server to facilitate linking it with the client. We also used `--net` option to attach the container to our prepared network. As mentioned before, you can have multiple instances of the Memcached server, just remember to give each of them a unique name. 
For example, the following commands create four Memcached server instances: - $ docker run --name dc-server1 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550 - $ docker run --name dc-server2 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550 - $ docker run --name dc-server3 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550 - $ docker run --name dc-server4 --net caching_network -d cloudsuite/data-caching:server -t 4 -m 4096 -n 550 + $ docker run --name dc-server1 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550 + $ docker run --name dc-server2 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550 + $ docker run --name dc-server3 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550 + $ docker run --name dc-server4 --net caching_network -d cloudsuite3/data-caching:server -t 4 -m 4096 -n 550 ### Starting the Client #### To start the client you have to first `pull` the client image and then run it. To `pull` the server image use the following command: - $ docker pull cloudsuite/data-caching:client + $ docker pull cloudsuite3/data-caching:client It takes some time to download the image, but this is only required the first time. To start the client container use the following command: - $ docker run -it --name dc-client --net caching_network cloudsuite/data-caching:client bash + $ docker run -it --name dc-client --net caching_network cloudsuite3/data-caching:client bash This boots up the client container and you'll be logged in as the `memcache` user. Note that by using the `--net` option, you can easily make these containers visible to each other. 
@@ -133,11 +133,11 @@ and the client on different sockets of the same machine [memcachedWeb]: http://memcached.org/ "Memcached Website" - [serverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/data-caching/server/Dockerfile "Server Dockerfile" + [serverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/data-caching/server/Dockerfile "Server Dockerfile" - [clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/data-caching/client/Dockerfile "Client Dockerfile" + [clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/data-caching/client/Dockerfile "Client Dockerfile" [repo]: https://github.com/parsa-epfl/cloudsuite "GitHub Repo" - [dhrepo]: https://hub.docker.com/r/cloudsuite/data-caching/ "DockerHub Page" - [dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-caching.svg "Go to DockerHub Page" - [dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-caching.svg "Go to DockerHub Page" + [dhrepo]: https://hub.docker.com/r/cloudsuite3/data-caching/ "DockerHub Page" + [dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/data-caching.svg "Go to DockerHub Page" + [dhstars]: https://img.shields.io/docker/stars/cloudsuite3/data-caching.svg "Go to DockerHub Page" diff --git a/docs/benchmarks/data-serving.md b/docs/benchmarks/data-serving.md index 53525d95c..9df2679c1 100644 --- a/docs/benchmarks/data-serving.md +++ b/docs/benchmarks/data-serving.md @@ -21,14 +21,14 @@ We will attach the launched containers to this newly created docker network. 
Start the server container that will run cassandra server and installs a default keyspace usertable: ```bash -$ docker run --name cassandra-server --net serving_network cloudsuite/data-serving:server cassandra +$ docker run --name cassandra-server --net serving_network cloudsuite3/data-serving:server cassandra ``` ### Multiple Server Containers For a cluster setup with multiple servers, we need to instantiate a seed server: ```bash -$ docker run --name cassandra-server-seed --net serving_network cloudsuite/data-serving:server +$ docker run --name cassandra-server-seed --net serving_network cloudsuite3/data-serving:server ``` Then we prepare the server as previously. @@ -36,7 +36,7 @@ Then we prepare the server as previously. The other server containers are instantiated as follows: ```bash -$ docker run --name cassandra-server(id) --net serving_network -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite/data-serving:server +$ docker run --name cassandra-server(id) --net serving_network -e CASSANDRA_SEEDS=cassandra-server-seed cloudsuite3/data-serving:server ``` You can find more details at the websites: http://wiki.apache.org/cassandra/GettingStarted and https://hub.docker.com/_/cassandra/. @@ -46,7 +46,7 @@ After successfully creating the aforementioned schema, you are ready to benchmar Start the client container specifying server name(s), or IP address(es), separated with commas, as the last command argument: ```bash -$ docker run --name cassandra-client --net serving_network cloudsuite/data-serving:client "cassandra-server-seed,cassandra-server1" +$ docker run --name cassandra-client --net serving_network cloudsuite3/data-serving:client "cassandra-server-seed,cassandra-server1" ``` More detailed instructions on generating the dataset can be found in Step 5 at [this](http://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload) link. 
Although Step 5 in the link describes the data loading procedure, other steps (e.g., 1, 2, 3, 4) are very useful to understand the YCSB settings. @@ -71,9 +71,9 @@ Running the benchmark --------------------- The benchmark is run automatically with the client container. One can modify the record count in the database and/or the number of operations performed by the benchmark specifying the corresponding variables when running the client container: ```bash -$ docker run -e RECORDCOUNT=<#> -e OPERATIONCOUNT=<#> --name cassandra-client --net serving_network cloudsuite/data-serving:client "cassandra-server-seed,cassandra-server1" +$ docker run -e RECORDCOUNT=<#> -e OPERATIONCOUNT=<#> --name cassandra-client --net serving_network cloudsuite3/data-serving:client "cassandra-server-seed,cassandra-server1" ``` -[dhrepo]: https://hub.docker.com/r/cloudsuite/data-serving/ "DockerHub Page" -[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/data-serving.svg "Go to DockerHub Page" -[dhstars]: https://img.shields.io/docker/stars/cloudsuite/data-serving.svg "Go to DockerHub Page" +[dhrepo]: https://hub.docker.com/r/cloudsuite3/data-serving/ "DockerHub Page" +[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/data-serving.svg "Go to DockerHub Page" +[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/data-serving.svg "Go to DockerHub Page" diff --git a/docs/benchmarks/graph-analytics.md b/docs/benchmarks/graph-analytics.md index 282c8b186..465b9896b 100644 --- a/docs/benchmarks/graph-analytics.md +++ b/docs/benchmarks/graph-analytics.md @@ -11,13 +11,13 @@ The Graph Analytics benchmark relies the Spark framework to perform graph analyt Current version of the benchmark is 3.0. To obtain the image: - $ docker pull cloudsuite/graph-analytics + $ docker pull cloudsuite3/graph-analytics ### Datasets The benchmark uses a graph dataset generated from Twitter. 
To get the dataset image: - $ docker pull cloudsuite/twitter-dataset-graph + $ docker pull cloudsuite3/twitter-dataset-graph More information about the dataset is available at [cloudsuite/twitter-dataset-graph][ml-dhrepo]. @@ -30,8 +30,8 @@ spark-submit. To run a benchmark with the Twitter dataset: - $ docker create --name data cloudsuite/twitter-dataset-graph - $ docker run --rm --volumes-from data cloudsuite/graph-analytics + $ docker create --name data cloudsuite3/twitter-dataset-graph + $ docker run --rm --volumes-from data cloudsuite3/graph-analytics ### Tweaking the Benchmark @@ -41,7 +41,7 @@ has enough memory allocated to be able to execute the benchmark in-memory, supply it with --driver-memory and --executor-memory arguments: - $ docker run --rm --volumes-from data cloudsuite/graph-analytics \ + $ docker run --rm --volumes-from data cloudsuite3/graph-analytics \ --driver-memory 1g --executor-memory 4g ### Multi-node deployment @@ -54,30 +54,30 @@ with Docker look at [cloudsuite/spark][spark-dhrepo]. First, create a dataset image on every physical node where Spark workers will be running. - $ docker create --name data cloudsuite/twitter-dataset-graph + $ docker create --name data cloudsuite3/twitter-dataset-graph Start Spark master and Spark workers. They should all run within the same Docker network, which we call spark-net here. The workers get access to the datasets with --volumes-from data. $ docker run -dP --net spark-net --hostname spark-master --name spark-master \ - cloudsuite/spark master + cloudsuite3/spark master $ docker run -dP --net spark-net --volumes-from data --name spark-worker-01 \ - cloudsuite/spark worker spark://spark-master:7077 + cloudsuite3/spark worker spark://spark-master:7077 $ docker run -dP --net spark-net --volumes-from data --name spark-worker-02 \ - cloudsuite/spark worker spark://spark-master:7077 + cloudsuite3/spark worker spark://spark-master:7077 $ ... 
Finally, run the benchmark as the client to the Spark master: $ docker run --rm --net spark-net --volumes-from data \ - cloudsuite/graph-analytics \ + cloudsuite3/graph-analytics \ --driver-memory 1g --executor-memory 4g \ --master spark://spark-master:7077 -[dhrepo]: https://hub.docker.com/r/cloudsuite/graph-analytics/ "DockerHub Page" -[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/graph-analytics.svg "Go to DockerHub Page" -[dhstars]: https://img.shields.io/docker/stars/cloudsuite/graph-analytics.svg "Go to DockerHub Page" -[ml-dhrepo]: https://hub.docker.com/r/cloudsuite/twitter-dataset-graph/ -[spark-dhrepo]: https://hub.docker.com/r/cloudsuite/spark/ +[dhrepo]: https://hub.docker.com/r/cloudsuite3/graph-analytics/ "DockerHub Page" +[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/graph-analytics.svg "Go to DockerHub Page" +[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/graph-analytics.svg "Go to DockerHub Page" +[ml-dhrepo]: https://hub.docker.com/r/cloudsuite3/twitter-dataset-graph/ +[spark-dhrepo]: https://hub.docker.com/r/cloudsuite3/spark/ diff --git a/docs/benchmarks/in-memory-analytics.md b/docs/benchmarks/in-memory-analytics.md index d3bc3aed0..8c42610c2 100644 --- a/docs/benchmarks/in-memory-analytics.md +++ b/docs/benchmarks/in-memory-analytics.md @@ -22,17 +22,17 @@ squares (ALS) algorithm which is provided by Spark MLlib. Current version of the benchmark is 3.0. To obtain the image: - $ docker pull cloudsuite/in-memory-analytics + $ docker pull cloudsuite3/in-memory-analytics ### Datasets The benchmark uses user-movie ratings datasets provided by Movielens. To get the dataset image: - $ docker pull cloudsuite/movielens-dataset + $ docker pull cloudsuite3/movielens-dataset More information about the dataset is available at -[cloudsuite/movielens-dataset][ml-dhrepo]. +[cloudsuite3/movielens-dataset][ml-dhrepo]. ### Running the Benchmark @@ -41,14 +41,14 @@ distributed with Spark. 
It takes two arguments: the dataset to use for training, and the personal ratings file to give recommendations for. Any remaining arguments are passed to spark-submit. -The cloudsuite/movielens-dataset image has two datasets (one small and one +The cloudsuite3/movielens-dataset image has two datasets (one small and one large), and a sample personal ratings file. To run a benchmark with the small dataset and the provided personal ratings file: - $ docker create --name data cloudsuite/movielens-dataset - $ docker run --rm --volumes-from data cloudsuite/in-memory-analytics \ + $ docker create --name data cloudsuite3/movielens-dataset + $ docker run --rm --volumes-from data cloudsuite3/in-memory-analytics \ /data/ml-latest-small /data/myratings.csv ### Tweaking the Benchmark @@ -58,7 +58,7 @@ be used to tweak execution. For example, to ensure that Spark has enough memory allocated to be able to execute the benchmark in-memory, supply it with --driver-memory and --executor-memory arguments: - $ docker run --rm --volumes-from data cloudsuite/in-memory-analytics \ + $ docker run --rm --volumes-from data cloudsuite3/in-memory-analytics \ /data/ml-latest /data/myratings.csv \ --driver-memory 2g --executor-memory 2g @@ -67,32 +67,35 @@ allocated to be able to execute the benchmark in-memory, supply it with This section explains how to run the benchmark using multiple Spark workers (each running in a Docker container) that can be spread across multiple nodes in a cluster. For more information on running Spark with Docker look at -[cloudsuite/spark][spark-dhrepo]. +[cloudsuite3/spark][spark-dhrepo]. First, create a dataset image on every physical node where Spark workers will be running. - $ docker create --name data cloudsuite/movielens-dataset + $ docker create --name data cloudsuite3/movielens-dataset + +Then, create dedicated network for spark workers: + + $ docker network create spark-net Start Spark master and Spark workers. 
They should all run within the same -Docker network, which we call spark-net here. The workers get access to the -datasets with --volumes-from data. +Docker network, which we call spark-net here. The workers get access to the datasets with --volumes-from data. - $ docker run -dP --net spark-net --hostname spark-master --name spark-master cloudsuite/spark master - $ docker run -dP --net spark-net --volumes-from data --name spark-worker-01 cloudsuite/spark worker \ + $ docker run -dP --net spark-net --hostname spark-master --name spark-master cloudsuite3/spark master + $ docker run -dP --net spark-net --volumes-from data --name spark-worker-01 cloudsuite3/spark worker \ spark://spark-master:7077 - $ docker run -dP --net spark-net --volumes-from data --name spark-worker-02 cloudsuite/spark worker \ + $ docker run -dP --net spark-net --volumes-from data --name spark-worker-02 cloudsuite3/spark worker \ spark://spark-master:7077 $ ... Finally, run the benchmark as the client to the Spark master: - $ docker run --rm --net spark-net --volumes-from data cloudsuite/in-memory-analytics \ + $ docker run --rm --net spark-net --volumes-from data cloudsuite3/in-memory-analytics \ /data/ml-latest-small /data/myratings.csv --master spark://spark-master:7077 -[dhrepo]: https://hub.docker.com/r/cloudsuite/in-memory-analytics/ "DockerHub Page" -[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/in-memory-analytics.svg "Go to DockerHub Page" -[dhstars]: https://img.shields.io/docker/stars/cloudsuite/in-memory-analytics.svg "Go to DockerHub Page" -[ml-dhrepo]: https://hub.docker.com/r/cloudsuite/movielens-dataset/ -[spark-dhrepo]: https://hub.docker.com/r/cloudsuite/spark/ +[dhrepo]: https://hub.docker.com/r/cloudsuite3/in-memory-analytics/ "DockerHub Page" +[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/in-memory-analytics.svg "Go to DockerHub Page" +[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/in-memory-analytics.svg "Go to DockerHub Page" 
+[ml-dhrepo]: https://hub.docker.com/r/cloudsuite3/movielens-dataset/ +[spark-dhrepo]: https://hub.docker.com/r/cloudsuite3/spark/ diff --git a/docs/benchmarks/media-streaming.md b/docs/benchmarks/media-streaming.md index a098512cd..9f71312ae 100644 --- a/docs/benchmarks/media-streaming.md +++ b/docs/benchmarks/media-streaming.md @@ -24,11 +24,11 @@ The streaming server requires a video dataset to serve. We generate a synthetic To set up the dataset you have to first `pull` the dataset image and then run it. To `pull` the dataset image use the following command: - $ docker pull cloudsuite/media-streaming:dataset + $ docker pull cloudsuite3/media-streaming:dataset The following command will create a dataset container that exposes the video dataset volume, which will be used by the streaming server: - $ docker create --name streaming_dataset cloudsuite/media-streaming:dataset + $ docker create --name streaming_dataset cloudsuite3/media-streaming:dataset ### Creating a network between the server and the client(s) @@ -42,34 +42,34 @@ We will attach the launched containers to this newly created docker network. ### Starting the Server #### To start the server you have to first `pull` the server image and then run it. To `pull` the server image use the following command: - $ docker pull cloudsuite/media-streaming:server + $ docker pull cloudsuite3/media-streaming:server The following command will start the server, mount the dataset volume, and attach it to the *streaming_network* network: - $ docker run -d --name streaming_server --volumes-from streaming_dataset --net streaming_network cloudsuite/media-streaming:server + $ docker run -d --name streaming_server --volumes-from streaming_dataset --net streaming_network cloudsuite3/media-streaming:server ### Starting the Client #### To start the client you have to first `pull` the client image and then run it. 
To `pull` the client image use the following command: - $ docker pull cloudsuite/media-streaming:client + $ docker pull cloudsuite3/media-streaming:client To start the client container and connect it to the *streaming_network* network use the following command: - $ docker run -t --name=streaming_client -v /path/to/output:/output --volumes-from streaming_dataset --net streaming_network cloudsuite/media-streaming:client streaming_server + $ docker run -t --name=streaming_client -v /path/to/output:/output --volumes-from streaming_dataset --net streaming_network cloudsuite3/media-streaming:client streaming_server The client will issue a mix of requests for different videos of various qualities and performs a binary search of experiments to find the peak request rate the client can sustain while keeping the failure rate acceptable. At the end of client's execution, the resulting log files can be found under /output directory of the container, which you can map to a directory on the host using `-v /path/to/output:/output`. 
- [datasetdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/media-streaming/dataset/Dockerfile "Dataset Dockerfile" + [datasetdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/media-streaming/dataset/Dockerfile "Dataset Dockerfile" - [serverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/media-streaming/server/Dockerfile "Server Dockerfile" + [serverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/media-streaming/server/Dockerfile "Server Dockerfile" - [clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/media-streaming/client/Dockerfile "Client Dockerfile" + [clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/media-streaming/client/Dockerfile "Client Dockerfile" - [repo]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/media-streaming "GitHub Repo" - [dhrepo]: https://hub.docker.com/r/cloudsuite/media-streaming/ "DockerHub Page" - [dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/media-streaming.svg "Go to DockerHub Page" - [dhstars]: https://img.shields.io/docker/stars/cloudsuite/media-streaming.svg "Go to DockerHub Page" + [repo]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/media-streaming "GitHub Repo" + [dhrepo]: https://hub.docker.com/r/cloudsuite3/media-streaming/ "DockerHub Page" + [dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/media-streaming.svg "Go to DockerHub Page" + [dhstars]: https://img.shields.io/docker/stars/cloudsuite3/media-streaming.svg "Go to DockerHub Page" [nginx_repo]: https://github.com/nginx/nginx "Nginx repo" [httperf_repo]: https://github.com/httperf/httperf "httperf repo" diff --git a/docs/benchmarks/web-search.md b/docs/benchmarks/web-search.md index 92312ac1c..72b236eae 100644 --- a/docs/benchmarks/web-search.md +++ b/docs/benchmarks/web-search.md @@ -16,7 +16,7 @@ Supported tags and their respective `Dockerfile` links: - 
[`server`][serverdocker] This builds an image for the Apache Solr index nodes. You may spawn several nodes. - [`client`][clientdocker] This builds an image with the client node. The client is used to start the benchmark and query the index nodes. -These images are automatically built using the mentioned Dockerfiles available on [`https://github.com/parsa-epfl/cloudsuite/tree/master/benchmarks/web-search`][repo]. +These images are automatically built using the mentioned Dockerfiles available on [`https://github.com/parsa-epfl/cloudsuite/tree/CSv3/benchmarks/web-search`][repo]. ### Creating a network between the server(s) and the client(s) @@ -30,11 +30,11 @@ We will attach the launched containers to this newly created docker network. To start the server you have to first `pull` the server image and then run it. To `pull` the server image, use the following command: - $ docker pull cloudsuite/web-search:server + $ docker pull cloudsuite3/web-search:server The following command will start the server and forward port 8983 to the host, so that the Apache Solr's web interface can be accessed from the web browser using the host's IP address. More information on Apache Solr's web interface can be found [here][solrui]. The first parameter past to the image indicates the memory allocated for the JAVA process. The pregenerated Solr index occupies 12GB of memory, and therefore we use `12g` to avoid disk accesses. The second parameter indicates the number of Solr nodes. Because the index is for a single node only, the aforesaid parameter should be `1` always. - $ docker run -it --name server --net search_network -p 8983:8983 cloudsuite/web-search:server 12g 1 + $ docker run -it --name server --net search_network -p 8983:8983 cloudsuite3/web-search:server 12g 1 At the end of the server booting process, the container prints the `server_address` of the index node. This address is used in the client container. 
The `server_address` message in the container should look like this (note that the IP address might change): @@ -44,11 +44,11 @@ At the end of the server booting process, the container prints the `server_addre To start a client you have to first `pull` the client image and then run it. To `pull` the client image, use the following command: - $ docker pull cloudsuite/web-search:client + $ docker pull cloudsuite3/web-search:client The following command will start the client node and run the benchmark. The `server_address` refers to the IP address, in brackets (e.g., "172.19.0.2"), of the index node that receives the client requests. The four numbers after the server address refer to: the scale, which indicates the number of concurrent clients (50); the ramp-up time in seconds (90), which refers to the time required to warm up the server; the ramp-down time in seconds (60), which refers to the time to wait before ending the benchmark; and the steady-state time in seconds (60), which indicates the time the benchmark is in the steady state. Tune these parameters accordingly to stress your target system. - $ docker run -it --name client --net search_network cloudsuite/web-search:client server_address 50 90 60 60 + $ docker run -it --name client --net search_network cloudsuite3/web-search:client server_address 50 90 60 60 The output results will show on the screen after the benchmark finishes. @@ -88,15 +88,15 @@ The output results will show on the screen after the benchmark finishes. More information about Solr can be found [here][solrmanual]. 
-[datadocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/web-search/data/Dockerfile "Data volume Dockerfile" -[serverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/web-search/server/Dockerfile "Server Dockerfile" -[clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/web-search/client/Dockerfile "Client Dockerfile" +[datadocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/web-search/data/Dockerfile "Data volume Dockerfile" +[serverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/web-search/server/Dockerfile "Server Dockerfile" +[clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/web-search/client/Dockerfile "Client Dockerfile" [solrui]: https://cwiki.apache.org/confluence/display/solr/Overview+of+the+Solr+Admin+UI "Apache Solr UI" [solrmanual]: https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide "Apache Solr Manual" [nutchtutorial]: https://wiki.apache.org/nutch/NutchTutorial "Nutch Tutorial" [apachesolr]: https://github.com/apache/solr "Apache Solr" [apachenutch]: https://github.com/apache/nutch "Apache Nutch" -[repo]: https://github.com/parsa-epfl/cloudsuite/tree/master/benchmarks/web-search "Web Search GitHub Repo" -[dhrepo]: https://hub.docker.com/r/cloudsuite/web-search/ "DockerHub Page" -[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/web-search.svg "Go to DockerHub Page" -[dhstars]: https://img.shields.io/docker/stars/cloudsuite/web-search.svg "Go to DockerHub Page" +[repo]: https://github.com/parsa-epfl/cloudsuite/tree/CSv3/benchmarks/web-search "Web Search GitHub Repo" +[dhrepo]: https://hub.docker.com/r/cloudsuite3/web-search/ "DockerHub Page" +[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/web-search.svg "Go to DockerHub Page" +[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/web-search.svg "Go to DockerHub Page" diff --git a/docs/benchmarks/web-serving.md 
b/docs/benchmarks/web-serving.md index 9b5d73c8b..43ac6cb25 100644 --- a/docs/benchmarks/web-serving.md +++ b/docs/benchmarks/web-serving.md @@ -24,59 +24,59 @@ These images are automatically built using the mentioned Dockerfiles available o ### Starting the database server #### To start the database server, you have to first `pull` the server image. To `pull` the server image use the following command: - $ docker pull cloudsuite/web-serving:db_server + $ docker pull cloudsuite3/web-serving:db_server The following command will start the database server: - $ docker run -dt --net=host --name=mysql_server cloudsuite/web-serving:db_server ${WEB_SERVER_IP} + $ docker run -dt --net=host --name=mysql_server cloudsuite3/web-serving:db_server ${WEB_SERVER_IP} The ${WEB_SERVER_IP} parameter is mandatory. It sets the IP of the web server. If you are using the host network, the web server IP is the IP of the machine that you are running the web_server container on. If you create your own network you can use the name that you are going to give to the web server (we called it web_server in the following commands). ### Starting the memcached server #### To start the memcached server, you have to first `pull` the server image. To `pull` the server image use the following command: - $ docker pull cloudsuite/web-serving:memcached_server + $ docker pull cloudsuite3/web-serving:memcached_server The following command will start the memcached server: - $ docker run -dt --net=host --name=memcache_server cloudsuite/web-serving:memcached_server + $ docker run -dt --net=host --name=memcache_server cloudsuite3/web-serving:memcached_server ### Starting the web server #### To start the web server, you first have to `pull` the server image. 
To `pull` the server image use the following command: - $ docker pull cloudsuite/web-serving:web_server + $ docker pull cloudsuite3/web-serving:web_server To run the web server *without HHVM*, use the following command: - $ docker run -dt --net=host --name=web_server cloudsuite/web-serving:web_server /etc/bootstrap.sh ${DATABASE_SERVER_IP} ${MEMCACHED_SERVER_IP} ${MAX_PM_CHILDREN} + $ docker run -dt --net=host --name=web_server cloudsuite3/web-serving:web_server /etc/bootstrap.sh ${DATABASE_SERVER_IP} ${MEMCACHED_SERVER_IP} ${MAX_PM_CHILDREN} To run the web server *with HHVM enabled*, use the following command: - $ docker run -e "HHVM=true" -dt --net=host --name=web_server_local cloudsuite/web-serving:web_server /etc/bootstrap.sh ${DATABASE_SERVER_IP} ${MEMCACHED_SERVER_IP} ${MAX_PM_CHILDREN} + $ docker run -e "HHVM=true" -dt --net=host --name=web_server_local cloudsuite3/web-serving:web_server /etc/bootstrap.sh ${DATABASE_SERVER_IP} ${MEMCACHED_SERVER_IP} ${MAX_PM_CHILDREN} -The three ${DATABASE_SERVER_IP},${MEMCACHED_SERVER_IP}, and ${MAX_PM_CHILDREN} parameters are optional. The ${DATABASE_SERVER_IP}, and ${MEMCACHED_SERVER_IP} show the IP (or the container name) of the database server, and the IP (or the container name) of the memcached server, respectively. For example, if you are running all the containers on the same machine and use the host network you can use the localhost IP (127.0.0.1). Their default values are mysql_server, and memcache_server, respectively, which are the default names of the containers. +The three \${DATABASE_SERVER_IP},\${MEMCACHED_SERVER_IP}, and \${MAX_PM_CHILDREN} parameters are optional. The ${DATABASE_SERVER_IP}, and ${MEMCACHED_SERVER_IP} show the IP (or the container name) of the database server, and the IP (or the container name) of the memcached server, respectively. For example, if you are running all the containers on the same machine and use the host network you can use the localhost IP (127.0.0.1). 
Their default values are mysql_server, and memcache_server, respectively, which are the default names of the containers. The ${MAX_PM_CHILDREN} set the pm.max_children in the php-fpm setting. The default value is 80. ### Running the benchmark ### First `pull` the client image use the following command: - $ docker pull cloudsuite/web-serving:faban_client + $ docker pull cloudsuite3/web-serving:faban_client To start the client container which runs the benchmark, use the following commands: - $ docker run --net=host --name=faban_client cloudsuite/web-serving:faban_client ${WEB_SERVER_IP} ${LOAD_SCALE} + $ docker run --net=host --name=faban_client cloudsuite3/web-serving:faban_client ${WEB_SERVER_IP} ${LOAD_SCALE} The last command has a mandatory parameter to set the IP of the web_server, and an optional parameter to set the load scale (default is 7). The last command will output the summary of the benchmark results in XML at the end of the output. You can also access the summary and logs of the run by mounting the `/faban/output` directory of the container in the host filesystem (e.g. `-v /host/path:/faban/output`). 
- [webserverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/web-serving/web_server/Dockerfile "WebServer Dockerfile" - [memcacheserverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/web-serving/memcached_server/Dockerfile "MemcacheServer Dockerfile" - [mysqlserverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/web-serving/db_server/Dockerfile "MysqlServer Dockerfile" - [clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/master/benchmarks/web-serving/faban_client/Dockerfile "Client Dockerfile" + [webserverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/web-serving/web_server/Dockerfile "WebServer Dockerfile" + [memcacheserverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/web-serving/memcached_server/Dockerfile "MemcacheServer Dockerfile" + [mysqlserverdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/web-serving/db_server/Dockerfile "MysqlServer Dockerfile" + [clientdocker]: https://github.com/parsa-epfl/cloudsuite/blob/CSv3/benchmarks/web-serving/faban_client/Dockerfile "Client Dockerfile" - [repo]: https://github.com/parsa-epfl/cloudsuite/tree/master/benchmarks/web-serving "GitHub Repo" - [dhrepo]: https://hub.docker.com/r/cloudsuite/web-serving/ "DockerHub Page" - [dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/web-serving.svg "Go to DockerHub Page" - [dhstars]: https://img.shields.io/docker/stars/cloudsuite/web-serving.svg "Go to DockerHub Page" + [repo]: https://github.com/parsa-epfl/cloudsuite/tree/CSv3/benchmarks/web-serving "GitHub Repo" + [dhrepo]: https://hub.docker.com/r/cloudsuite3/web-serving/ "DockerHub Page" + [dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/web-serving.svg "Go to DockerHub Page" + [dhstars]: https://img.shields.io/docker/stars/cloudsuite3/web-serving.svg "Go to DockerHub Page" diff --git a/docs/commons/hadoop.md b/docs/commons/hadoop.md index b553ef7b2..b75d0512c 
100644 --- a/docs/commons/hadoop.md +++ b/docs/commons/hadoop.md @@ -1,10 +1,10 @@ ## Hadoop -Currently supported version is 2.7.3. +Currently supported version is 2.9.2 for Cloudsuite 3.0. To obtain the image: ``` -$ docker pull cloudsuite/hadoop +$ docker pull cloudsuite3/hadoop ``` ### Running Hadoop @@ -16,13 +16,13 @@ $ docker network create hadoop-net Start Hadoop master with: ``` -$ docker run -d --net hadoop-net --name master --hostname master cloudsuite/hadoop master +$ docker run -d --net hadoop-net --name master --hostname master cloudsuite3/hadoop master ``` Start any number of Hadoop slaves with: ``` -$ docker run -d --net hadoop-net --name slave01 --hostname slave01 cloudsuite/hadoop slave -$ docker run -d --net hadoop-net --name slave02 --hostname slave02 cloudsuite/hadoop slave +$ docker run -d --net hadoop-net --name slave01 --hostname slave01 cloudsuite3/hadoop slave +$ docker run -d --net hadoop-net --name slave02 --hostname slave02 cloudsuite3/hadoop slave ... ``` diff --git a/docs/commons/spark.md b/docs/commons/spark.md index 672a541bb..4db99cef3 100644 --- a/docs/commons/spark.md +++ b/docs/commons/spark.md @@ -4,11 +4,11 @@ [![Stars on DockerHub][dhstars]][dhrepo] This repository contains a Docker image of Apache Spark. Currently we support -Spark versions 1.5.1 and 2.1.0. The lastest tag corresponds to version 2.1.0. +Spark versions 2.3.1 for CloudSuite 3.0. To obtain the image: - $ docker pull cloudsuite/spark + $ docker pull cloudsuite3/spark ## Running Spark @@ -16,21 +16,21 @@ To obtain the image: To try out Spark running in a single container, start the container with: - $ docker run -it --rm cloudsuite/spark bash + $ docker run -it --rm cloudsuite3/spark bash -Spark installation is located under /opt/spark-1.5.1. Try running an example that +Spark installation is located under /opt/spark-2.3.1. 
Try running an example that calculates Pi with 100 tasks: - $ /opt/spark-1.5.1/bin/spark-submit --class org.apache.spark.examples.SparkPi \ - /opt/spark-1.5.1/lib/spark-examples-1.5.1-hadoop2.6.0.jar 100 + $ /opt/spark-2.3.1/bin/spark-submit --class org.apache.spark.examples.SparkPi \ + /opt/spark-2.3.1/lib/spark-examples-2.3.1-hadoop2.6.0.jar 100 You can also run Spark programs using spark-submit without entering the interactive shell by supplying "submit" as the command to run the image. Arguments after "submit" are passed to spark-submit. For example, to run the same example as above type: - $ docker run --rm cloudsuite/spark submit --class org.apache.spark.examples.SparkPi \ - /opt/spark-1.5.1/lib/spark-examples-1.5.1-hadoop2.6.0.jar 100 + $ docker run --rm cloudsuite3/spark submit --class org.apache.spark.examples.SparkPi \ + /opt/spark-2.3.1/lib/spark-examples-2.3.1-hadoop2.6.0.jar 100 Notice that the path to the jar is a path inside the container. You can pass jars in the host filesystem as arguments if you map the directory where they @@ -38,10 +38,10 @@ reside as a Docker volume. Finally, you can also start an interactive Spark shell with: - $ docker run -it --rm cloudsuite/spark shell + $ docker run -it --rm cloudsuite3/spark shell Again, this is just a shortcut for starting a container and running -/opt/spark-1.5.1/bin/spark-shell. Try running a simple parallelized count: +/opt/spark-2.3.1/bin/spark-shell. Try running a simple parallelized count: $ sc.parallelize(1 to 1000).count() @@ -62,12 +62,12 @@ service discovery on the default bridge network. 
Start a Spark master: - $ docker run -dP --net spark-net --hostname spark-master --name spark-master cloudsuite/spark master + $ docker run -dP --net spark-net --hostname spark-master --name spark-master cloudsuite3/spark master Start a number of Spark workers: - $ docker run -dP --net spark-net --name spark-worker-01 cloudsuite/spark worker spark://spark-master:7077 - $ docker run -dP --net spark-net --name spark-worker-02 cloudsuite/spark worker spark://spark-master:7077 + $ docker run -dP --net spark-net --name spark-worker-01 cloudsuite3/spark worker spark://spark-master:7077 + $ docker run -dP --net spark-net --name spark-worker-02 cloudsuite3/spark worker spark://spark-master:7077 $ ... We can monitor our jobs using Spark's web UI. Point your browser to MASTER_IP:8080, where: @@ -80,20 +80,20 @@ spark-master argument to Spark. Start Spark container with bash and run spark-submit inside it to estimate Pi: - $ docker run -it --rm --net spark-net cloudsuite/spark bash - $ /opt/spark-1.5.1/bin/spark-submit --class org.apache.spark.examples.SparkPi \ + $ docker run -it --rm --net spark-net cloudsuite3/spark bash + $ /opt/spark-2.3.1/bin/spark-submit --class org.apache.spark.examples.SparkPi \ --master spark://spark-master:7077 \ - /opt/spark-1.5.1/lib/spark-examples-1.5.1-hadoop2.6.0.jar 100 + /opt/spark-2.3.1/lib/spark-examples-2.3.1-hadoop2.6.0.jar 100 Start Spark container with "submit" command to estimate Pi: - $ docker run --rm --net spark-net cloudsuite/spark submit --class org.apache.spark.examples.SparkPi \ + $ docker run --rm --net spark-net cloudsuite3/spark submit --class org.apache.spark.examples.SparkPi \ --master spark://spark-master:7077 \ - /opt/spark-1.5.1/lib/spark-examples-1.5.1-hadoop2.6.0.jar 100 + /opt/spark-2.3.1/lib/spark-examples-2.3.1-hadoop2.6.0.jar 100 Start Spark container with "shell" command and run a parallelized count: - $ docker run -it --rm --net spark-net cloudsuite/spark shell --master spark://spark-master:7077 + $ docker run 
-it --rm --net spark-net cloudsuite3/spark shell --master spark://spark-master:7077 $ sc.parallelize(1 to 1000).count() For a multi-node setup, where multiple Docker containers are running on @@ -101,7 +101,7 @@ multiple physical nodes, all commands remain the same if using Docker Swarm as the cluster manager. The only difference is that the new network needs to be an overlay network instead of a bridge network. -[dhrepo]: https://hub.docker.com/r/cloudsuite/spark/ "DockerHub Page" -[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/spark.svg "Go to DockerHub Page" -[dhstars]: https://img.shields.io/docker/stars/cloudsuite/spark.svg "Go to DockerHub Page" +[dhrepo]: https://hub.docker.com/r/cloudsuite3/spark/ "DockerHub Page" +[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/spark.svg "Go to DockerHub Page" +[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/spark.svg "Go to DockerHub Page" diff --git a/docs/datasets/movielens-dataset.md b/docs/datasets/movielens-dataset.md index c1130cd7c..9ecc3e3c2 100644 --- a/docs/datasets/movielens-dataset.md +++ b/docs/datasets/movielens-dataset.md @@ -10,7 +10,7 @@ Size is around 1MB. The large dataset (ml-latest) has 21,000,000 ratings applied to 30,000 movies by 230,000 users. Size is 144MB. This image is intended to be used with the -[cloudsuite/in-memory-analytics][ima-dhrepo] image as the dataset to run the +[cloudsuite3/in-memory-analytics][ima-dhrepo] image as the dataset to run the benchmark on. The datasets and the personal ratings file myratings.csv are located in /data, @@ -20,10 +20,10 @@ ratings. 
To obtain the image: - $ docker pull cloudsuite/movielens-dataset + $ docker pull cloudsuite3/movielens-dataset -[dhrepo]: https://hub.docker.com/r/cloudsuite/movielens-dataset/ "DockerHub Page" -[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite/movielens-dataset.svg "Go to DockerHub Page" -[dhstars]: https://img.shields.io/docker/stars/cloudsuite/movielens-dataset.svg "Go to DockerHub Page" -[ima-dhrepo]: https://hub.docker.com/r/cloudsuite/in-memory-analytics/ +[dhrepo]: https://hub.docker.com/r/cloudsuite3/movielens-dataset/ "DockerHub Page" +[dhpulls]: https://img.shields.io/docker/pulls/cloudsuite3/movielens-dataset.svg "Go to DockerHub Page" +[dhstars]: https://img.shields.io/docker/stars/cloudsuite3/movielens-dataset.svg "Go to DockerHub Page" +[ima-dhrepo]: https://hub.docker.com/r/cloudsuite3/in-memory-analytics/
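
The renames in this patch follow three mechanical patterns: Docker image references move from `cloudsuite/` to `cloudsuite3/`, the DockerHub and shields.io badge URLs move the same way, and GitHub links move from the `master` branch to `CSv3`. The filter below is a minimal sketch of those three patterns, for checking other documents against the same convention. The function name `rewrite_image_refs` is hypothetical, and this is not necessarily how the commit itself was produced. It deliberately leaves the version-number changes (Hadoop 2.9.2, the `/opt/spark-2.3.1` paths) alone, since those are not simple renames.

```bash
# Hypothetical helper, not part of the CloudSuite repository.
# Reads text on stdin and writes it to stdout with three rewrites applied:
#   1. image references at the start of a line, after whitespace, or after "["
#   2. DockerHub page URLs and shields.io badge URLs
#   3. GitHub blob/tree links that point at the master branch
# GitHub paths such as parsa-epfl/cloudsuite/... are not touched by rules 1-2
# because "cloudsuite/" is preceded by "/" there.
rewrite_image_refs() {
  sed -E \
    -e 's#(^|[[:space:]]|\[)cloudsuite/#\1cloudsuite3/#g' \
    -e 's#(hub\.docker\.com/r/|shields\.io/docker/(pulls|stars)/)cloudsuite/#\1cloudsuite3/#g' \
    -e 's#(github\.com/parsa-epfl/cloudsuite/(blob|tree))/master/#\1/CSv3/#g'
}

# Example: rewrite a single command line.
printf '%s\n' '$ docker pull cloudsuite/hadoop' | rewrite_image_refs
```

To preview the effect on a whole file without modifying it, the output can be compared against the original, for example `rewrite_image_refs < docs/commons/hadoop.md | diff docs/commons/hadoop.md -`. The filter is idempotent, because `cloudsuite3/` no longer matches any of the three patterns.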