This directory archives the Pytheas implementation prior to the NSDI submission.
The newest version of the Pytheas code is available at https://github.com/nsdi2017-ddn/pytheas
System: Ubuntu 15.10
Java build tools (JDK and Maven) installation:
$ sudo apt-get update
$ sudo apt-get install -y default-jdk maven
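To verify the installation (mvn also reports which JDK it found):
$ mvn -version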
Contains the programs that need to be deployed on each front-end server host.
Auto-deployment script (for Apache httpd and php programs):
../front_server $ sudo ./frontserver_deploy.sh
compile using Maven:
../GroupManager $ mvn package
run:
../GroupManager $ java -cp target/GroupManager-1.0-SNAPSHOT.jar frontend.GroupManager <cluster_ID> <kafka_server> <config_file>
<cluster_ID> is the ID of the current cluster
<kafka_server> is a comma-separated list of Kafka server IPs
<config_file> contains the labels of the update info and the reduced labels
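For example, with a hypothetical cluster ID, Kafka server list, and config file:
../GroupManager $ java -cp target/GroupManager-1.0-SNAPSHOT.jar frontend.GroupManager 1 10.0.0.1,10.0.0.2 config.txt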
Deploy on one or more hosts in each cluster to manage communication between the functional modules.
Kafka deployment:
../kafka $ sudo ./kafka_deploy.sh <host_list> <host_number>
<host_list> is a comma-separated list of all Kafka server IP addresses
<host_number> is the sequence number of the current host within <host_list>
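For example, on the second of three Kafka hosts (hypothetical IPs):
../kafka $ sudo ./kafka_deploy.sh 10.0.0.1,10.0.0.2,10.0.0.3 2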
run:
$ cd /usr/share/kafka
$ sudo bin/zookeeper-server-start.sh config/zookeeper.properties &
$ sudo bin/kafka-server-start.sh config/server.properties
Note: if running Kafka on more than one host, execute the third command only after the second command has been executed on every host.
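As a quick sanity check that the broker is running, list the topics through ZooKeeper (standard tooling for this Kafka generation; the local ZooKeeper address is an assumption):
$ bin/kafka-topics.sh --zookeeper localhost:2181 --list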
Contains the decision-making module and the communication module; each uses Spark and can run on one or more hosts.
Spark deployment:
../spark $ sudo ./spark_deploy.sh
DecisionMaker (decision-making module): makes a decision for each group. Compile it using Maven and submit it to Spark.
Communication module: communicates with the backend cluster and other frontend clusters. Like DecisionMaker, compile it using Maven and submit it to Spark.
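For reference, a submission typically looks like the following; the main class, jar path, and master URL are illustrative assumptions, not taken from this repo:
$ spark-submit --class frontend.DecisionMaker --master spark://10.0.0.1:7077 target/DecisionMaker-1.0-SNAPSHOT.jar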
Some small scripts and programs to test the scalability of the frontend cluster.
Test the response time of requests.
A simple Python program that performs an HTTP POST request 1000 times and plots the CDF of the response times:
$ ./post_time.py
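For reference, a minimal sketch of the same idea (not the actual post_time.py; the endpoint and payload are hypothetical, and requests/matplotlib are assumed dependencies):

    import time
    import requests                   # assumed dependency
    import matplotlib.pyplot as plt   # assumed dependency

    URL = "http://10.0.0.1/update"    # hypothetical frontend endpoint

    # POST 1000 times and record the latency of each request.
    latencies = []
    for _ in range(1000):
        start = time.time()
        requests.post(URL, data={"msg": "test"})
        latencies.append(time.time() - start)

    # Plot the empirical CDF of the latencies.
    latencies.sort()
    cdf = [i / float(len(latencies)) for i in range(1, len(latencies) + 1)]
    plt.plot(latencies, cdf)
    plt.xlabel("Response time (s)")
    plt.ylabel("CDF")
    plt.savefig("post_time_cdf.png")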
A shell script that uses ApacheBench to test the response time of the frontend server.
$ ./responseTime.ssh
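For reference, a direct ApacheBench invocation with the same intent might look like this (URL, POST body file, and concurrency are hypothetical):
$ ab -n 1000 -c 10 -p post_data.txt -T application/x-www-form-urlencoded http://10.0.0.1/update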
A standalone benchmark that performs HTTP POST requests. The test duration and the requests per second (RPS) can be controlled.
$ ./benchmark.py
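The core idea is pacing POSTs at a target rate for a fixed duration; a minimal sketch (not the actual benchmark.py; the endpoint, duration, and rate are hypothetical, and requests is an assumed dependency):

    import time
    import threading
    import requests                  # assumed dependency

    URL = "http://10.0.0.1/update"   # hypothetical endpoint
    TEST_TIME = 10                   # test duration in seconds
    TARGET_RPS = 100                 # requested rate

    def fire():
        # One POST; a real benchmark would also record latency and failures.
        try:
            requests.post(URL, data={"msg": "test"}, timeout=5)
        except requests.RequestException:
            pass

    deadline = time.time() + TEST_TIME
    while time.time() < deadline:
        threading.Thread(target=fire).start()  # fire-and-forget
        time.sleep(1.0 / TARGET_RPS)           # pacing; achieved RPS <= target

Because pacing relies on sleep and thread startup, the achieved RPS is only positively correlated with the target, the same caveat noted for dbenchmark below.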
A distributed benchmark that performs HTTP POST requests.
Run the slave program on all hosts to perform the benchmark, then run the master program on one host to start the test. When the test finishes, the master program generates three figures (Response Time, Successful RPS, CDF of Response Time).
run slave:
$ ./dbenchmark_slave <url>
<url>: the destination URL the slave program will send requests to
run master:
$ ./dbenchmark_master <Time> <RPS>
<Time>: how long the test will last
<RPS>: requests per second. This parameter is only positively correlated with the real RPS; the real RPS is shown in the result figure.
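For example (hypothetical URL, a 60-second test, and a target of 500 RPS):
$ ./dbenchmark_slave http://10.0.0.1/update     (on every slave host)
$ ./dbenchmark_master 60 500                    (on one master host)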
Note: the host that runs the master program needs matplotlib installed:
$ sudo apt-get install -y python-matplotlib
This module is specially designed to test the throughput of Kafka and Spark Streaming. It requires messages in a specific format.
compile:
../KafkaBenchmark $ mvn package
run:
send messages to Kafka:
$ java -cp target/KafkaBenchmark-1.0-SNAPSHOT.jar mybenchmark.MsgReader <kafka_server> <mps>
<kafka_server>: hostname of the Kafka server
<mps>: messages per second
Note: by default, all messages are sent to the topic internal_groups
receive messages from Kafka:
$ java -cp target/KafkaBenchmark-1.0-SNAPSHOT.jar mybenchmark.MsgReader <kafka_server> <topic>
<kafka_server>: hostname of the Kafka server
<topic>: the Kafka topic this reader will consume
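For example, against a hypothetical Kafka host, send 1000 messages per second and then consume the default topic:
$ java -cp target/KafkaBenchmark-1.0-SNAPSHOT.jar mybenchmark.MsgReader 10.0.0.1 1000
$ java -cp target/KafkaBenchmark-1.0-SNAPSHOT.jar mybenchmark.MsgReader 10.0.0.1 internal_groups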
Some scripts to test the system or algorithm performance using traces.
trace_sort.sh: sorts the trace by timestamp
Main scripts for algorithm comparison:
auto_plot.sh: plots the algorithm comparison results
combine.py: processes the raw data
cost.conf: Gnuplot script for the plot
pull*.sh: pulls the test results from the cluster to localhost
trace_parser.py: parses the trace and simulates the player
Main scripts for the fault tolerance experiment:
ft.conf: Gnuplot script
sort: processes the raw data
trace_parser_multi.py: parses the trace and simulates multiple players
For the real-world trace benchmark, deploy all modules of a frontend cluster on one host. This is more efficient when comparing multiple algorithms.
autoscp.sh: uploads the files used
onehost_deploy: deployment script
start_tmux: runs the necessary programs in tmux
Abandoned module code, including the load balancer (HAProxy) and the proxy server.