This directory archives the Pytheas implementation prior to the NSDI submission.
The newest version of the Pytheas code is available at https://github.com/nsdi2017-ddn/pytheas
System: Ubuntu 15.10
Java build tools (JDK and Maven) installation:
$ sudo apt-get update
$ sudo apt-get install -y default-jdk maven
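To verify the installation (mvn also reports which JDK it found):
$ mvn -version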
Contains the programs that need to be deployed on each front-end server host.
Auto-deployment script (for Apache httpd and php programs):
../front_server $ sudo ./frontserver_deploy.sh
compile using Maven:
../GroupManager $ mvn package
run:
../GroupManager $ java -cp target/GroupManager-1.0-SNAPSHOT.jar frontend.GroupManager <cluster_ID> <kafka_server> <config_file>
<cluster_ID> is the ID of the current cluster
<kafka_server> is a comma-separated list of Kafka server IPs
<config_file> contains the labels of the update info and the reduced labels
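For example, with a hypothetical cluster ID, Kafka server list, and config file:
../GroupManager $ java -cp target/GroupManager-1.0-SNAPSHOT.jar frontend.GroupManager 1 10.0.0.1,10.0.0.2 config.txt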
Deploy on one or more hosts in each cluster to manage communication between the functional modules.
Kafka deployment:
../kafka $ sudo ./kafka_deploy.sh <host_list> <host_number>
<host_list> is a comma-separated list of all Kafka server IP addresses
<host_number> is the sequence number of the current host within <host_list>
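For example, on the second of three Kafka hosts (hypothetical IPs):
../kafka $ sudo ./kafka_deploy.sh 10.0.0.1,10.0.0.2,10.0.0.3 2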
run:
$ cd /usr/share/kafka
$ sudo bin/zookeeper-server-start.sh config/zookeeper.properties &
$ sudo bin/kafka-server-start.sh config/server.properties
Note: if running Kafka on more than one host, execute the third command only after the second command has been executed on every host.
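As a quick sanity check that the broker is running, list the topics through ZooKeeper (standard tooling for this Kafka generation; the local ZooKeeper address is an assumption):
$ bin/kafka-topics.sh --zookeeper localhost:2181 --list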
Contains the decision-making module and the communication module; each uses Spark and can run on one or more hosts.
Spark deployment:
../spark $ sudo ./spark_deploy.sh
DecisionMaker (decision-making module): makes a decision for each group. Compile it using Maven and submit it to Spark.
Communication module: communicates with the backend cluster and other frontend clusters. Like DecisionMaker, compile it using Maven and submit it to Spark.
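For reference, a submission typically looks like the following; the main class, jar path, and master URL are illustrative assumptions, not taken from this repo:
$ spark-submit --class frontend.DecisionMaker --master spark://10.0.0.1:7077 target/DecisionMaker-1.0-SNAPSHOT.jar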
Some small scripts and programs to test the scalability of the frontend cluster.
Test the response time of requests.
A simple Python program that performs an HTTP POST request 1000 times and plots the CDF of the response times:
$ ./post_time.py
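For reference, a minimal sketch of the same idea (not the actual post_time.py; the endpoint and payload are hypothetical, and requests/matplotlib are assumed dependencies):

    import time
    import requests                   # assumed dependency
    import matplotlib.pyplot as plt   # assumed dependency

    URL = "http://10.0.0.1/update"    # hypothetical frontend endpoint

    # POST 1000 times and record the latency of each request.
    latencies = []
    for _ in range(1000):
        start = time.time()
        requests.post(URL, data={"msg": "test"})
        latencies.append(time.time() - start)

    # Plot the empirical CDF of the latencies.
    latencies.sort()
    cdf = [i / float(len(latencies)) for i in range(1, len(latencies) + 1)]
    plt.plot(latencies, cdf)
    plt.xlabel("Response time (s)")
    plt.ylabel("CDF")
    plt.savefig("post_time_cdf.png")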
A shell script that uses ApacheBench to test the response time of the frontend server.
$ ./responseTime.ssh
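For reference, a direct ApacheBench invocation with the same intent might look like this (URL, POST body file, and concurrency are hypothetical):
$ ab -n 1000 -c 10 -p post_data.txt -T application/x-www-form-urlencoded http://10.0.0.1/update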
A standalone benchmark that performs HTTP POST requests. The test duration and the requests per second (RPS) can be controlled.
$ ./benchmark.py
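The core idea is pacing POSTs at a target rate for a fixed duration; a minimal sketch (not the actual benchmark.py; the endpoint, duration, and rate are hypothetical, and requests is an assumed dependency):

    import time
    import threading
    import requests                  # assumed dependency

    URL = "http://10.0.0.1/update"   # hypothetical endpoint
    TEST_TIME = 10                   # test duration in seconds
    TARGET_RPS = 100                 # requested rate

    def fire():
        # One POST; a real benchmark would also record latency and failures.
        try:
            requests.post(URL, data={"msg": "test"}, timeout=5)
        except requests.RequestException:
            pass

    deadline = time.time() + TEST_TIME
    while time.time() < deadline:
        threading.Thread(target=fire).start()  # fire-and-forget
        time.sleep(1.0 / TARGET_RPS)           # pacing; achieved RPS <= target

Because pacing relies on sleep and thread startup, the achieved RPS is only positively correlated with the target, the same caveat noted for dbenchmark below.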
A distributed benchmark that performs HTTP POST requests.
Run the slave program on all hosts to perform the benchmark, then run the master program on one host to start the test. When the test finishes, the master program generates three figures (Response Time, Successful RPS, CDF of Response Time).
run slave:
$ ./dbenchmark_slave <url>
<url>: the destination URL the slave program will send requests to
run master:
$ ./dbenchmark_master <Time> <RPS>
<Time>: how long the test will last
<RPS>: requests per second. This parameter is only positively correlated with the real RPS; the real RPS is shown in the result figure.
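For example (hypothetical URL, a 60-second test, and a target of 500 RPS):
$ ./dbenchmark_slave http://10.0.0.1/update     (on every slave host)
$ ./dbenchmark_master 60 500                    (on one master host)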
Note: the host that runs the master program needs matplotlib installed:
$ sudo apt-get install -y python-matplotlib
This module is specially designed to test the throughput of Kafka and Spark Streaming. It requires messages in a specific format.
compile:
../KafkaBenchmark $ mvn package
run:
send messages to Kafka:
$ java -cp target/KafkaBenchmark-1.0-SNAPSHOT.jar mybenchmark.MsgReader <kafka_server> <mps>
<kafka_server>: hostname of the Kafka server
<mps>: messages per second
Note: by default, all messages are sent to the topic internal_groups
receive messages from Kafka:
$ java -cp target/KafkaBenchmark-1.0-SNAPSHOT.jar mybenchmark.MsgReader <kafka_server> <topic>
<kafka_server>: hostname of the Kafka server
<topic>: the Kafka topic this reader will consume
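For example, against a hypothetical Kafka host, send 1000 messages per second and then consume the default topic:
$ java -cp target/KafkaBenchmark-1.0-SNAPSHOT.jar mybenchmark.MsgReader 10.0.0.1 1000
$ java -cp target/KafkaBenchmark-1.0-SNAPSHOT.jar mybenchmark.MsgReader 10.0.0.1 internal_groups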
Some scripts to test the system or algorithm performance using traces.
trace_sort.sh: sorts the trace by timestamp
Main scripts for algorithm comparison:
auto_plot.sh: plots the algorithm comparison results
combine.py: processes the raw data
cost.conf: Gnuplot script for the plot
pull*.sh: pulls the test results from the cluster to localhost
trace_parser.py: parses the trace and simulates the player
Main scripts for the fault tolerance experiment:
ft.conf: Gnuplot script
sort: processes the raw data
trace_parser_multi.py: parses the trace and simulates multiple players
For the real-world trace benchmark, deploy all modules of a frontend cluster on one host. This is more efficient when comparing multiple algorithms.
autoscp.sh: uploads the files used
onehost_deploy: deployment script
start_tmux: runs the necessary programs in tmux
Abandoned module code, including the load balancer (HAProxy) and the proxy server.