Add HHVM MediaWiki to PKB (GoogleCloudPlatform#106)
* Add PHP/Mediawiki to pkb

* Add intel_hhvm_mediawiki benchmark with system HHVM

* Update provisioning playbook name

* Add intel_runtime flag for WordPress and MediaWiki

* Fix trailing whitespaces

* Update README for intel_wordpress and intel_mediawiki
Papa, Florin authored and harp-intel committed Mar 14, 2019
1 parent 8d0bcf9 commit b812050
Showing 6 changed files with 400 additions and 134 deletions.
118 changes: 118 additions & 0 deletions perfkitbenchmarker/data/intel_mediawiki_benchmark/README.md
@@ -0,0 +1,118 @@

## Intel MediaWiki benchmark guidelines

#### Foreword

The oss-performance/MediaWiki workload, as we call it, is a collection of scripts and wrappers that allows the user to prepare the workload, run it (with the defaults or customized through a config file), and collect reports using the open-source benchmark developed by Facebook.
To clarify, the components are as follows:
- [oss-performance](https://github.com/hhvm/oss-performance): the public workload developed by Facebook. It is actually a harness that can run a series of PHP-based workloads (their terminology is 'targets'): WordPress, MediaWiki, Drupal, and others. You can use it to assess the performance of a PHP engine (either PHP Zend or Facebook's HHVM).
- [hhvm-perf](https://github.intel.com/DSLO/hhvm-perf): the DSLO/HHVM team's internal harness, developed by Octavian M. It is basically a wrapper over oss-performance: it can run the targets multiple times, compute the average/standard deviation of metrics such as transactions/second, collect emon and perf data, collect logs and workload artifacts, etc. It mostly adds a friendlier interface to the HHVM team's existing internal tooling.

The expectation here is to integrate these components into PKB so that a target system can be prepared, the workload run, and performance, telemetry, and artifacts collected in a standardized and replicable way.

## How to run oss-performance/MediaWiki (MW) with PKB
After PKB is [installed and configured](https://github.intel.com/cspbench/PerfKitBenchmarker#installing-perfkit-benchmarker-and-prerequisites), the user can run MW as follows:

```
python pkb.py --cloud=AWS --benchmarks=intel_mediawiki --machine_type=m5.24xlarge
```
Additionally, some useful flags specific to the MW workload are available on the PKB command line (a combined example follows the list): \
`--intel_mediawiki_execution_count=<default_value_is_1>`
- if specified, this flag tells the harness how many times to run oss-performance for the current PKB run; if not specified, the default value of 1 is used.

`--intel_mediawiki_server_threads=<default_value_is_100>`
- if specified, this flag overrides the default server thread count (100) for the current PKB run.

`--intel_mediawiki_runtime=<default_value_is_php>`
- this flag selects the runtime used to power the MediaWiki server. By default PHP is used, but HHVM can be used as well (by specifying --intel_mediawiki_runtime=hhvm).
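
For example, a run that executes oss-performance three times with 200 server threads on HHVM could be launched as follows (cloud and machine type are only illustrative):

```
python pkb.py --cloud=AWS --benchmarks=intel_mediawiki --machine_type=m5.24xlarge \
  --intel_mediawiki_execution_count=3 --intel_mediawiki_server_threads=200 \
  --intel_mediawiki_runtime=hhvm
```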


## Run MW with PKB on bare-metal
PKB allows passing a configuration file (.yml); in order to run the workload on bare metal, a static VM needs to be specified as the target. The command is as follows:

```
python pkb.py --benchmark_config_file=mw_config.yml --benchmarks=intel_mediawiki --machine_type=m5.24xlarge
```

Example for mw_config.yml:

```
static_vms:
  - &vm0
    ip_address: <ip_address>
    user_name: pkb
    ssh_private_key: ~/.ssh/id_rsa
    internal_ip: <ip_address>
intel_mediawiki:
  vm_groups:
    target:
      static_vms:
        - *vm0
```
**Note**
Make sure you are using a user other than root.
It is assumed that the pkb user already exists on the target machine; if not, here are some guidelines to create the user (a consolidated sketch follows the steps below):

**Onboarding of a target system**
SSH to the target system and create a passwordless user:

`sudo useradd -m <username>`

Make it sudoer:

`sudo usermod -aG sudo <username>` \
Note: use the wheel group on CentOS.

Configure the user not to be prompted for a password; SSH keys will be used for authentication:

`sudo visudo` \
In the editor, look for the following lines:

>Allow members of group sudo to execute any command \
>%sudo ALL=(ALL:ALL) ALL

Right below, add the following:

>\<username> ALL=(ALL:ALL) NOPASSWD:ALL

Save and exit.

Copy your key to the target system:

Copy your workstation identity (~/.ssh/id_rsa.pub) to the target system user's authorized keys file (/home/<username>/.ssh/authorized_keys). Make sure the user owns the .ssh folder and its contents on the target system.

`cat ~/.ssh/id_rsa.pub | ssh <username>@<hostname> 'cat >> .ssh/authorized_keys'`
Now trying to ssh into <username>@<hostname> should not require a password.
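
The onboarding steps above can be condensed into the following sketch (assuming the user name pkb and a Debian/Ubuntu target; a sudoers drop-in file is used here as an alternative to editing via visudo):

```
# On the target system, as an existing sudoer:
sudo useradd -m pkb
sudo usermod -aG sudo pkb        # use the wheel group on CentOS
echo 'pkb ALL=(ALL:ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/pkb

# On the PKB workstation:
cat ~/.ssh/id_rsa.pub | ssh pkb@<hostname> 'mkdir -p .ssh && cat >> .ssh/authorized_keys'
```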

## Integration notes

As a first step, a list of generic operations and their order of execution is needed from the workload owner, written just as if executed by hand, assuming the setup is done on a freshly installed platform.

PKB has predefined "steps" to do the job; a very simplified schema looks like this [from intel_mediawiki_benchmark.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/intel_mediawiki_benchmark.py):
```
...
#Workload metadata definition section
flags.DEFINE_integer('intel_mediawiki_execution_count', 1,
                     'The number of times to run against chosen target.')
flags.DEFINE_integer('intel_mediawiki_server_threads', 100,
                     'The number of threads to execute.')
flags.DEFINE_string('intel_mediawiki_runtime', 'php',
                    'The runtime used by the MediaWiki server. Can be '
                    'either php (default) or hhvm')
...
#Define workload name and execution orchestration
...
BENCHMARK_NAME = 'intel_mediawiki'
BENCHMARK_CONFIG = """
intel_mediawiki:
  description: >
    Run HHVM's oss-performance harness to drive Siege against
    Nginx, PHP or HHVM, MediaWiki using MariaDB on the back end.
  vm_groups:
    target:
      os_type: ubuntu1604
      vm_spec: *default_dual_core
"""
```
16 changes: 16 additions & 0 deletions perfkitbenchmarker/data/intel_mediawiki_benchmark/config.yml
@@ -0,0 +1,16 @@
build:
  enabled: false

bolt:
  enabled: false

paths:
  hhvm_oss_perf: /usr/bin/hhvm

run:
  targets:
    - mediawiki
  count: 1
  cpu_util: false
  oss_additional_params:
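
During Prepare, PKB loads this file, fills in the detected engine binary and the run parameters taken from the flags, and copies the result to the hhvm-perf checkout on the target (see intel_php_utils.py further below). After that step the file looks roughly like this (excerpt; the engine path is only illustrative for a PHP run on Ubuntu 16.04):

```
paths:
  hhvm_oss_perf: /usr/bin/hhvm
  engine: /usr/sbin/php-fpm7.0
run:
  targets:
    - mediawiki
  count: 1
  server_workers: '100'
  cpu_util: false
  oss_additional_params:
```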

52 changes: 10 additions & 42 deletions perfkitbenchmarker/data/intel_wordpress_benchmark/README.md
@@ -7,7 +7,7 @@ oss-performance/WordPress workload as we call it is a collection of scripts and
To clarify, the components are as follows:
- [oss-performance](https://github.com/hhvm/oss-performance): the public workload developed by Facebook. It is actually a harness that can run a series of PHP-based workloads (their terminology is 'targets'): WordPress, MediaWiki, Drupal, and others. You can use it to assess the performance of a PHP engine (either PHP Zend or Facebook's HHVM).
- [hhvm-perf](https://github.intel.com/DSLO/hhvm-perf): the DSLO/HHVM team's internal harness, developed by Octavian M. It is basically a wrapper over oss-performance: it can run the targets multiple times, compute the average/standard deviation of metrics such as transactions/second, collect emon and perf data, collect logs and workload artifacts, etc. It mostly adds a friendlier interface to the HHVM team's existing internal tooling.

The expectation here is to integrate these components into PKB so that a target system can be prepared, the workload run, and performance, telemetry, and artifacts collected in a standardized and replicable way.

## How to run oss-performance/WordPress (WP) with PKB
After PKB is [installed and configured](https://github.intel.com/cspbench/PerfKitBenchmarker#installing-perfkit-benchmarker-and-prerequisites), the user can run WP as follows:
@@ -22,12 +22,12 @@ additionally some usefull flags had been implemented to be used for PKB commandl
`--intel_wordpress_server_threads=<default_value_is_100>`
- if specified, this flag overrides the default server thread count (100) for the current PKB run.

`--intel_wordpress_internal_counters="-vEval.ProfileHWEnable=false"`
- if this flag is passed for the current PKB run, the internal performance counters of oss-performance will be disabled, permitting this way emon collection for the entire run
`--intel_wordpress_runtime=<default_value_is_php>`
- this flag selects the runtime used to power the WordPress server. By default PHP is used, but HHVM can be used as well (by specifying --intel_wordpress_runtime=hhvm).


## Run WP with PKB on bare-metal
PKB allows passing a configuration file (.yml); in order to run the workload on bare metal, a static VM needs to be specified as the target. The command is as follows:

```
python pkb.py --benchmark_config_file=wp_config.yml --benchmarks=intel_wordpress --machine_type=m5.24xlarge
@@ -57,7 +57,7 @@ It is assumed that pkb user already exists on the target machine, if not here ar
**Onboarding of a target system**
SSH to the target system and create a passwordless user:

`sudo useradd -m <username>`

Make it sudoer:

@@ -97,8 +97,9 @@ flags.DEFINE_integer('intel_wordpress_execution_count', 1,
                     'The number of times to run against chosen target.')
flags.DEFINE_integer('intel_wordpress_server_threads', 100,
                     'The number of threads to execute.')
flags.DEFINE_string('intel_wordpress_internal_counters', '',
                    'Let oss know to stop performance counters')
flags.DEFINE_string('intel_wordpress_runtime', 'php',
                    'The runtime used by the WordPress server. Can be '
                    'either php (default) or hhvm')
...
#Define workload name and execution orchestration
@@ -108,43 +109,10 @@ BENCHMARK_NAME = 'intel_wordpress'
BENCHMARK_CONFIG = """
intel_wordpress:
  description: >
    Run HHVM's oss-performance harness to drive Siege against
    Nginx, PHP or HHVM, WordPress using MariaDB on the back end.
  vm_groups:
    target:
      os_type: ubuntu1604
      vm_spec: *default_dual_core
"""
...
# pre-reqs from here: hhvm_provisioning/hhvm/roles/commons/tasks/main.yml
PREREQ_PKGS = ["software-properties-common",
...
# MariaDB and friends: hhvm_provisioning/hhvm/roles/mariadb/tasks/main.yml
MARIADB_PHP_PKGS = ["php",
...
#External files required to run workload
DATA_FILES = ['intel_wordpress_benchmark/my.cnf',
              'intel_wordpress_benchmark/hhvm-perf.tar.gz']
...
```
After setting the context in which the run will be performed, the actual workflow is determined as follows [from intel_wordpress_benchmark.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/intel_wordpress_benchmark.py):
```
def GetConfig(user_config):
...
def CheckPrerequisites(config):
...
def Prepare(benchmark_spec):
...
def Run(benchmark_spec):
...
def Cleanup(benchmark_spec):
```
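
A minimal sketch of how these phases can delegate to the shared helpers in perfkitbenchmarker/intel_php_utils.py added by this commit (the actual body of intel_wordpress_benchmark.py is not reproduced here; FLAGS, BENCHMARK_NAME, and BENCHMARK_CONFIG are the definitions shown above):

```
from perfkitbenchmarker import configs
from perfkitbenchmarker import intel_php_utils


def GetConfig(user_config):
  # Merge the embedded BENCHMARK_CONFIG with any user-supplied config file.
  return configs.LoadConfig(BENCHMARK_CONFIG, user_config, BENCHMARK_NAME)


def Prepare(benchmark_spec):
  # Provision the target VM and configure the hhvm-perf harness.
  intel_php_utils.Prepare(benchmark_spec, 'wordpress',
                          FLAGS.intel_wordpress_runtime,
                          FLAGS.intel_wordpress_execution_count,
                          FLAGS.intel_wordpress_server_threads)


def Run(benchmark_spec):
  # Drive the run and return the collected samples.
  return intel_php_utils.Run(benchmark_spec, 'wordpress',
                             FLAGS.intel_wordpress_runtime,
                             FLAGS.intel_wordpress_execution_count,
                             FLAGS.intel_wordpress_server_threads)
```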
For some of the workload components, additional provisioning scripts need to be written as part of pkb linux_packages (a minimal sketch of such a module follows the list): \
[linux_packages/composer.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_packages/composer.py) - installs composer \
[linux_packages/hhvm.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_packages/hhvm.py) - installs hhvm \
[linux_packages/hhvm_oss_performance.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_packages/hhvm_oss_performance.py) - installs oss-performance \
[linux_packages/intel_hhvm_perf.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_packages/intel_hhvm_perf.py) - sets up the oss-performance harness
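
As an illustration, a PKB linux_packages module follows roughly this shape (a sketch under assumptions, not the actual contents of composer.py; the install command shown is the standard composer installer one-liner):

```
def Install(vm):
  """Installs composer on the VM (sketch)."""
  vm.RemoteCommand('curl -sS https://getcomposer.org/installer | php && '
                   'sudo mv composer.phar /usr/local/bin/composer')


def Uninstall(vm):
  """Removes composer from the VM (sketch)."""
  vm.RemoteCommand('sudo rm -f /usr/local/bin/composer')
```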
126 changes: 126 additions & 0 deletions perfkitbenchmarker/intel_php_utils.py
@@ -0,0 +1,126 @@
# Copyright 2015 PerfKitBenchmarker Authors. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Utilities for running PHP/HHVM workloads"""

import json
import logging
import os
import re
import StringIO
import yaml

from perfkitbenchmarker import data
from perfkitbenchmarker import vm_util
from perfkitbenchmarker import sample
from perfkitbenchmarker import os_types
from perfkitbenchmarker.linux_packages import INSTALL_DIR

PREREQ_PKGS = ["software-properties-common",
               "apt-transport-https",
               "iputils-ping",
               "python",
               "python3",
               "python3-pip"
               ]

def Prepare(benchmark_spec,
            workload_name,
            workload_engine,
            count,
            server_workers):
  """Prepare the virtual machines to run."""
  vm = benchmark_spec.vm_groups['target'][0]

  vm.Uninstall('intel_hhvm_provisioning')

  if vm.OS_TYPE == os_types.RHEL:
    PREREQ_PKGS.append("openssh-clients")
  vm.InstallPackages(' '.join(PREREQ_PKGS))
  vm.Install('ansible')
  vm.Install('intel_hhvm_provisioning')
  # run the provisioning
  vm.RemoteHostCommand('cd ' + INSTALL_DIR + ' && '
                       'ansible-playbook -i hhvm_provisioning/hhvm/hosts '
                       'hhvm_provisioning/hhvm/' + workload_engine + '_pkb.yml')
  if workload_engine == "php":
    out, _ = vm.RemoteHostCommand('ls /usr/sbin/php-fpm*')
  elif workload_engine == "hhvm":
    out, _ = vm.RemoteHostCommand('ls /usr/bin/hhvm*')
  # get actual engine binary name
  engine_path = out.splitlines()[0]

  # configure hhvm-perf/config.yml
  logging.info("configuring the hhvm-perf workload harness")

  conf = data.ResourcePath('intel_' + workload_name + '_benchmark/config.yml')
  with open(conf) as stream:
    config = yaml.load(stream)
  config['paths']['engine'] = engine_path
  config['run']['count'] = count
  config['run']['server_workers'] = str(server_workers)

  new_conf = vm_util.PrependTempDir('config.yml')
  with open(new_conf, 'w') as stream:
    yaml.dump(config, stream)

  vm.RemoteCopy(new_conf, INSTALL_DIR + '/git/hhvm-perf')

def Run(benchmark_spec,
        workload_name,
        workload_engine,
        count,
        server_workers):
  """Run Siege and gather the results."""
  samples = []

  vm = benchmark_spec.vm_groups['target'][0]
  logging.info("running the workload")
  stdout, _ = vm.RobustRemoteCommand('cd ' + INSTALL_DIR + '/git/hhvm-perf '
                                     ' && ./run.py')

  logging.info("copying workload output to local run output dir")
  # workload output location is specified on stdout
  workload_output_dir = None
  stdout_io = StringIO.StringIO(stdout)
  for line in stdout_io:
    sline = line.strip()
    if sline.startswith('Done. Latest results in:'):
      match = re.search(r'Done. Latest results in: (.*)$', line)
      if match is None:
        logging.error("Parsing error -- regex doesn't match for string: %s", line)
      else:
        workload_output_dir = match.group(1)
  # copy workload output folder from vm to local temp run dir
  tps = 0
  metadata = {}
  if workload_output_dir:
    vm.RemoteCopy(vm_util.GetTempDir(), workload_output_dir, False)
    results_file = os.path.join(os.path.basename(workload_output_dir),
                                'results', workload_name, 'run',
                                'Performance-' + workload_name + '.json')
    with open(vm_util.PrependTempDir(results_file)) as f:
      json_f = json.loads(f.read())
    tps = json_f['oss-performance results']['Transaction Rate (in trans/sec)']['Average']
    software_stack = os.path.join(os.path.basename(workload_output_dir),
                                  'results', workload_name,
                                  'Software_Stack_' + workload_name + '.json')
    with open(vm_util.PrependTempDir(software_stack)) as f:
      metadata = json.loads(f.read())
    metadata['server_threads'] = server_workers
    metadata['execution_count'] = count

  samples.append(sample.Sample("transaction rate", tps, "transactions/second", metadata))
  return samples