Add HHVM MediaWiki to PKB (GoogleCloudPlatform#106)
* Add PHP/Mediawiki to pkb

* Add intel_hhvm_mediawiki benchmark with system HHVM

* Update provisioning playbook name

* Add intel_runtime flag for WordPress and MediaWiki

* Fix trailing whitespaces

* Update README for intel_wordpress and intel_mediawiki
Papa, Florin authored and harp-intel committed Mar 14, 2019
1 parent 8d0bcf9 commit b812050
Showing 6 changed files with 400 additions and 134 deletions.
118 changes: 118 additions & 0 deletions perfkitbenchmarker/data/intel_mediawiki_benchmark/README.md
@@ -0,0 +1,118 @@

## Intel MediaWiki benchmark guidelines

#### Foreword

The oss-performance/MediaWiki workload, as we call it, is a collection of scripts and wrappers that allows the user to prepare the workload, run it (with the defaults or customized through a config file), and collect reports using the open-source benchmark developed by Facebook.
To clarify, the components are as follows:
- [oss-performance](https://github.com/hhvm/oss-performance): the public workload developed by Facebook. It is actually a harness that can run a series of PHP-based workloads (their terminology is 'targets'): WordPress, MediaWiki, Drupal, and others. You can use it to assess the performance of a PHP engine (either PHP Zend or Facebook's HHVM).
- [hhvm-perf](https://github.intel.com/DSLO/hhvm-perf): the DSLO/HHVM team's internal harness, developed by Octavian M. It is basically a wrapper over oss-performance: it can run the targets multiple times, compute the average/standard deviation of metrics such as transactions/second, collect emon and perf data, collect logs and workload artifacts, etc. It mostly adds a friendlier interface to the HHVM team's existing internal tooling.

The expectation here is to integrate these components into PKB so that a target system can be prepared, the workload run, and performance, telemetry, and artifacts collected in a standardized and replicable way.

## How to run oss-performance/MediaWiki (MW) with PKB
After PKB is [installed and configured](https://github.intel.com/cspbench/PerfKitBenchmarker#installing-perfkit-benchmarker-and-prerequisites), the user can run MW as follows:

```
python pkb.py --cloud=AWS --benchmarks=intel_mediawiki --machine_type=m5.24xlarge
```
Additionally, some useful flags specific to the MW workload are available on the PKB command line (a combined example follows the list): \
`--intel_mediawiki_execution_count=<default_value_is_1>`
- if specified, this flag tells the harness how many times to run oss-performance for the current PKB run; if not specified, the default value of 1 is used.

`--intel_mediawiki_server_threads=<default_value_is_100>`
- if specified, this flag overrides the default server thread count (100) for the current PKB run.

`--intel_mediawiki_runtime=<default_value_is_php>`
- this flag selects the runtime used to power the MediaWiki server. By default PHP is used, but HHVM can be used as well (by specifying --intel_mediawiki_runtime=hhvm).
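
For example, a run that executes oss-performance three times with 200 server threads on HHVM could be launched as follows (cloud and machine type are only illustrative):

```
python pkb.py --cloud=AWS --benchmarks=intel_mediawiki --machine_type=m5.24xlarge \
  --intel_mediawiki_execution_count=3 --intel_mediawiki_server_threads=200 \
  --intel_mediawiki_runtime=hhvm
```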


## Run MW with PKB on bare-metal
PKB allows passing a configuration file (.yml); in order to run the workload on bare metal, a static VM needs to be specified as the target. The command is as follows:

```
python pkb.py --benchmark_config_file=mw_config.yml --benchmarks=intel_mediawiki --machine_type=m5.24xlarge
```

Example for mw_config.yml:

```
static_vms:
  - &vm0
    ip_address: <ip_address>
    user_name: pkb
    ssh_private_key: ~/.ssh/id_rsa
    internal_ip: <ip_address>
intel_mediawiki:
  vm_groups:
    target:
      static_vms:
        - *vm0
```
**Note**
Make sure you are using a user other than root.
It is assumed that the pkb user already exists on the target machine; if not, here are some guidelines to create the user (a consolidated sketch follows the steps below):

**Onboarding of a target system**
SSH to the target system and create a passwordless user:

`sudo useradd -m <username>`

Make it sudoer:

`sudo usermod -aG sudo <username>` \
Note: use the wheel group on CentOS.

Configure the user not to be prompted for a password; SSH keys will be used for authentication:

`sudo visudo` \
In the editor, look for the following lines:

>Allow members of group sudo to execute any command \
>%sudo ALL=(ALL:ALL) ALL

Right below, add the following:

>\<username> ALL=(ALL:ALL) NOPASSWD:ALL

Save and exit.

Copy your key to the target system:

Copy your workstation identity (~/.ssh/id_rsa.pub) to the target system user's authorized keys file (/home/<username>/.ssh/authorized_keys). Make sure the user owns the .ssh folder and its contents on the target system.

`cat ~/.ssh/id_rsa.pub | ssh <username>@<hostname> 'cat >> .ssh/authorized_keys'`
Now trying to ssh into <username>@<hostname> should not require a password.
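
The onboarding steps above can be condensed into the following sketch (assuming the user name pkb and a Debian/Ubuntu target; a sudoers drop-in file is used here as an alternative to editing via visudo):

```
# On the target system, as an existing sudoer:
sudo useradd -m pkb
sudo usermod -aG sudo pkb        # use the wheel group on CentOS
echo 'pkb ALL=(ALL:ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/pkb

# On the PKB workstation:
cat ~/.ssh/id_rsa.pub | ssh pkb@<hostname> 'mkdir -p .ssh && cat >> .ssh/authorized_keys'
```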

## Integration notes

As a first step, a list of generic operations and their order of execution is needed from the workload owner, written just as if executed by hand, assuming the setup is done on a freshly installed platform.

PKB has predefined "steps" to do the job; a very simplified schema looks like this [from intel_mediawiki_benchmark.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/intel_mediawiki_benchmark.py):
```
...
#Workload metadata definition section
flags.DEFINE_integer('intel_mediawiki_execution_count', 1,
                     'The number of times to run against chosen target.')
flags.DEFINE_integer('intel_mediawiki_server_threads', 100,
                     'The number of threads to execute.')
flags.DEFINE_string('intel_mediawiki_runtime', 'php',
                    'The runtime used by the MediaWiki server. Can be '
                    'either php (default) or hhvm')
...
#Define workload name and execution orchestration
...
BENCHMARK_NAME = 'intel_mediawiki'
BENCHMARK_CONFIG = """
intel_mediawiki:
  description: >
    Run HHVM's oss-performance harness to drive Siege against
    Nginx, PHP or HHVM, MediaWiki using MariaDB on the back end.
  vm_groups:
    target:
      os_type: ubuntu1604
      vm_spec: *default_dual_core
"""
```
16 changes: 16 additions & 0 deletions perfkitbenchmarker/data/intel_mediawiki_benchmark/config.yml
@@ -0,0 +1,16 @@
build:
  enabled: false

bolt:
  enabled: false

paths:
  hhvm_oss_perf: /usr/bin/hhvm

run:
  targets:
    - mediawiki
  count: 1
  cpu_util: false
  oss_additional_params:
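
During Prepare, PKB loads this file, fills in the detected engine binary and the run parameters taken from the flags, and copies the result to the hhvm-perf checkout on the target (see intel_php_utils.py further below). After that step the file looks roughly like this (excerpt; the engine path is only illustrative for a PHP run on Ubuntu 16.04):

```
paths:
  hhvm_oss_perf: /usr/bin/hhvm
  engine: /usr/sbin/php-fpm7.0
run:
  targets:
    - mediawiki
  count: 1
  server_workers: '100'
  cpu_util: false
  oss_additional_params:
```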

52 changes: 10 additions & 42 deletions perfkitbenchmarker/data/intel_wordpress_benchmark/README.md
@@ -7,7 +7,7 @@ oss-performance/WordPress workload as we call it is a collection of scripts and
To clarify, the components are as follows:
- [oss-performance](https://github.com/hhvm/oss-performance): the public workload developed by Facebook. It is actually a harness that can run a series of PHP-based workloads (their terminology is 'targets'): WordPress, MediaWiki, Drupal, and others. You can use it to assess the performance of a PHP engine (either PHP Zend or Facebook's HHVM).
- [hhvm-perf](https://github.intel.com/DSLO/hhvm-perf): the DSLO/HHVM team's internal harness, developed by Octavian M. It is basically a wrapper over oss-performance: it can run the targets multiple times, compute the average/standard deviation of metrics such as transactions/second, collect emon and perf data, collect logs and workload artifacts, etc. It mostly adds a friendlier interface to the HHVM team's existing internal tooling.

The expectation here is to integrate these components into PKB so that a target system can be prepared, the workload run, and performance, telemetry, and artifacts collected in a standardized and replicable way.

## How to run oss-performance/WordPress (WP) with PKB
After PKB is [installed and configured](https://github.intel.com/cspbench/PerfKitBenchmarker#installing-perfkit-benchmarker-and-prerequisites), the user can run WP as follows:
@@ -22,12 +22,12 @@ additionally some usefull flags had been implemented to be used for PKB commandl
`--intel_wordpress_server_threads=<default_value_is_100>`
- if specified, this flag overrides the default server thread count (100) for the current PKB run.

`--intel_wordpress_internal_counters="-vEval.ProfileHWEnable=false"`
- if this flag is passed for the current PKB run, the internal performance counters of oss-performance will be disabled, permitting this way emon collection for the entire run
`--intel_wordpress_runtime=<default_value_is_php>`
- this flag selects the runtime used to power the WordPress server. By default PHP is used, but HHVM can be used as well (by specifying --intel_wordpress_runtime=hhvm).


## Run WP with PKB on bare-metal
PKB allows passing a configuration file (.yml); in order to run the workload on bare metal, a static VM needs to be specified as the target. The command is as follows:

```
python pkb.py --benchmark_config_file=wp_config.yml --benchmarks=intel_wordpress --machine_type=m5.24xlarge
@@ -57,7 +57,7 @@ It is assumed that pkb user already exists on the target machine, if not here ar
**Onboarding of a target system**
SSH to the target system and create a passwordless user:

`sudo useradd -m <username>`

Make it sudoer:

@@ -97,8 +97,9 @@ flags.DEFINE_integer('intel_wordpress_execution_count', 1,
                     'The number of times to run against chosen target.')
flags.DEFINE_integer('intel_wordpress_server_threads', 100,
                     'The number of threads to execute.')
flags.DEFINE_string('intel_wordpress_internal_counters', '',
                    'Let oss know to stop performance counters')
flags.DEFINE_string('intel_wordpress_runtime', 'php',
                    'The runtime used by the WordPress server. Can be '
                    'either php (default) or hhvm')
...
#Define workload name and execution orchestration
@@ -108,43 +109,10 @@ BENCHMARK_NAME = 'intel_wordpress'
BENCHMARK_CONFIG = """
intel_wordpress:
  description: >
    Run HHVM's oss-performance harness to drive Siege against
    Nginx, PHP or HHVM, WordPress using MariaDB on the back end.
  vm_groups:
    target:
      os_type: ubuntu1604
      vm_spec: *default_dual_core
"""
...
# pre-reqs from here: hhvm_provisioning/hhvm/roles/commons/tasks/main.yml
PREREQ_PKGS = ["software-properties-common",
...
# MariaDB and friends: hhvm_provisioning/hhvm/roles/mariadb/tasks/main.yml
MARIADB_PHP_PKGS = ["php",
...
#External files required to run workload
DATA_FILES = ['intel_wordpress_benchmark/my.cnf',
              'intel_wordpress_benchmark/hhvm-perf.tar.gz']
...
```
After setting the context in which the run will be performed, the actual workflow is determined as follows [from intel_wordpress_benchmark.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/intel_wordpress_benchmark.py):
```
def GetConfig(user_config):
...
def CheckPrerequisites(config):
...
def Prepare(benchmark_spec):
...
def Run(benchmark_spec):
...
def Cleanup(benchmark_spec):
```
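
A minimal sketch of how these phases can delegate to the shared helpers in perfkitbenchmarker/intel_php_utils.py added by this commit (the actual body of intel_wordpress_benchmark.py is not reproduced here; FLAGS, BENCHMARK_NAME, and BENCHMARK_CONFIG are the definitions shown above):

```
from perfkitbenchmarker import configs
from perfkitbenchmarker import intel_php_utils


def GetConfig(user_config):
  # Merge the embedded BENCHMARK_CONFIG with any user-supplied config file.
  return configs.LoadConfig(BENCHMARK_CONFIG, user_config, BENCHMARK_NAME)


def Prepare(benchmark_spec):
  # Provision the target VM and configure the hhvm-perf harness.
  intel_php_utils.Prepare(benchmark_spec, 'wordpress',
                          FLAGS.intel_wordpress_runtime,
                          FLAGS.intel_wordpress_execution_count,
                          FLAGS.intel_wordpress_server_threads)


def Run(benchmark_spec):
  # Drive the run and return the collected samples.
  return intel_php_utils.Run(benchmark_spec, 'wordpress',
                             FLAGS.intel_wordpress_runtime,
                             FLAGS.intel_wordpress_execution_count,
                             FLAGS.intel_wordpress_server_threads)
```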
For some of the workload components, additional provisioning scripts need to be written as part of pkb linux_packages (a minimal sketch of such a module follows the list): \
[linux_packages/composer.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_packages/composer.py) - installs composer \
[linux_packages/hhvm.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_packages/hhvm.py) - installs hhvm \
[linux_packages/hhvm_oss_performance.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_packages/hhvm_oss_performance.py) - installs oss-performance \
[linux_packages/intel_hhvm_perf.py](https://github.intel.com/cspbench/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_packages/intel_hhvm_perf.py) - sets up the oss-performance harness
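
As an illustration, a PKB linux_packages module follows roughly this shape (a sketch under assumptions, not the actual contents of composer.py; the install command shown is the standard composer installer one-liner):

```
def Install(vm):
  """Installs composer on the VM (sketch)."""
  vm.RemoteCommand('curl -sS https://getcomposer.org/installer | php && '
                   'sudo mv composer.phar /usr/local/bin/composer')


def Uninstall(vm):
  """Removes composer from the VM (sketch)."""
  vm.RemoteCommand('sudo rm -f /usr/local/bin/composer')
```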
126 changes: 126 additions & 0 deletions perfkitbenchmarker/intel_php_utils.py
@@ -0,0 +1,126 @@
# Copyright 2015 PerfKitBenchmarker Authors. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Utilities for running PHP/HHVM workloads"""

import json
import logging
import os
import re
import StringIO
import yaml

from perfkitbenchmarker import data
from perfkitbenchmarker import vm_util
from perfkitbenchmarker import sample
from perfkitbenchmarker import os_types
from perfkitbenchmarker.linux_packages import INSTALL_DIR

PREREQ_PKGS = ["software-properties-common",
               "apt-transport-https",
               "iputils-ping",
               "python",
               "python3",
               "python3-pip"
               ]

def Prepare(benchmark_spec,
            workload_name,
            workload_engine,
            count,
            server_workers):
  """Prepare the virtual machines to run."""
  vm = benchmark_spec.vm_groups['target'][0]

  vm.Uninstall('intel_hhvm_provisioning')

  if vm.OS_TYPE == os_types.RHEL:
    PREREQ_PKGS.append("openssh-clients")
  vm.InstallPackages(' '.join(PREREQ_PKGS))
  vm.Install('ansible')
  vm.Install('intel_hhvm_provisioning')
  # run the provisioning
  vm.RemoteHostCommand('cd ' + INSTALL_DIR + ' && '
                       'ansible-playbook -i hhvm_provisioning/hhvm/hosts '
                       'hhvm_provisioning/hhvm/' + workload_engine + '_pkb.yml')
  if workload_engine == "php":
    out, _ = vm.RemoteHostCommand('ls /usr/sbin/php-fpm*')
  elif workload_engine == "hhvm":
    out, _ = vm.RemoteHostCommand('ls /usr/bin/hhvm*')
  # get actual engine binary name
  engine_path = out.splitlines()[0]

  # configure hhvm-perf/config.yml
  logging.info("configuring the hhvm-perf workload harness")

  conf = data.ResourcePath('intel_' + workload_name + '_benchmark/config.yml')
  with open(conf) as stream:
    config = yaml.load(stream)
  config['paths']['engine'] = engine_path
  config['run']['count'] = count
  config['run']['server_workers'] = str(server_workers)

  new_conf = vm_util.PrependTempDir('config.yml')
  with open(new_conf, 'w') as stream:
    yaml.dump(config, stream)

  vm.RemoteCopy(new_conf, INSTALL_DIR + '/git/hhvm-perf')

def Run(benchmark_spec,
        workload_name,
        workload_engine,
        count,
        server_workers):
  """Run Siege and gather the results."""
  samples = []

  vm = benchmark_spec.vm_groups['target'][0]
  logging.info("running the workload")
  stdout, _ = vm.RobustRemoteCommand('cd ' + INSTALL_DIR + '/git/hhvm-perf '
                                     ' && ./run.py')

  logging.info("copying workload output to local run output dir")
  # workload output location is specified on stdout
  workload_output_dir = None
  stdout_io = StringIO.StringIO(stdout)
  for line in stdout_io:
    sline = line.strip()
    if sline.startswith('Done. Latest results in:'):
      match = re.search(r'Done. Latest results in: (.*)$', line)
      if match is None:
        logging.error("Parsing error -- regex doesn't match for string: %s", line)
      else:
        workload_output_dir = match.group(1)
  # copy workload output folder from vm to local temp run dir
  tps = 0
  metadata = {}
  if workload_output_dir:
    vm.RemoteCopy(vm_util.GetTempDir(), workload_output_dir, False)
    results_file = os.path.join(os.path.basename(workload_output_dir),
                                'results', workload_name, 'run',
                                'Performance-' + workload_name + '.json')
    with open(vm_util.PrependTempDir(results_file)) as f:
      json_f = json.loads(f.read())
    tps = json_f['oss-performance results']['Transaction Rate (in trans/sec)']['Average']
    software_stack = os.path.join(os.path.basename(workload_output_dir),
                                  'results', workload_name,
                                  'Software_Stack_' + workload_name + '.json')
    with open(vm_util.PrependTempDir(software_stack)) as f:
      metadata = json.loads(f.read())
    metadata['server_threads'] = server_workers
    metadata['execution_count'] = count

  samples.append(sample.Sample("transaction rate", tps, "transactions/second", metadata))
  return samples