CHANGES.next.md


Breaking changes:

  • Added --accept_licenses flag. Users must turn this flag on to acknowledge that PKB may install software, thereby accepting license agreements on the user's behalf.
  • Renamed database-related flags from managed_db* to db*. Added aliases for backwards compatibility; these might not be supported in a future release.
  • Require Python 3.9+
  • The core_os os_type was removed from the Azure provider as the image was deleted.
    • It will be replaced by Fedora Core OS if a public image is made available.
  • The dpb_sparksql_benchmark now requires passing the requested queries with --dpb_sparksql_query_order
  • AwsVirtualMachine.IMAGE_OWNER has been changed from a string to a list of strings to support images which have multiple owners e.g. AmazonLinux2 in opt-in regions.
  • Remove Ubuntu1710 from --os_types.
  • Remove Amazon Linux 1 from --os_types.
  • Changed redis_memtier_benchmark to use redis version 6 and above. Redis versions less than 6 are no longer supported.
  • Compressed redis_memtier samples from --memtier_time_series into a few time-to-value dictionaries, greatly reducing the number of samples produced
  • Make Ubuntu 18 the default os_type.
  • Deprecate Ubuntu 16 as it is EOL on 2021-05-01.
  • Switch the Azure CLI to MSAL. This requires updating the CLI to >= 2.30.0.
  • Remove deprecated --eks_zones flags. Use --zones instead.
  • Deprecate CentOS Linux 8 as it is EOL on 2021-12-31.
  • --zones and --extra_zones deprecated in favor of --zone.
  • Deprecate Aerospike_YCSB benchmark.
  • Remove pkb's --placement_group_style cloud-agnostic values 'cluster'/ 'cluster_if_supported'/'spread'/'spread_if_supported'.
  • Replace flag --ibm_azone with --ibm_region.
  • Changed the default benchmark to cluster_boot instead of the standard set. This makes the default behavior for PKB much faster and the standard set of benchmarks was defined many years ago. It's not a reasonable introduction to PKB or something that most people should run by default.
  • --dpb_export_job_stats is now False by default.
  • Validate arguments to IssueCommand & RobustRemoteCommand. Replaced force_info_log & suppress_warning parameters with vm_command_log_mode flag, added should_pre_log parameter. Passed stacklevel variable to logging to better distinguish between RemoteCommand call sites. See stacklevel docs: https://docs.python.org/3/library/logging.html#logging.Logger.debug
  • Remove Dataflow parameter --maxNumWorkers by default and add dataflow_max_worker_count in spec to allow users to set this parameter on their own.
  • Remove flag fio_write_against_multiple_clients from FIO.
  • Remove flag benchmark_compatibility_checking.
  • Drop windows coremark benchmark.
  • Remove cudnn linux package.
  • Make Ubuntu 20 the default os_type.
  • Default --ip_addresses to INTERNAL.
  • Add multichase_benchmark flag defaults.
  • For hammerdbcli_benchmark, set default num_warehouses=25*num_cpus and num_vu=2*num_cpus.
  • Remove old spark_benchmark and hadoop_terasort_benchmark and related services. Prefer to use the newer dpb_generic_benchmark and dpb_terasort_benchmark.
  • gcp/aws/azure_provisioned_iops/throughput flags are unified to provisioned_iops/throughput flags.
  • aws_dynamodb_ycsb benchmark now requires an explicit --aws_dynamodb_ycsb_cli_profile flag to select the credentials to talk to YCSB.
  • Benchmarks that require EC2 client VMs now require --aws_ec2_instance_profile to configure VM permissions instead of installing local AWS CLI credentials via aws_credentials.py. This must be set up by the user beforehand. See the AWS documentation.
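
The hammerdbcli_benchmark defaults in the list above scale with the CPU count. The following is a minimal sketch of that arithmetic, reading the defaults as 25 times num_cpus warehouses and 2 times num_cpus virtual users. The function name and return shape are hypothetical and are not PKB's API.

```python
def hammerdb_defaults(num_cpus: int) -> dict[str, int]:
    """Return illustrative default HammerDB sizing for a given CPU count.

    Hypothetical helper: it mirrors the defaults stated in the changelog
    and is not taken from PKB's source.
    """
    if num_cpus < 1:
        raise ValueError("num_cpus must be a positive integer")
    return {
        "num_warehouses": 25 * num_cpus,  # 25 warehouses per CPU
        "num_vu": 2 * num_cpus,           # 2 virtual users per CPU
    }


print(hammerdb_defaults(8))  # {'num_warehouses': 200, 'num_vu': 16}
```

Under these assumptions, an 8-CPU database VM would default to 200 warehouses and 16 virtual users.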

New features:

  • Add support for systems running fedora36 and fedora37
  • Add support for AlloyDB on GCP
  • Add support for static systems running debian11
  • Add ibmcloud as a new provider.
  • Add prefix/directory support for object storage service runs.
  • Add MaskRCNN and ResNeXt-101 to the horovod benchmark.
  • Add Bigtable Benchmarking tutorial.
  • Add ycsb_skip_run_stage argument.
  • Move PKB static website files from gh-pages branch to master branch.
  • Add UDP benchmarking to iPerf.
  • Add ability to use Intel package repos.
  • Add Ubuntu 20.04 to AWS, Azure, and GCP providers.
  • Add support for running DPB Apache Spark benchmarks on PKB provisioned VMs
  • Add support for AWS gp3 disks.
  • Add ability to use Intel compiled HPCC binaries with --hpcc_use_intel_compiled_hpl
  • Add OSU MPI micro-benchmarks benchmark
  • Add support for setting virtual NIC type for GCP VMs.
  • Add ability to collect /proc/meminfo data with --collect_meminfo
  • Add support for setting egress bandwidth tier for GCP VMs.
  • Add support for Azure Ultra Disk
  • Add cloudharmony_network benchmark
  • Add GPU PingPong benchmark.
  • Added lmod linux package.
  • Support IntelMPI 2021.x on RedHat.
  • Add TensorFlow BigQuery Connector benchmark.
  • Add gcsfuse benchmark.
  • Add MLPerf multiworker benchmark based on NVIDIA's v0.6 submission
  • Add retries to PKB.
  • Added GCE networking MTU support with --mtu.
  • Add pd extreme support to PKB.
  • Add '--delete_samples' to measure VM deletion during benchmark teardown phase
  • Add cloudharmony iperf benchmark to pkb.
  • Add specjbb2015 benchmark to PKB.
  • Add VM stop start benchmark.
  • Add suspend_resume benchmark.
  • Add AWS support for VM stop start benchmark.
  • Add Azure support for VM stop start benchmark.
  • Add --os_type=debian11 support for GCP, AWS and Azure Providers.
  • Add cURL benchmark for object storage.
  • Add vbench video encoding benchmark to PKB.
  • Add Kubernetes based DPB Service for Spark
  • Add support for creating Dataproc cluster on GKE
  • Add support for TPC-DS/H benchmarks on Dataproc Serverless.
  • Add support for TPC-DS/H benchmarks on AWS Glue Job.
  • Add messaging service latency benchmark (for GCP PubSub, AWS SQS & Azure Service Bus).
  • Add Intel perfspect as a new trace
  • Add Ubuntu 22.04 support for GCP, AWS, and Azure Providers.
  • Add support for Rocky Linux 8 and 9 on GCP, AWS, and Azure Providers.
  • Add support for CentOS Stream 8, CentOS Stream 9 on GCP and AWS Providers.
  • Add support for chbench using s64da.
  • Add sysbench_memory benchmark.
  • Add support for RHEL 9 on AWS, Azure, and GCP.
  • Add GCP optimized Rocky Linux 8 and 9 OSes.
  • Add mtu to os_metadata in linux_virtual_machine.
  • Add support for TPC-DS/H benchmarks on AWS EMR Serverless.
  • Add Intel MPI benchmark.
  • Add support for Azure ARM VMs.
  • Add an HTTP endpoint polling utility & incorporate it into app_service.
  • Added support for dynamic provisioning of Bigquery flat rate slots at benchmark runtime
  • Create a new subdirectory of linux_benchmarks called provisioning_benchmarks for benchmarking lifecycle management timings of cloud resources. Including:
    • Kubernetes Clusters
    • KMS cryptographic keys
    • Object storage buckets
  • Add support for using the hbase2 binding in the Cloud Bigtable YCSB benchmark.
  • Add iPerf interval reporting.
  • Add support for DynamoDB on demand instances.
  • Add support for Debian 10 & 11 with backported kernels on AWS.
  • Add fio_netperf benchmark, which executes the run stages of fio and netperf benchmarks in parallel using the first 2 VMs in benchmark_spec.
  • Add Google Kubernetes Engine based DPB service to run flink benchmarks.
  • Add support for Amazon Linux 2023.
  • Add support for multi-network creation/attachment. PKB currently does not handle subnet creation on an existing network.
  • Add support for GCE Confidential VMs.
  • Add cos-dev, cos109, cos105, and cos101 OS support for GCP.
  • Add --object_ttl_days flag for lifecycle management of created buckets.
  • Add support for multi-NIC netperf throughput on AWS.
  • Added AWS/GCP support for Data Plane Development Kit (DPDK) on Linux VMs to improve networking performance, as well as a DPDK benchmark for testpmd.
  • Add --dpb_hardware_hourly_cost and --dpb_service_premium_hourly_cost to estimate cost of DPB service in benchmark runs.
  • Add support for Spanner Postgres benchmarking with TPCC.
  • Add --dpb_dynamic_allocation flag to disable dynamic allocation in Spark benchmarks.
  • Add support for benchmarking VMs with Local SSD on Windows on GCE/AWS/Azure.
  • Add support for Debian 12.
  • Add netperf_hammerdbcli benchmark, which executes netperf and hammerdbcli benchmarks in parallel.
  • Add default_benchmark_config.yaml and merge it with user_config. This is done before config overrides. Within default_benchmark_config.yaml, add configs for netperf_tcp_rr, netperf_tcp_stream, and hammerdbcli_mysql.
  • Add support for Ubuntu 23.10.
  • HammerDB enables "Use All Warehouses" by default for increased I/O.
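
The default_benchmark_config.yaml entry above describes a merge order: defaults are merged with the user config first, and config overrides are applied afterwards. The sketch below illustrates one way such layering can work with a recursive dictionary merge. It assumes the later layer wins on conflicts, which the changelog does not state; merge_configs and all config keys and values are hypothetical and are not PKB code.

```python
import copy


def merge_configs(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`; `override` wins."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Scalars, lists, and type mismatches are replaced outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Made-up stand-ins for default_benchmark_config.yaml, the user config,
# and command-line config overrides.
default_config = {"netperf_tcp_rr": {"vm_count": 2, "flags": {"test_length": 60}}}
user_config = {"netperf_tcp_rr": {"flags": {"test_length": 30}}}
config_overrides = {"netperf_tcp_rr": {"vm_count": 4}}

# Defaults are merged with the user config first; overrides are applied last.
effective = merge_configs(merge_configs(default_config, user_config), config_overrides)
print(effective)
# {'netperf_tcp_rr': {'vm_count': 4, 'flags': {'test_length': 30}}}
```

Merging into a deep copy keeps the default configuration untouched, so the same defaults can be reused across benchmarks in one run.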

Enhancements:

  • Added delay_time support for delete operations in object storage service.
  • Added horovod_synthetic option for synthetic input data in ResNet/ResNeXt models.
  • Added support for A100 GPUs.
  • Added cuda_toolkit 11.0 support.
  • Updated retry_on_rate_limited to false on all cluster_boot runs.
  • Add ability to apply HPC optimized script to GCE VMs with --gce_hpc_tools
  • Update the nginx benchmark.
  • Added --before_run_pause flag for debugging during benchmark development.
  • Report CPU vulnerabilities as a sample via the --record_cpu_vuln flag.
  • Add retry when calling MountDisk.
  • Measure time to running status of a VM in large_scale_boot benchmark.
  • SpecCPU2017 runs on Redhat, Centos7
  • Aerospike YCSB can have multiple server nodes and different client/server machine types.
  • HPL's configuration file customizable with flags.
  • Added Intel MPI to linux_packages.
  • Added ability to use MKL 2018.2 from Intel repos via --mkl_install_from_repo
  • Added ability to install gfortran 9 with --fortran_version=9
  • Added MSS support to netperf with --netperf_mss.
  • Added nfs_service.NfsExport(vm, path) to easily NFS export a directory.
  • AWS EFA works for Ubuntu1604.
  • Added support for MySQL 8.0 on VMs and minimal innodb tuning.
  • Add ability to specify version of Intel MKL with --mkl_version
  • Added intelmpi.NfsExportIntelDirectory to NFS export /opt/intel
  • Modify cloud_datastore_ycsb benchmark to execute YCSB on the same db entries each run instead of emptying and preloading the db every time. Can set flag google_datastore_repopulate=True to empty & repopulate.
  • Enhance the cpu metric collection feature for cloud bigtable benchmark: as long as there is a real workload (table loading is not counted), the cpu metrics will be collected.
  • Support wiring properties into DPB clusters with --dpb_clusters_properties in addition to --dpb_job_properties.
  • Add support for GCS and S3 I/O in PKB managed Spark and Hadoop clusters.
  • Update FIO workload to support extraction of common benchmark parameters from the scenario string.
  • Added Intel oneAPI BaseKit to packages.
  • Upgrade default CUDA version to 11.0.
  • Add support for AWS IO2 EBS instances.
  • Add support for Postgres 13 on VMs.
  • Add support to compile GCC versions on non Debian systems.
  • Add additional customization options to SPEC CPU 2017.
  • Add average utilization to the metadata of cpu metrics for Cloud Bigtable benchmark.
  • Add support for NFS nconnect mount option.
  • Add support for custom compilation of OpenJDK.
  • Add support for configuring the HBase binding to use with --hbase_binding.
  • Enhance the cpu utilization metrics for cloud bigtable ycsb benchmark: collect the cpu data for each workload (differentiated by workload_index); the time window estimation is also more accurate.
  • Support default subnets in --aws_subnet with default as value passed.
  • Add Unsupported config failure substatus for runs that are not supported by the cloud.
  • Add support for nodepools to container_cluster benchmark spec.
  • Expose GCS FUSE disk type to allow using GCS buckets as a data_disk.
  • Add support for 5th gen Azure VMs.
  • Support multiple Redis instances on the same VM and multiple client VMs.
  • Support creating autoscaled Bigtable instances.
  • Added support to allow deleting a static table in Cloud Bigtable benchmarks via --google_bigtable_delete_static_table.
  • Support downloading data twice in object_storage_service_benchmark.
  • Add memtier reported percentile latencies to Memtier samples metadata.
  • Add support for building multiarch Docker images.
  • Add latency capped throughput measurement mode to memtier.
  • Add Unsupported config failure substatus for Azure runs.
  • Add support for Windows 2022 and SQL Server 2019 on Windows 2022.
  • Add support for Redis Enterprise clustered database.
  • Support regional GKE clusters.
  • Support zonal node placement in regional Kubernetes clusters.
  • Uses the regular version of gcloud for Bigtable commands instead of beta.
  • Support logging the start timestamp of each stage.
  • Support for building GCC on Debian
  • Support for Postgres 13 on Debian
  • Add OSProvisioningTimedOut as a recognized failure mode in Azure.
  • Add support for providing initialization actions into DPB clusters with --dpb_initialization_actions
  • Add sandbox configuration fields for GKE nodepools.
  • Fetch Redis benchmark live migration times from GCP metadata server.
  • Retry table deletions in Cloud Bigtable benchmarks against a user managed instance and fail if the benchmark eventually fails to delete a table.
  • Add support for Snowflake on Azure External Tables
  • Fetch memtier benchmark runtime information
  • Support installing the Google Cloud Bigtable client by a given version via --google_bigtable_client_version and simplify dependency management.
  • Support setting --dpb_dataflow_additional_args and --dpb_dataflow_timeout for dpb_dataflow_provider.
  • Add support for T2A (ARM) VMs on GCE.
  • Add --dpb_job_poll_interval_secs flag to control job polling frequency in DPB benchmarks.
  • Add support for more readings in nvidia_power tracking.
  • Report benchmark run costs for dpb_sparksql_benchmark runs on Dataproc Serverless, AWS EMR Serverless & AWS Glue.
  • Create a list of resources in benchmark_spec to extract common lifecycle timing samples from regardless of benchmark. The set is initially small, but can be expanded to any resource.
  • Add per-VM resource metadata for id, name, and IP address.
  • Add Key Management Service (KMS) resource for cloud cryptographic keys.
  • Add support for using the Java veneer client with Google Bigtable via google_bigtable_use_java_veneer_client.
  • Allow configuring the number of channels used per VM for the Cloud Bigtable YCSB benchmark with --google_bigtable_channel_count.
  • Add --pkb_log_bucket flag, allowing users to route PKB logs to a GCS bucket and clean up space on their machines.
  • Add support for rls routing with direct path with new flag google_bigtable_enable_rls_routing.
  • Set default YAML config vm_spec.GCP_network_name to null, and added the corresponding attribute to GCEVMSpec, GCENetworkSpec and GCEVirtualMachine. vm_spec overrides FLAGS.gce_network_name.
  • Add --dpb_sparksql_queries_url flag to provide a custom object store path (e.g. GCS/S3) from which dpb_sparksql_benchmark reads its queries.
  • Add --gke_node_system_config flag to the GKE provider for passing kubelet and linux parameters.
  • Add 'Time to Create' and 'Time to Running' samples on cluster_boot for GCEVirtualMachine and AWSVirtualMachine instances that are provisioned with asynchronous 'create' invocations.
  • Add --dpb_sparksql_streams to run TPC-DS/H throughput runs.
  • Add --gce_create_log_http to pass --log-http to gcloud compute instance create and gcloud compute operations describe.
  • Update AWS/Azure/GCP data disks to use cheap ssds rather than hdds.
  • Support Azure ZRS disks and hyperdisk balanced.
  • Add --create_container_cluster_time_resize option to time adding node to provisioned Kubernetes clusters.
  • Removed --container_cluster_cloud & --cloud=Kubernetes. Now to run VM benchmarks on Kubernetes, just set --cloud=GCP (or whichever) and --vm_platform=Kubernetes.
  • Added function under perfkitbenchmarker.publisher to de-serialize labels from a string.
  • Add support for S3 Express One Zone buckets with --object_storage_zone.
  • Add --always_call_cleanup flag for runs that need to run Cleanup, but may fail in Provision.
  • Add support for setting an app profile to use on an existing instance via --google_bigtable_app_profile_id for Cloud Bigtable YCSB benchmarks.
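
One entry above mentions a publisher function that de-serializes labels from a string. The sketch below shows what such parsing can look like. It assumes labels are serialized as '|key:value|' items joined by commas, which the changelog does not specify; labels_to_dict is a hypothetical name and is not the function added under perfkitbenchmarker.publisher.

```python
def labels_to_dict(labels_str: str) -> dict[str, str]:
    """Parse '|k1:v1|,|k2:v2|' into {'k1': 'v1', 'k2': 'v2'}.

    Assumed format. Each label is split on its first ':' only, so values
    may themselves contain colons.
    """
    stripped = labels_str.strip()
    if not stripped:
        return {}
    if not (stripped.startswith("|") and stripped.endswith("|")):
        raise ValueError(f"labels must be wrapped in '|': {labels_str!r}")
    result = {}
    # Drop the outermost pipes, then split on the '|,|' separator.
    for item in stripped[1:-1].split("|,|"):
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"label without ':' separator: {item!r}")
        result[key] = value
    return result


print(labels_to_dict("|cloud:GCP|,|zone:us-central1-a|"))
# {'cloud': 'GCP', 'zone': 'us-central1-a'}
```

Splitting on the first colon only is a deliberate choice here so that values such as URLs survive a round trip.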

Bug fixes and maintenance updates:

  • Add 'runcpu --update' and 'runcpu --version' commands to install phase.
  • Set the command to download preprovisioned data to be robust and have a five minute timeout.
  • Make Speccpu17 fail if there are compilation errors that will cause missing results.
  • Tuned entity cleanup parameters.
  • Fix wrong unit in MLPerf benchmark.
  • Disabled TF_CUDNN_USE_AUTOTUNE for horovod benchmarks.
  • Updated default NCCL version.
  • Use a deletion task to cleanup leftover entities to avoid hitting batch limit.
  • Updated object storage service deleteObjects API to also return back object_sizes of deleted objects.
  • Fixed a bug in leftover entity deletion logic.
  • Fixed string encoding bug in netperf_pps benchmark.
  • Fix installing RedHat EPEL on GCP (rhel8,centos8) and AWS (rhel7,rhel8, centos8, amazonlinux2)
  • Use absl parameterized test case.
  • Error out if AWS EFA install fails (centos8,rhel8)
  • Added cascadelake as a --gcp_min_cpu_platform option.
  • Move CoreOS from EOL Container Linux CoreOS to Fedora CoreOS on AWS and GCP.
  • Update files to use absl/flags.
  • Tries alternate command to get boot time, fixing continuous checking for new time on rebooted system.
  • Correctly handle ARM processors in netperf.
  • Added --always_teardown_on_exception to allow pkb to perform teardown when there is exception at the provision|prepare|run|cleanup stage.
  • Updates crcmod, boto, and awscli installation to pip3.
  • Consolidates adding Ubuntu toolchain repo.
  • Moved stress_ng installation to a package.
  • Switch to using Google Cloud Build for continuous integration.
  • Fix PrettyPrintStreamPublisher to make "cpu_utilization_per_minute" show up in the PKB results summary for cloud bigtable benchmark.
  • Added an option to install GCP NCCL plugins.
  • Updated hbase binding (from hbase10 to hbase12) for cloud bigtable ycsb benchmark and hbase ycsb benchmark.
  • Added retries around spurious InvalidPlacementGroup.InUse on AWS VM create.
  • Support Redis version 6 on managed Redis datastores.
  • Updated Aerospike server release version to 4.9.0.31.
  • Updated Aerospike client release version to 4.6.21.
  • Install zlib1g-dev dependency for Aerospike client.
  • Use OMB version 4.7.1 and parse new min/max columns.
  • Added --application_default_credential_file as an alternative way to authenticate with Bigquery.
  • Only install pip and pip3 when they are not already installed with Python.
  • Install pip and pip3 from get-pip.py to avoid issues with old packages.
  • Fix parsing Dstat 0.7.3
  • Update hadoop version to 3.3.1
  • Updated required numpy and six versions.
  • Added --hadoop_bin_url flag to allow overrides for Hadoop downloads.
  • Make RunBenchmark handle KeyboardInterrupt so that benchmark specific resources can be cleaned up on cancellation. Expose these errors via status.
  • Added --ycsb_fail_on_incomplete_loading flag to allow the test to fail fast in the case of table loading failures. --ycsb_insert_error_metric can be used to determine which metric indicates that loading failed (defaults to 'insert Return=ERROR').
  • Enable the aggregation for "Return=NOT_FOUND" errors.
  • Added no_proxy flag for proxy settings
  • Stop attempting to delete PKB resources that failed to create.
  • Added a new user guide for bigtable walkthrough.
  • Sanitize the shell code in bigtable walkthrough doc: removing dollar sign and using variable expansion.
  • Added --google_monitoring_endpoint flag for querying a different endpoint than monitoring.googleapis.com. Used by cloud_bigtable_ycsb.
  • Update Go language binary to version 1.17.2
  • Broadens Azure quota detection parsing
  • AWS disk attaches now wait for attach, supporting io2 block express
  • Update the performance results of Bigtable testing which used a more proper client setup.
  • Update the runner's AWS CLI to 1.19.75.
  • Upgrade from AWS ecr get-login to ecr get-login-password.
  • Minor fix of the Bigtable benchmarking user guide.
  • Enable icelake and milan as --gcp_min_cpu_platform options.
  • Update the bigtable tutorial readme with the content of batch_testing.md. Unneeded files are removed.
  • Fix fio_write_against_multiple_clients additional samples and metadata.
  • Use real URLs as the links in Bigtable walkthrough doc.
  • Add option to publish to a subfolder in cloud storage publisher.
  • Parse resulting output matrix by indexing from the bottom up instead of top down.
  • Double build time for all cloud's docker images, for a more complex build script.
  • Add required dataflow option --gcpTempLocation and --region to gcp_dpb_dataflow provider.
  • Support taking FLAGS.dpb_jar_file and FLAGS.dpb_wordcount_additional_args when running wordcount benchmark.
  • Add some required types to BaseAppServiceSpec.
  • Uses nic type of GVNIC by default (instead of VIRTIO_NET) on GCE
  • Rename pkb's --placement_group_style values to reflect their cloud-specific CLI arguments (GCP - 'COLLOCATED'/'AVAILABILITY-DOMAIN'; AWS - 'cluster'/'spread'/'partition'; Azure - 'proximity-placement-group'/'availability-set'). Cloud-agnostic value 'closest_supported' will choose the most tightly-coupled placement policy supported.
  • Fix how the CBT client is installed for the cloud_bigtable_ycsb_benchmark (when --google_bigtable_client_version is set) and use the cbt CLI instead of the hbase shell to create and delete tables.
  • Update Bigtable benchmarking configs along with new docker image release. Important dates are added to the user guide.
  • Add --assign_external_ip flag to allow benchmarking VMs without creating external (public) IPs for better security and reduced costs on AWS, Azure, and GCP. The --connect_via_internal_ip flag should also be used in this case.
  • Add --boot_completion_ip_subset flag to determine how to measure Boot Completion
  • Add --azure_subnet_id flag to use an existing subnet instead of creating a new one.
  • Remove --google_bigtable_enable_table_object_sharing. Use --ycsb_tar_url=https://storage.googleapis.com/cbt_ycsb_client_jar/ycsb-0.14.0.tar.gz to retain the previous behavior.
  • Remove --google_bigtable_hbase_jar_url. Rely on --google_bigtable_client_version instead.
  • Fix how environment variable is set for direct path
  • Fix incorrect string concatenation causing Snowflake Throughput runs to fail.
  • Remove reboot after changing sysctl and load via sysctl -p instead.
  • Clean up settings related to the Cloud Bigtable Veneer client.
  • Don't install HBase dependencies when using the Cloud Bigtable Veneer client.
  • Require monitoring.write scope for client side metrics when using the Cloud Bigtable veneer client.
  • Add flag dpb_job_type and support running native flink pipeline on dataproc_flink.
  • Cleanup Coremark compiling flags.
  • Remove cygwin codepath.
  • Moved flags from pkb.py to flags.py to help avoid circular dependencies.
  • Updated tracer dstat to use pcp dstat.
  • Removed Windows 2012 after loss of support on all clouds.
  • Formatted entire directory with https://github.com/google/pyink.
  • Added a new flag --azure_attach_disk_with_create (default=True) to enable/disable attach of disks to VM as a part of Disk creation for Azure.
  • Use the flag --gcp_create_disks_with_vm=false in the provision_disk benchmark to separate disk creation from VM creation in GCP and get the disk create and attach time.
  • Reduce duplicate code in MaintenanceEventTrigger.AppendSamples().
  • Attach "run_number" label to "LM Total Time" sample.
  • Refactored azure_virtual_machine.py to use azure_disk_strategies.py and enabled the disk provision benchmark for Azure.
  • Refactored aws_virtual_machine.py to use aws_disk_strategies.py and enabled the disk provision benchmark for AWS by using --aws_create_disks_with_vm.
  • Enabled parallel/bulk create and attach of GCE, AWS and Azure remote disks.
  • Set --always_call_cleanup=True flag as the default for cluster_boot. This prevents leaking tcpdump processes from runs that fail in the Provision phase.
  • Test change.
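
Several fixes above add retries around spurious failures, such as InvalidPlacementGroup.InUse on AWS VM create. The sketch below illustrates the general pattern of retrying only on a known transient error and re-raising once attempts run out. TransientError, retry_on_transient, and create_vm are hypothetical names for illustration; this is not PKB's retry utility.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable provisioning failure."""


def retry_on_transient(func, max_attempts=3, delay_seconds=0.0):
    """Call `func`, retrying only when it raises TransientError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            time.sleep(delay_seconds)


# Simulated create call that fails twice with a spurious error, then succeeds.
calls = {"count": 0}


def create_vm():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("InvalidPlacementGroup.InUse")
    return "created"


print(retry_on_transient(create_vm))  # created
```

Any other exception type propagates immediately, so genuine configuration errors are not masked by the retry loop.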