
[benchmarking] Support loading scenarios from remote repositories #1317

Open
ebeahan opened this issue Jun 21, 2023 · 4 comments
Labels
Team:Security-Scalability Security Integrations Scalability Team


ebeahan commented Jun 21, 2023

Building on the work in #1164, add support to the elastic-package benchmark system for loading benchmarking scenarios from remote git repositories.

This opens up the ability to create and maintain benchmarking scenarios outside of elastic/integrations, similar to rally-tracks.


andrewkroh commented Oct 10, 2023

What is the use case for this? Asked another way, why store them separately?


ebeahan commented Oct 10, 2023

From talking with @marc-gr about this feature: we see the default location for benchmarking configs as alongside the integration package in elastic/integrations. However, we also see a need for configs and data sets that should remain private or that hold little interest for general users. As with rally tracks, these would be stored elsewhere but would still be runnable using elastic-package benchmark system --benchmark.

@chrisberkhout
Contributor

Before this work, it was only possible to run a system benchmark:

  • from the package root
    (local manifest files are used to validate data stream settings and generate a package policy)
  • on the package version in the working tree, or
    on another version (named in scenario config) that:
    • is available in the package registry
    • does not conflict with local manifests for the purposes of data stream validation and policy generation
  • using a scenario defined in <package root>/_dev/benchmark/system/<scenario>.yml

The metrics from a benchmark run can be saved to an Elasticsearch instance by setting the ELASTIC_PACKAGE_ESMETRICSTORE_* environment variables.
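
For example, a run with the metric store enabled might look roughly like the following (a minimal sketch; the exact ELASTIC_PACKAGE_ESMETRICSTORE_* variable suffixes and the local Elasticsearch address are assumptions, and <package> and <scenario> are placeholders):

    # Sketch only: point the metric store at an Elasticsearch instance
    # (the variable suffixes here are assumptions; check the elastic-package docs).
    export ELASTIC_PACKAGE_ESMETRICSTORE_HOST="https://localhost:9200"
    export ELASTIC_PACKAGE_ESMETRICSTORE_USERNAME="elastic"
    export ELASTIC_PACKAGE_ESMETRICSTORE_PASSWORD="changeme"

    # Run a scenario defined in <package root>/_dev/benchmark/system/<scenario>.yml,
    # executed from the package root.
    cd packages/<package>
    elastic-package benchmark system --benchmark <scenario>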

With that functionality it is easy to run the current benchmarks in the package repository on the current package version. If a package's benchmarks are run regularly and the metrics retained, those can be compared over time.

There are two more important use cases:

  • Maintain benchmark scenarios outside of the package repository
    (to allow for scenarios that include non-public datasets)
  • Run current benchmarks on historical and current versions of a package

These were both already possible with the use of an additional benchmarks repository and the correct manipulation of git history and the working tree (i.e. define scenarios in a benchmarks repository, check out the right version of the package repository for a given run, and link its <package root>/_dev/benchmark/system/ directory to the desired scenario definition before starting the run).
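
Roughly, that manual workflow looks like the following (a sketch only; repository names, tags, and paths are illustrative placeholders):

    # Check out the package version to benchmark in the package repository.
    git -C integrations checkout <package-version-tag>

    # Replace the package's scenario directory with a link to the desired
    # scenario definitions kept in a separate benchmarks repository.
    rm -rf integrations/packages/<package>/_dev/benchmark/system
    ln -s <benchmarks-repo>/<scenario-dir> \
      integrations/packages/<package>/_dev/benchmark/system

    # Run the benchmark from the package root as usual.
    cd integrations/packages/<package>
    elastic-package benchmark system --benchmark <scenario>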

However, there are changes that would make these use cases simpler.

In #1603 I added a --path option to the system benchmark command so that a scenario can be loaded from any local location. This breaks the link between the package version and the benchmark scenario version without having to manipulate the package working tree.
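
With that option, a scenario kept outside the package repository can be used directly, along these lines (a sketch; the paths are placeholders, and it assumes --path points at a directory containing the scenario definitions):

    # Run from the package root, but load the scenario definition from an
    # external benchmarks repository checked out elsewhere.
    cd integrations/packages/<package>
    elastic-package benchmark system --benchmark <scenario> \
      --path ~/src/benchmarks-repo/<package>/system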

Additional changes that would make system benchmarking more flexible:

  1. Package version choice on the command line
    If we want to automate running one version of a scenario on multiple versions of a package, it would be useful to choose the version with a command-line argument rather than by modifying the <scenario>.yml file. It would also be good for a scenario to define a minimum version or a range of versions with which it is compatible.
    This would do the most to enable new workflows.
  2. No local package repository necessary (use the registry)
    Without a local package repository, the manifests used for data stream validation and package policy generation would need to come from the package registry. This would involve checking the available versions, downloading the package zip file to a temporary location, and extracting its manifests. Alternatively, it may be possible to use the package information provided by the registry at /package/{name}/{version} instead of the full manifests, since it does include some data stream and policy template information (see the sketch after this list).
    This would avoid using current-version manifests to validate and build policies for historical version runs.
  3. No local benchmarks repository necessary (fetch it on demand)
    This would involve cloning a remote repository to a temporary location and using scenario definitions from there.
    This was requested in this issue, but is now perhaps the lowest priority.
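
For the registry-based option in point 2, the lookups involved might look roughly like this (a sketch; the registry host, the search endpoint, and the response contents are assumptions, and <name>/<version> are placeholders):

    # List the versions of a package known to the registry (host is an assumption).
    curl -s "https://epr.elastic.co/search?package=<name>&all=true"

    # Fetch the package information for one version; the response includes some
    # data stream and policy template information (endpoint path from the comment above).
    curl -s "https://epr.elastic.co/package/<name>/<version>"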

@chrisberkhout
Contributor

An issue I discovered while looking at system benchmarking functionality:

The Hostname and Port placeholders are not substituted
Some placeholders in variable values, e.g. "http://{{Hostname}}:{{Port}}", don't seem to get substituted as expected. It may be because the substitution is done too early, before the service has been started.

@narph narph added Team:Security-Scalability Security Integrations Scalability Team and removed Team:Security-External Integrations labels Feb 2, 2024