
[benchmarking] Support loading scenarios from remote repositories #1317

Open
ebeahan opened this issue Jun 21, 2023 · 4 comments
Labels
Team:Security-Scalability Security Integrations Scalability Team


ebeahan commented Jun 21, 2023

Building on the work in #1164, add support to the elastic-package benchmark system for loading benchmarking scenarios from remote git repositories.

This opens up the ability to create and maintain benchmarking scenarios outside of elastic/integrations, similar to rally-tracks.


andrewkroh commented Oct 10, 2023

What is the use case for this? Asked another way, why store them separately?


ebeahan commented Oct 10, 2023

From talking with @marc-gr about this feature: we see the default location for benchmarking configs as alongside the integration package in elastic/integrations. However, we also see a need for configs and data sets that should remain private or that hold little interest for general users. As with rally tracks, these would be stored elsewhere but would still be runnable using elastic-package benchmark system --benchmark.

@chrisberkhout
Contributor

Before this work, it was only possible to run a system benchmark:

  • from the package root
    (local manifest files are used to validate data stream settings and generate a package policy)
  • on the package version in the working tree, or
    on another version (named in scenario config) that:
    • is available in the package registry
    • does not conflict with local manifests for the purposes of data stream validation and policy generation
  • using a scenario defined in <package root>/_dev/benchmark/system/<scenario>.yml

The metrics from a benchmark run can be saved to an Elasticsearch instance by setting the ELASTIC_PACKAGE_ESMETRICSTORE_* environment variables.
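
For example, a run with the metric store enabled might look roughly like the following (a minimal sketch; the exact ELASTIC_PACKAGE_ESMETRICSTORE_* variable suffixes and the local Elasticsearch address are assumptions, and <package> and <scenario> are placeholders):

    # Sketch only: point the metric store at an Elasticsearch instance
    # (the variable suffixes here are assumptions; check the elastic-package docs).
    export ELASTIC_PACKAGE_ESMETRICSTORE_HOST="https://localhost:9200"
    export ELASTIC_PACKAGE_ESMETRICSTORE_USERNAME="elastic"
    export ELASTIC_PACKAGE_ESMETRICSTORE_PASSWORD="changeme"

    # Run a scenario defined in <package root>/_dev/benchmark/system/<scenario>.yml,
    # executed from the package root.
    cd packages/<package>
    elastic-package benchmark system --benchmark <scenario>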

With that functionality it is easy to run the current benchmarks in the package repository on the current package version. If a package's benchmarks are run regularly and the metrics retained, those can be compared over time.

There are two more important use cases:

  • Maintain benchmark scenarios outside of the package repository
    (to allow for scenarios that include non-public datasets)
  • Run current benchmarks on historical and current versions of a package

These were both already possible with the use of an additional benchmarks repository and the correct manipulation of git history and the working tree (i.e. define scenarios in a benchmarks repository, check out the right version of the package repository for a given run, and link its <package root>/_dev/benchmark/system/ directory to the desired scenario definition before starting the run).
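
Roughly, that manual workflow looks like the following (a sketch only; repository names, tags, and paths are illustrative placeholders):

    # Check out the package version to benchmark in the package repository.
    git -C integrations checkout <package-version-tag>

    # Replace the package's scenario directory with a link to the desired
    # scenario definitions kept in a separate benchmarks repository.
    rm -rf integrations/packages/<package>/_dev/benchmark/system
    ln -s <benchmarks-repo>/<scenario-dir> \
      integrations/packages/<package>/_dev/benchmark/system

    # Run the benchmark from the package root as usual.
    cd integrations/packages/<package>
    elastic-package benchmark system --benchmark <scenario>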

However, there are changes that would make these use cases simpler.

In #1603 I added a --path option to the system benchmark command so that a scenario can be loaded from any local location. This breaks the link between the package version and the benchmark scenario version without having to manipulate the package working tree.
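
With that option, a scenario kept outside the package repository can be used directly, along these lines (a sketch; the paths are placeholders, and it assumes --path points at a directory containing the scenario definitions):

    # Run from the package root, but load the scenario definition from an
    # external benchmarks repository checked out elsewhere.
    cd integrations/packages/<package>
    elastic-package benchmark system --benchmark <scenario> \
      --path ~/src/benchmarks-repo/<package>/system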

Additional changes that would make system benchmarking more flexible:

  1. Package version choice on the command line
    If we want to automate running one version of a scenario on multiple versions of a package, it would be useful to choose the version with a command-line argument rather than by modifying the <scenario>.yml file. It would also be good for a scenario to define a minimum version or a range of versions with which it is compatible.
    This would do the most to enable new workflows.
  2. No local package repository necessary (use the registry)
    Without a local package repository, the manifests used for data stream validation and package policy generation would need to come from the package registry. This would involve checking the available versions, downloading the package zip file to a temporary location, and extracting its manifests. Alternatively, it may be possible to use the package information provided by the registry at /package/{name}/{version} instead of the full manifests, since it does include some data stream and policy template information (see the sketch after this list).
    This would avoid using current-version manifests to validate and build policies for historical version runs.
  3. No local benchmarks repository necessary (fetch it on demand)
    This would involve cloning a remote repository to a temporary location and using scenario definitions from there.
    This was requested in this issue, but is now perhaps the lowest priority.
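
For the registry-based option in point 2, the lookups involved might look roughly like this (a sketch; the registry host, the search endpoint, and the response contents are assumptions, and <name>/<version> are placeholders):

    # List the versions of a package known to the registry (host is an assumption).
    curl -s "https://epr.elastic.co/search?package=<name>&all=true"

    # Fetch the package information for one version; the response includes some
    # data stream and policy template information (endpoint path from the comment above).
    curl -s "https://epr.elastic.co/package/<name>/<version>"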

@chrisberkhout
Contributor

An issue I discovered while looking at system benchmarking functionality:

The Hostname and Port placeholders are not substituted
Some placeholders in variable values, e.g. "http://{{Hostname}}:{{Port}}", don't seem to get substituted as expected. It may be because the substitution is done too early, before the service has been started.

@narph narph added Team:Security-Scalability Security Integrations Scalability Team and removed Team:Security-External Integrations labels Feb 2, 2024