Create a cron job that calls all performance harness on cron #31480

Merged
merged 15 commits into from
Oct 20, 2023
69 changes: 46 additions & 23 deletions .github/workflows/connector-performance-command.yml
@@ -8,6 +8,30 @@ on:
dataset:
type: string
required: true
repo:
description: "Repo to check out code from. Defaults to the main airbyte repo. Set this when building connectors from forked repos."
type: string
required: false
default: "airbytehq/airbyte"
gitref:
description: "The git ref to check out from the specified repository."
type: string
required: false
default: master
uuid:
description: "Custom UUID of workflow run. Used because GitHub dispatches endpoint does not return workflow run id."
type: string
required: false
stream-number:
description: "Number of streams to use for destination performance measurement."
type: string
required: false
default: "1"
sync-mode:
description: "Sync mode to use for destination performance measurement."
required: false
type: string
default: "full_refresh"
workflow_dispatch:
inputs:
connector:
@@ -51,7 +75,7 @@ jobs:
timeout-minutes: 10
runs-on: ubuntu-latest
steps:
- name: UUID ${{ github.event.inputs.uuid }}
- name: UUID ${{ inputs.uuid }}
run: true
start-test-runner:
name: Start Build EC2 Runner
@@ -65,8 +89,8 @@ jobs:
- name: Checkout Airbyte
uses: actions/checkout@v3
with:
repository: ${{ github.event.inputs.repo }}
ref: ${{ github.event.inputs.gitref }}
repository: ${{ inputs.repo }}
ref: ${{ inputs.gitref }}
- name: Check PAT rate limits
run: |
./tools/bin/find_non_rate_limited_PAT \
@@ -85,37 +109,36 @@ jobs:
runs-on: ${{ needs.start-test-runner.outputs.label }}
steps:
- name: Link comment to workflow run
if: github.event.inputs.comment-id
if: inputs.comment-id
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: ${{ github.event.inputs.comment-id }}
comment-id: ${{ inputs.comment-id }}
body: |
#### Note: The following `dataset=` values are supported: `1m`<sub>(default)</sub>, `10m`, `20m`,
`bottleneck_stream1`, `bottleneck_stream_randomseed. For destinations only: you can also use `stream-numbers=N`
to simulate N number of parallel streams. Additionally, `sync-mode=incremental` is supported for destinations.
For example: `dataset=1m stream-numbers=2 sync-mode=incremental`
> :runner: ${{github.event.inputs.connector}} https://github.com/${{github.repository}}/actions/runs/${{github.run_id}}.
> :runner: ${{inputs.connector}} https://github.com/${{github.repository}}/actions/runs/${{github.run_id}}.
- name: Search for valid connector name format
id: regex
uses: AsasInnab/regex-action@v1
with:
regex_pattern: "^((connectors|bases)/)?[a-zA-Z0-9-_]+$"
regex_flags: "i" # required to be set for this plugin
search_string: ${{ github.event.inputs.connector }}
search_string: ${{ inputs.connector }}
- name: Validate input workflow format
if: steps.regex.outputs.first_match != github.event.inputs.connector
if: steps.regex.outputs.first_match != inputs.connector
run: echo "The connector provided has an invalid format!" && exit 1
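Note on the two validation steps above: the check can be reproduced locally before dispatching the workflow. The following is a minimal sketch, not the action's implementation. It uses bash's `[[ =~ ]]` instead of `AsasInnab/regex-action`, moves the hyphen to the end of the bracket expression so the pattern is unambiguous under POSIX ERE, and does not reproduce the `i` flag. The function name `is_valid_connector` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: approximates the workflow's connector-name check in plain bash.
# Assumption: hyphen moved to the end of the bracket expression for POSIX ERE;
# the case-insensitive "i" flag used by the workflow is NOT reproduced here.
is_valid_connector() {
  local re='^((connectors|bases)/)?[a-zA-Z0-9_-]+$'
  [[ "$1" =~ $re ]]
}

for candidate in "connectors/source-postgres" "source-mysql" \
                 "connectors/source postgres" "other/source-mysql"; do
  if is_valid_connector "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

The first two candidates pass; the last two fail because a space and an unlisted path prefix fall outside the allowed character set.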
- name: Filter supported connectors
if:
"${{ github.event.inputs.connector != 'connectors/source-postgres' &&
github.event.inputs.connector != 'connectors/source-mysql' &&
github.event.inputs.connector != 'connectors/destination-snowflake' }}"
if: "${{ inputs.connector != 'connectors/source-postgres' &&
inputs.connector != 'connectors/source-mysql' &&
inputs.connector != 'connectors/destination-snowflake' }}"
Contributor

Make sure the MongoDB change from the other PR doesn't get overwritten during the merge.

run: echo "Only connectors/source-postgres, source-mysql and destination-snowflake currently supported by harness" && exit 1
- name: Checkout Airbyte
uses: actions/checkout@v3
with:
repository: ${{ github.event.inputs.repo }}
ref: ${{ github.event.inputs.gitref }}
repository: ${{ inputs.repo }}
ref: ${{ inputs.gitref }}
fetch-depth: 0 # This is to fetch the main branch in case we are running on a different branch.
- name: Install Java
uses: actions/setup-java@v3
@@ -135,7 +158,7 @@ jobs:
- name: Source or Destination harness
id: which-harness
run: |
the_harness="$(echo ${{github.event.inputs.connector}} | sed 's/.*\///; s/-.*//')"-harness
the_harness="$(echo ${{inputs.connector}} | sed 's/.*\///; s/-.*//')"-harness
echo "harness_type=$the_harness" >> "$GITHUB_OUTPUT"
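Note on the `which-harness` step above: the `sed` expression maps a connector path to a harness name in two substitutions. A small standalone sketch of that mapping follows. The wrapper function `harness_type` is hypothetical and exists only so the pipeline can be run outside the workflow.

```shell
#!/usr/bin/env bash
# Sketch of the sed pipeline from the "Source or Destination harness" step.
# harness_type is a hypothetical wrapper, not part of the workflow.
harness_type() {
  local kind
  # 's/.*\///' drops everything up to and including the last slash;
  # 's/-.*//'  drops everything from the first hyphen onward.
  kind="$(printf '%s\n' "$1" | sed 's/.*\///; s/-.*//')"
  printf '%s-harness\n' "$kind"
}

harness_type connectors/source-postgres        # prints: source-harness
harness_type connectors/destination-snowflake  # prints: destination-harness
```

A value without the `connectors/` prefix, such as `source-mysql`, still resolves to `source-harness`, because the first substitution simply does not match.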
- name: Write harness credentials
run: |
@@ -155,9 +178,9 @@ jobs:
- name: build connector
shell: bash
run: |
echo "Building... ${{github.event.inputs.connector}}" >> $GITHUB_STEP_SUMMARY
echo "Building... ${{inputs.connector}}" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY # this is a blank line
connector_name=$(echo ${{ github.event.inputs.connector }} | cut -d / -f 2)
connector_name=$(echo ${{ inputs.connector }} | cut -d / -f 2)
echo "Running ./gradlew :airbyte-integrations:connectors:$connector_name:build -x check"
./gradlew :airbyte-integrations:connectors:$connector_name:build -x check
env:
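Note on the `build connector` step above: `cut -d / -f 2` takes the second slash-separated field as the Gradle project name. The sketch below shows that behavior in isolation. The helper names `connector_name` and `gradle_task` are hypothetical; the task string mirrors the one echoed by the step.

```shell
#!/usr/bin/env bash
# Sketch of how the build step derives the Gradle task from the connector input.
# connector_name and gradle_task are hypothetical helper names.
connector_name() {
  # Second "/"-separated field. A line with no "/" is passed through
  # unchanged, because cut only suppresses such lines when -s is given.
  printf '%s\n' "$1" | cut -d / -f 2
}

gradle_task() {
  printf ':airbyte-integrations:connectors:%s:build\n' "$(connector_name "$1")"
}

connector_name connectors/source-postgres  # prints: source-postgres
connector_name source-mysql                # prints: source-mysql
gradle_task connectors/source-mysql        # prints: :airbyte-integrations:connectors:source-mysql:build
```

Because the task path hardcodes `connectors`, a `bases/...` input would pass the name validation but still be built from the connectors directory.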
@@ -170,10 +193,10 @@ jobs:
id: run-harness
shell: bash
env:
CONN: ${{ github.event.inputs.connector }}
DS: ${{ github.event.inputs.dataset }}
STREAM_NUMBER: ${{ github.event.inputs.stream-number }}
SYNC_MODE: ${{ github.event.inputs.sync-mode }}
CONN: ${{ inputs.connector }}
DS: ${{ inputs.dataset }}
STREAM_NUMBER: ${{ inputs.stream-number }}
SYNC_MODE: ${{ inputs.sync-mode }}
PREFIX: '{"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.p.PerformanceTest(runTest):165'
SUFFIX: '"}}'
HARNESS_TYPE: ${{ steps.which-harness.outputs.harness_type }}
@@ -200,11 +223,11 @@ jobs:
kubectl logs --tail=1 $POD | while read line ; do line=${line#"$PREFIX"}; line=${line%"$SUFFIX"}; echo $line >> $GITHUB_OUTPUT ; done
echo "$EOF" >> $GITHUB_OUTPUT
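Note on the log-trimming loop above: `${line#"$PREFIX"}` and `${line%"$SUFFIX"}` strip the JSON envelope from the last pod log line using plain parameter expansion. A minimal sketch of that step follows, reusing the `PREFIX` and `SUFFIX` values from the `env:` block. The function name `strip_log_envelope` and the sample payload are made up for illustration and are not real harness output.

```shell
#!/usr/bin/env bash
# Sketch of the prefix/suffix stripping applied to the harness log line.
# PREFIX and SUFFIX are copied from the workflow's env block.
PREFIX='{"type":"LOG","log":{"level":"INFO","message":"INFO i.a.i.p.PerformanceTest(runTest):165'
SUFFIX='"}}'

# strip_log_envelope is a hypothetical helper name.
strip_log_envelope() {
  local line="$1"
  line="${line#"$PREFIX"}"   # remove the leading envelope if present
  line="${line%"$SUFFIX"}"   # remove the trailing envelope if present
  printf '%s\n' "$line"
}

# Made-up payload, not actual harness output.
sample="${PREFIX}example payload${SUFFIX}"
strip_log_envelope "$sample"   # prints: example payload
```

Quoting `"$PREFIX"` inside the expansion matters: it makes the parentheses and other characters match literally instead of being treated as a glob pattern. The prefix also embeds the source line number `165`, so a line whose prefix differs is passed through unstripped.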
- name: Link comment to workflow run
if: github.event.inputs.comment-id
if: inputs.comment-id
uses: peter-evans/create-or-update-comment@v2
with:
reactions: "+1"
comment-id: ${{ github.event.inputs.comment-id }}
comment-id: ${{ inputs.comment-id }}
body: |
## Performance test Result:
```
33 changes: 10 additions & 23 deletions .github/workflows/connector-performance-cron.yml
@@ -1,34 +1,21 @@
name: Connector Performance Harness Cron
on: workflow_dispatch
on:
schedule:
# * is a special character in YAML so you have to quote this string
- # 5:30 and 17:30
- cron: "30 5,17 * * *"
Contributor

workflow_dispatch: # for manual triggers

jobs:
postgres-1m-run:
uses: ./.github/workflows/connector-performance-command.yml
with:
connector: connectors/source-postgres
connector: "connectors/source-postgres"
dataset: 1m
Contributor

Are all datasets going to run every time? Is this needed?

Contributor Author

@xiaohansong xiaohansong Oct 17, 2023

Actually, that's something I want to discuss with you. Once we support and test both full_refresh and incremental, plus the additional MongoDB support, we will have up to 18 combinations of connector, dataset, and sync mode. I feel we do not need to enumerate all datasets. What was the original purpose of defining 3 different datasets for performance testing?

Contributor

I don't think 1m gives us anything very different from 10m or 20m.
I'm OK with leaving the other datasets for manual testing, where you may want to check more specialized cases.

Contributor Author

Sounds good. Removed 10m/20m for both connectors.
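Note on the figure of 18 mentioned in this thread: it is consistent with 3 connectors × 3 datasets × 2 sync modes, although the thread does not spell out that breakdown, so treat it as an assumption. The sketch below just counts the matrix. The MongoDB entry is a placeholder name, since the connector path is not given here.

```shell
#!/usr/bin/env bash
# Sketch: counts the connector/dataset/sync-mode matrix discussed in the thread.
# Assumption: 18 = 3 connectors x 3 datasets x 2 sync modes.
connectors=(source-postgres source-mysql mongodb-placeholder)  # last name is a placeholder
datasets=(1m 10m 20m)
sync_modes=(full_refresh incremental)

count_combinations() {
  local n=0 c d s
  for c in "${connectors[@]}"; do
    for d in "${datasets[@]}"; do
      for s in "${sync_modes[@]}"; do
        n=$((n + 1))
      done
    done
  done
  echo "$n"
}

count_combinations   # prints: 18
```

Restricting `datasets` to `1m`, as agreed above, reduces the same matrix to 6 scheduled combinations.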

postgres-10m-run:
uses: ./.github/workflows/connector-performance-command.yml
with:
connector: connectors/source-postgres
dataset: 10m
postgres-20m-run:
uses: ./.github/workflows/connector-performance-command.yml
with:
connector: connectors/source-postgres
dataset: 20m
secrets: inherit
mysql-1m-run:
uses: ./.github/workflows/connector-performance-command.yml
with:
connector: connectors/source-mysql
connector: "connectors/source-mysql"
dataset: 1m
mysql-10m-run:
uses: ./.github/workflows/connector-performance-command.yml
with:
connector: connectors/source-mysql
dataset: 10m
mysql-20m-run:
uses: ./.github/workflows/connector-performance-command.yml
with:
connector: connectors/source-mysql
dataset: 20m
secrets: inherit