Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Update ec2 instance type in unit test workflow file #395

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 12 additions & 5 deletions .github/workflows/unittesting-ci-nvidia.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -13,9 +13,11 @@ on:
pull_request:
types: [opened, reopened, synchronize]
push:
# TEMPORARILY ADDING TESTING BRANCH. WILL REMOVE.
branches:
- "main"
- "release-**"
- "fix-unit-test-workflow-file"
Comment on lines +16 to +20
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're good with the changes in this file, I will revert these temporary changes


env:
pytest_mark: "fast"
Expand All @@ -26,7 +28,7 @@ jobs:
runs-on: ubuntu-latest
outputs:
label: ${{ steps.start-ec2-runner.outputs.label }}
ec2-instance-id: ${{ steps.start-ec2-runner.outputs.label }}
ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}

steps:
- name: "Harden runner"
Expand All @@ -48,13 +50,15 @@ jobs:
mode: start
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
ec2-image-id: ${{ vars.AWS_EC2_AMI }}
ec2-instance-type: ${{ vars.AWS_REGION }}
# TODO: Update the EC2 instance type to use the EC2 runner variant when GPU calls are mocked
# ec2-instance-type: ${{ env.ec2_runner_variant }}
ec2-instance-type: g4dn.12xlarge
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the logic for using this instance type?

subnet-id: subnet-024298cefa3bedd61
security-group-id: sg-06300447c4a5fbef3
iam-role-name: instructlab-ci-runner
aws-resource-tags: >
[
{"Key": "Name", "Value": "instructlab-ci-github-large-runner"},
{"Key": "Name", "Value": "instructlab-ci-github-unit-test-runner"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name wouldn't match the current pruning. It's currently looking for w+ between github and runner. Either need to adjust here or in the pruning.

{"Key": "GitHubRepository", "Value": "${{ github.repository }}"},
{"Key": "GitHubRef", "Value": "${{ github.ref }}"},
{"Key": "GitHubPR", "Value": "${{ github.event.number }}"}
Expand Down Expand Up @@ -104,6 +108,7 @@ jobs:
- name: "Run unit tests with Tox and Pytest"
run: |
. venv/bin/activate
tox -e py3-unit -- -m ${{env.pytest_mark}}
- name: "Show disk utilization AFTER tests"
Expand All @@ -120,6 +125,7 @@ jobs:
uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.1
with:
egress-policy: audit

- name: "Configure AWS credentials"
uses: "aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502" # v4.0.2
with:
Expand All @@ -128,10 +134,11 @@ jobs:
aws-region: ${{ vars.AWS_REGION }}

- name: "Stop EC2 runner"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to be setup so it will always run the way our other ec2-runner jobs are configured.

Ex: https://github.com/instructlab/instructlab/blob/main/.github/workflows/e2e-nvidia-l4-x1.yml#L150
Where it's a separate step that ways runs and depends on the previous step

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why that 'if(always())' step was in there. I'll add that to my PR #409

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe be something leftover from an old job config - a lot of them have gotten out-of-sync, I had to make two hotfixes to the E2E Custom job in the Core repo today. @courtneypacheco has a Dev Doc open around consolidating some of these into common actions to make CI easier to maintain: instructlab/dev-docs#179

id: start-ec2-runner
uses: machulav/ec2-github-runner@1827d6ca7544d7044ddbd2e9360564651b463da2 # v2.3.7
with:
mode: stop
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
label: ${{ needs.start-ec2-runner.outputs.label }}
ec2-instance-type: ${{ env.ec2_runner_variant }}
ec2-instance-id: ${{ needs.start-ec2-runner.outputs.ec2-instance-id }}
# TODO: Update the EC2 instance type to use the EC2 runner variant when GPU calls are mocked
# ec2-instance-type: ${{ env.ec2_runner_variant }}
Loading