Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add reusable workflow to build openedx images #1

Merged
merged 28 commits into from
Sep 4, 2024
Merged
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
e8aecee
feat: add first version of picasso builder action
mariajgrimaldi Jul 29, 2024
a934cdf
refactor: comment repo deployment key for now
mariajgrimaldi Jul 29, 2024
b3f0c8b
Revert "refactor: comment repo deployment key for now"
mariajgrimaldi Jul 29, 2024
1fdf90e
fix: install tvm before using it
mariajgrimaldi Jul 30, 2024
926ef7a
refactor: use correct working directory for steps
mariajgrimaldi Jul 30, 2024
98a2821
refactor: comment further steps while testing
mariajgrimaldi Jul 31, 2024
39fd6a0
temp: use debug logs
mariajgrimaldi Jul 31, 2024
142d336
temp: use debug logs
mariajgrimaldi Jul 31, 2024
19bfd1b
refactor: use working directory with tvm environment
mariajgrimaldi Jul 31, 2024
934e261
refactor: uncomment further steps for testing
mariajgrimaldi Jul 31, 2024
acb78ae
temp: comment syntax validator for now
mariajgrimaldi Jul 31, 2024
90db1b4
refactor: use reusable action instead of composite
mariajgrimaldi Aug 1, 2024
e7054ad
refactor: move reusable action definition to workflows folder
mariajgrimaldi Aug 1, 2024
89726a6
fix: remove unnecessary description field
mariajgrimaldi Aug 1, 2024
35495fc
refactor: move workflow to build yml file
mariajgrimaldi Aug 1, 2024
1dddf54
refactor: use environment variable instead of turning tvm on each time
mariajgrimaldi Aug 1, 2024
05e8b01
feat: add missing push step
mariajgrimaldi Aug 1, 2024
6e56a6e
Revert "refactor: use environment variable instead of turning tvm on …
mariajgrimaldi Aug 1, 2024
55095fd
docs: improve documentation for strains repository input
mariajgrimaldi Aug 1, 2024
3c70de2
fix: activate tvm before pushing image
mariajgrimaldi Aug 1, 2024
13da45d
temp: try build without virtual env
mariajgrimaldi Aug 1, 2024
b4c9f68
refactor: remote tutor plugins update and list from step
mariajgrimaldi Aug 1, 2024
7c370f9
refactor: update checkout action to latest release
mariajgrimaldi Aug 1, 2024
23da84a
refactor: remove debug tvm command
mariajgrimaldi Aug 1, 2024
7cbd2e6
refactor!: drop creating virtual env
mariajgrimaldi Aug 1, 2024
a43ca13
feat: use service account ssh key instead
mariajgrimaldi Aug 7, 2024
abf7ca7
feat: setup ssh key for private repo cloning
mariajgrimaldi Aug 20, 2024
5765f2b
fix: add github to known hosts
mariajgrimaldi Aug 20, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
171 changes: 171 additions & 0 deletions .github/workflows/build.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
name: Picasso V2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[comment to reviewers]: this is a private repository, so only repositories from the edunext organization can use the workflow, see this documentation for more details. According to our previous discussions, the plan for this workflow is also to be used by other organizations. Which means this repository should be public.

Is that feasible for this repository? Considering that the Jenkins pipeline implementation is private as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance, I don't see any major issues with making this repo public. I would like to hear from those who worked on the Jenkins implementation to see if this was previously discussed and to understand if there was any specific reason for keeping the implementation private. It might be simply that they didn't see a need for it to be public since Jenkins allows access to the service for as many people as needed without requiring access to the script repository (?)

Copy link
Contributor Author

@mariajgrimaldi mariajgrimaldi Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I should've added this context to my previous comment but I forgot. I created the dedalo-scripts repository and also committed the first changes. I didn't know at the time whether it would be a long-term solution or what implementations would be hosted there, so I used private as the default visibility setting. Now, I'd argue against this decision. The dedalo-scripts repository looks like a team-specific solution, but it shouldn't be, mainly because it's solving the tech-wide problem of how to build Open edX images.

Knowing this, this repo should be public regardless of the decisions made about the visibility of dedalo-scripts.


on:
workflow_call:
inputs:
STRAIN_REPOSITORY:
description: 'The repository for strains to checkout. It should follow the format: ORGANIZATION/REPO'
required: true
type: string
STRAIN_REPOSITORY_BRANCH:
description: 'The branch of the repository to checkout'
required: true
type: string
STRAIN_PATH:
description: 'The path within the repository for strains'
required: true
type: string
SERVICE:
description: 'The service name to build'
required: true
type: string
secrets:
DOCKERHUB_USERNAME:
description: 'DockerHub username for login'
required: true
DOCKERHUB_PASSWORD:
description: 'DockerHub password for login'
required: true
SSH_PRIVATE_KEY:
description: 'Service user SSH key for repository checkout'
required: true

jobs:
build:
runs-on: ubuntu-latest

steps:
- name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}

- name: Checkout strains repository
uses: actions/checkout@v4
with:
repository: ${{ inputs.STRAIN_REPOSITORY }}
ref: ${{ inputs.STRAIN_REPOSITORY_BRANCH }}
ssh-key: ${{ secrets.SSH_PRIVATE_KEY }}

- name: Set TVM project name and version
shell: bash
working-directory: ${{ inputs.STRAIN_PATH }}
run: |
if [ ! -e "./config.yml" ]; then
echo "ERROR: file config.yml doesn't exist"
exit 1 # terminate and indicate error
fi

# Initialize variables
strain_name=""
strain_tutor_version=""

# Read config.yml line by line
while IFS= read -r line || [[ -n $line ]]; do
if [[ "$line" == *":"* ]]; then
name=$(echo "$line" | cut -d ":" -f1)
value=$(echo "$line" | cut -d ":" -f2- | awk '{$1=$1};1' | tr -d '\r')

case $name in
*"TUTOR_APP_NAME"*)
strain_name=$value
;;
*"TUTOR_VERSION"*)
strain_tutor_version=$value
;;
esac
fi
done < "config.yml"

# Check if variables are set
if [ -z "$strain_name" ]; then
echo "ERROR: TUTOR_APP_NAME not found in the given config.yml" >&2
exit 1
fi

if [ -z "$strain_tutor_version" ]; then
echo "ERROR: TUTOR_VERSION not found in the given config.yml" >&2
exit 1
fi

echo "TUTOR_APP_NAME=${strain_name}" >> $GITHUB_ENV
echo "TUTOR_VERSION=${strain_tutor_version}" >> $GITHUB_ENV

- name: Install TVM
shell: bash
working-directory: ${{ inputs.STRAIN_PATH }}
run: |
pip install git+https://github.com/eduNEXT/tvm.git

- name: Configure TVM project
shell: bash
working-directory: ${{ inputs.STRAIN_PATH }}
run: |
tvm install $TUTOR_VERSION
tvm project init $TUTOR_APP_NAME $TUTOR_VERSION

# This command copies all the files and folders
# from the repository STRAIN_PATH to the recently above created TVM PROJECT folder
cp -r $(ls --ignore=$TUTOR_APP_NAME) $TUTOR_APP_NAME/

- name: Enable distro plugin
shell: bash
working-directory: ${{ inputs.STRAIN_PATH }}/${{ env.TUTOR_APP_NAME }}
run: |
. .tvm/bin/activate

major_version=$(pip show tutor | grep Version | awk '{print $2}' | cut -d'.' -f1)
distro_version=$(
git ls-remote --tags https://github.com/eduNEXT/tutor-contrib-edunext-distro \
| grep -E "$major_version\\.[0-9]+\\.[0-9]+$" \
| tail -n 1 \
| awk -F'/' '{print $NF}' \
)

pip install git+https://github.com/eduNEXT/tutor-contrib-edunext-distro@$distro_version
tutor plugins enable distro
Comment on lines +115 to +127
Copy link
Contributor Author

@mariajgrimaldi mariajgrimaldi Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[comment to reviewers]: I removed these lines from the original script since we're not using the plugin index to install distro:

tutor plugins update
tutor plugins list

Please let me know if you think otherwise.


- name: Setup SSH agent for private repositories cloning
uses: webfactory/[email protected]
with:
ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}

- name: Add GitHub to known hosts
shell: bash
run: |
ssh-keyscan github.com >> ~/.ssh/known_hosts

- name: Execute extra commands
shell: bash
working-directory: ${{ inputs.STRAIN_PATH }}/${{ env.TUTOR_APP_NAME }}
run: |
. .tvm/bin/activate
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[comment to reviewers]: Which environment variables does this script set? I think we can store those in the Github Environment, so there's no need to activate the tvm environment before running each step.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we could find a workaround to avoid activating the environment in each step, but I think we can leave it as an improvement for another PR

tutor distro run-extra-commands

- name: Prepare docker if building MFE
if: ${{ inputs.SERVICE == 'mfe' }}
shell: bash
run: |
echo "[worker.oci]" > buildkit.toml
echo "max-parallelism = 2" >> buildkit.toml
docker buildx create --use --node=max2cpu --driver=docker-container --config=./buildkit.toml
Comment on lines +150 to +152
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note to author]: add a comment explaining why we're doing this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this connection error with the parallel build also occur in the Jenkins pipeline? Could this potentially slow down build times or impact us in other ways?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked internally about the Jenkins agent's configuration but am still waiting for an answer so I'll ask again. I'll post here once I know more.

And yes, limiting parallelism will slow down the building process compared to the default buildkit configuration. Here's a successful build with this buildkit config, which takes about 20 minutes. I'll compare this time with our current building time for MFEs in Jenkins.

I was also thinking about taking this to the Tutor Users' Group meeting because it seems like it hasn't been addressed upstream yet.

Copy link

@magajh magajh Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks
Taking this to the Tutor users group seems like a good idea. And if it's something that can be resolved in the near future, and we can help push this forward, then I don't think it should be a major concern for us

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that this is not a blocker for the MVP. I opened this issue so we can work on it later: #2


- name: Build service image with no cache
shell: bash
working-directory: ${{ inputs.STRAIN_PATH }}/${{ env.TUTOR_APP_NAME }}
env:
SERVICE: ${{ inputs.SERVICE }}
run: |
. .tvm/bin/activate
tutor config save
tutor images build $SERVICE --no-cache
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note to author]: I don't think we need the --no-cache argument.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why you think we wouldn't need this? Especially considering that Picasso builds the images with the same --no-cache argument, and we want to compare the build times as fairly and accurately as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. Our current jenkins setup uses dynamically provisioned agents, each with a max number of concurrent builds. As I understand, --no-cache prevents one of those concurrent builds from using docker cache layers from another concurrent build. This wouldn't happen in this setup since no concurrent builds exist in the same GH runner. Also, each time a runner is provisioned, it's done from scratch without using cache of any form, including from previous builds. Therefore, this setup would behave the same with or without the --no-cache flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Alec4r @MaferMazu, please correct me if I'm wrong. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can decide it after get some numbers and make some tests with the MVP, but I agree with you so far.


- name: Push service image to DockerHub
shell: bash
working-directory: ${{ inputs.STRAIN_PATH }}/${{ env.TUTOR_APP_NAME }}
env:
SERVICE: ${{ inputs.SERVICE }}
run: |
. .tvm/bin/activate
tutor images push $SERVICE