
Add envfile rotation DAG #4954

Merged
merged 2 commits into main from add/envfile-rotation-dag on Oct 14, 2024

Conversation

@sarayourfriend (Collaborator) commented Sep 18, 2024

Fixes

Fixes https://github.com/WordPress/openverse-infrastructure/issues/968 by @sarayourfriend

Description

Adds a DAG to rotate the envfiles in the environment file buckets. The DAG finds the 3 most recent envfiles for each service and then deletes everything else in the buckets. Unless the ENABLE_S3_ENVFILE_DELETION Airflow variable is set to True, it performs a dry run: it only detects and lists the files it would delete.
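As a rough sketch of the retention rule described above (the function name, key layout, and KEEP_COUNT constant are illustrative, not the DAG's actual code):

```python
from collections import defaultdict
from datetime import datetime

# Illustrative sketch of the retention rule: keep the three most recent
# envfiles per service, mark everything else for deletion. The key layout
# "{service}/{container}/{hash}.env" is assumed from the PR description.
KEEP_COUNT = 3

def envfiles_to_delete(objects: list[tuple[str, datetime]]) -> list[str]:
    """Given (key, last_modified) pairs, return the keys to delete."""
    by_service: dict[str, list[tuple[str, datetime]]] = defaultdict(list)
    for key, modified in objects:
        service = key.split("/", 1)[0]
        by_service[service].append((key, modified))

    stale: list[str] = []
    for versions in by_service.values():
        # Newest first; everything past the first KEEP_COUNT is stale.
        versions.sort(key=lambda pair: pair[1], reverse=True)
        stale.extend(key for key, _ in versions[KEEP_COUNT:])
    return stale
```

In the real DAG the object listing would come from an S3 hook; this sketch only captures the selection logic.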

Testing Instructions

The DAG optionally uses a separate AWS connection ID to facilitate testing.

To set up for testing, create an AWS connection with permission to describe launch templates, describe launch template versions, describe task definitions, and list bucket objects. (I will share the specific IAM permissions later to help reviewers set this up, if they need it.) Save that as a new AWS connection (call it whatever you want; I used aws_s3_envfile_rotation), and then set the name of that connection in a new Airflow variable, S3_ENVFILE_ROTATION_AWS_CONN_ID. Do not set ENABLE_S3_ENVFILE_DELETION while testing, as you just want a dry run, which is the default behaviour.

Then enable the DAG and let it run (visit the DAG's page here: http://localhost:9090/dags/rotate_envfiles/grid?search=rotate_envfiles). You should see it log all the currently-in-use environment files in the two tasks that detect them from launch templates and task definitions (one task for each). In the logs for identify_stale_envfiles, you will see it find some production envfiles in the format {service}/{container}/.env, that is, specifically without the hash in the final key portion. I created these when I was working on the feature, before I decided to use the hashed file names. They are safe to delete, and will be the first ones cleaned up by the live DAG when we enable it after deploying. The DAG considers these stale because no launch templates or task definitions reference them. Neat!

Critically, you should see none of the in-use environment files considered stale by identify_stale_envfiles. The delete_stale_envfiles task will log all the files it would have deleted, had this not been a dry run. Here you can again confirm that it lists only the files I mentioned above, the ones without the hash in the key.
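The staleness check described above amounts to a set difference; a minimal sketch, using the task name from the PR but hypothetical parameter names:

```python
# Hypothetical sketch: an envfile is stale when neither a launch template
# nor a task definition references it. Parameter names are illustrative.
def identify_stale_envfiles(
    bucket_keys: set[str],
    launch_template_envfiles: set[str],
    task_definition_envfiles: set[str],
) -> set[str]:
    in_use = launch_template_envfiles | task_definition_envfiles
    return bucket_keys - in_use
```

This is why the hash-less `{service}/{container}/.env` keys show up as stale: nothing in-use points at them.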

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (ov just catalog/generate-docs for catalog
    PRs) or the media properties generator (ov just catalog/generate-docs media-props
    for the catalog or ov just api/generate-docs for the API) where applicable.

Developer Certificate of Origin

Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@AetherUnbound (Collaborator)

Here are my suggestions for testing:

  • Use a connection ID which is specific to this DAG but defaults to AWS_CONN_ID. This allows the conn ID to be defined locally in your .env file and used exclusively for the actions of this DAG, while still using the same conn ID in production. An example of this is AWS_RDS_CONN_ID, which is used in the staging database restore DAG.
  • If you want to mock interactions with S3 specifically, you can use the minio instance that gets spun up with the catalog for preloading and operating on S3 files. The add Rekognition labels DAG is an example of this (files in the s3-data folder, like the Rekognition one, get loaded in when the stack is first created).
  • Beyond those needs, I would simply mock the responses from the hook functions. We do this in numerous places throughout the codebase: 1, 2, 3, etc.
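The fallback pattern in the first suggestion can be sketched like so (in Airflow this would be a Variable lookup with a default; the plain-dict lookup and the shared connection name here just illustrate the shape):

```python
# Illustrative sketch of a DAG-specific connection ID with a shared default.
# In production the variable is simply left unset, so the shared connection
# is used. "aws_default" is an assumed name for the shared connection.
AWS_CONN_ID = "aws_default"

def resolve_conn_id(airflow_variables: dict[str, str]) -> str:
    """Use the DAG-specific connection if configured, else the shared one."""
    return airflow_variables.get("S3_ENVFILE_ROTATION_AWS_CONN_ID", AWS_CONN_ID)
```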

@sarayourfriend (Collaborator, Author)

Okay, thanks! I'll go with the specific-connection approach first; I do remember now using that to test the RDS snapshot rotation DAG before.

@sarayourfriend sarayourfriend added 🟩 priority: low Low priority and doesn't need to be rushed 🌟 goal: addition Addition of new feature 💻 aspect: code Concerns the software code in the repository 🧱 stack: infra Related to the Terraform config and other infrastructure labels Sep 27, 2024
@sarayourfriend sarayourfriend marked this pull request as ready for review September 27, 2024 03:04
@sarayourfriend sarayourfriend requested review from a team as code owners September 27, 2024 03:04
@sarayourfriend sarayourfriend requested review from AetherUnbound, dhruvkb and krysal and removed request for a team September 27, 2024 03:04
@sarayourfriend sarayourfriend added the 🧱 stack: catalog Related to the catalog and Airflow DAGs label Oct 9, 2024

github-actions bot commented Oct 9, 2024

Full-stack documentation: https://docs.openverse.org/_preview/4954

Please note that GitHub Pages takes a little time to deploy newly pushed code. If the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

Changed files 🔄:

@openverse-bot (Collaborator)

Based on the contributor urgency of this PR, the following reviewers are being gently reminded to review this PR:

@AetherUnbound
@dhruvkb
@krysal
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend[1] days, this PR was ready for review 9 day(s) ago. PRs labelled with contributor urgency are expected to be reviewed within 3 weekday(s)[2].

@sarayourfriend, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

@krysal (Member) left a comment

I've tested it and it worked as described and expected. Great!

I was wondering how you set up the AWS connection. I used my own credentials, but I think there should be an alternative with more specific permissions and limitations, as you describe.

5 resolved review comments (outdated) on catalog/dags/maintenance/rotate_envfiles.py
Co-authored-by: Krystle Salazar <[email protected]>
@sarayourfriend (Collaborator, Author) commented Oct 13, 2024

I used my regular credentials as well. By the way, @krysal, it seems I'm unable to merge this PR due to permissions, so I'll have to ask you to do it. After merging, you'll need to enable the DAG in Airflow. Before enabling it, however, you'll need to add IAM permissions for the environment buckets to the Airflow execution role in next/production/airflow.tf; otherwise it will fail to read the bucket objects. I have DM'd you the diff in the Make Slack that I think should work.

I suggest letting it do a "dry run" first, by leaving ENABLE_S3_ENVFILE_DELETION undefined, and confirming that it logs everything and doesn't hit permission issues. Afterwards, set the ENABLE_S3_ENVFILE_DELETION Airflow variable to 1 and it should be good to go 🙂
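Since Airflow Variable values are strings, the enable flag presumably needs explicit truthiness parsing to accept both the 1 and the True spellings used in this thread. A hypothetical helper, not the DAG's actual code:

```python
from __future__ import annotations

# Hypothetical flag parser: an unset variable means dry run, and common
# truthy spellings enable deletion. The accepted spellings are an assumption.
def deletion_enabled(raw: str | None) -> bool:
    if raw is None:  # ENABLE_S3_ENVFILE_DELETION not set: dry run
        return False
    return raw.strip().lower() in {"1", "true", "yes"}
```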

@krysal krysal merged commit f164ad7 into main Oct 14, 2024
44 checks passed
@krysal krysal deleted the add/envfile-rotation-dag branch October 14, 2024 14:55