Skip to content

Commit

Permalink
Add note about skipping the entire flow via Variable
Browse files Browse the repository at this point in the history
  • Loading branch information
AetherUnbound committed Apr 22, 2023
1 parent da19f30 commit 22847a1
Showing 1 changed file with 20 additions and 13 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -32,15 +32,22 @@ The final product of this plan will be a DAG (scheduled for `@monthly`) which
will recreate the staging database from the most recent snapshot of the
production API database. This will be accomplished by:

1. Determine the most recent automated **production** snapshot using boto3's
1. Use a new `SKIP_STAGING_DATABASE_RESTORE`
[Airflow Variable](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/variables.html)
to control whether the DAG should run or not. This will allow us to skip the
DAG in the case where we have made diverging chances to the staging database
for testing a new feature that we do not want to be overwritten. If this
Variable is set to `True`, the DAG should issue a Slack message notifying the
maintainers of the skipped run and raise an `AirflowSkipException`.
2. Determine the most recent automated **production** snapshot using boto3's
[`describe_db_snapshots`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/rds/client/describe_db_snapshots.html)
function (an Airflow operator does not exist for this operation). This step
should also check the `Status` value of the response for the most recent
snapshot to ensure that it is `available`.
2. In the case where the snapshot is not yet available, wait for the status of
3. In the case where the snapshot is not yet available, wait for the status of
the most recent snapshot using the
[`RdsSnapshotExistenceSensor`](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/sensors/rds/index.html#airflow.providers.amazon.aws.sensors.rds.RdsSnapshotExistenceSensor).
3. In parallel, gather the attributes of the **staging** database using boto3's
4. In parallel, gather the attributes of the **staging** database using boto3's
[`describe_db_instances`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/rds/client/describe_db_instances.html)
function (an Airflow operator does not exist for this operation). Namely, the
following attributes should be gathered and set to ensure they match:
Expand All @@ -56,34 +63,34 @@ production API database. This will be accomplished by:
- It should use password authentication (this may not need to be changed from
the default, as the snapshot should contain the same connection
information)
4. Create a new database from this snapshot using boto3's
5. Create a new database from this snapshot using boto3's
[`restore_db_instance_from_db_snapshot`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/rds/client/restore_db_instance_from_db_snapshot.html)
function (an operator does not exist for this operation). This database will
be named in a way that does not conflict with the existing staging database
name, e.g. `dev-next-openverse-db`. The database configuration information
from the previous step will be used to ensure the new database matches the
old database's configuration exactly.
5. Wait for the new database to be ready using the
6. Wait for the new database to be ready using the
[`RdsDbSensor`](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/sensors/rds/index.html#airflow.providers.amazon.aws.sensors.rds.RdsDbSensor)
([example](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/rds.html#howto-sensor-rdsdbsensor)).
6. Rename the old staging database to `dev-old-openverse-db` using boto3's
7. Rename the old staging database to `dev-old-openverse-db` using boto3's
[`modify_db_instance`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/rds/client/modify_db_instance.html)
function (an operator does not exist for this operation). We must set the
`ApplyImmediately` option here to `True` to ensure this operation happens
immediately rather than waiting for the next maintenance window.
7. Wait for the old database rename to be complete with the `RdsDbSensor` (we
8. Wait for the old database rename to be complete with the `RdsDbSensor` (we
may need retries on this step, since the database may not be initially
available/named when the sensor first starts). _**Note**: this will cause a
temporary outage of the staging API, see
[the alternatives section](#alternatives) for why this is the case._ A Slack
notification should be sent out at the start of this step to alert the team
of the outage, and again once the outage is resolved.
8. Rename the new database to `dev-openverse-db` using `modify_db_instance`.
9. Rename the new database to `dev-openverse-db` using `modify_db_instance`.
(Noting that `ApplyImmediately` should be set to `True` here as well.)
9. Wait for the new database rename to be complete with the `RdsDbSensor` (we
may need retries on this step, since the database may not be initially
available/named when the sensor first starts).
10. If the previous steps fail, rename `dev-old-openverse-db` back to
10. Wait for the new database rename to be complete with the `RdsDbSensor` (we
may need retries on this step, since the database may not be initially
available/named when the sensor first starts).
11. If the previous steps fail, rename `dev-old-openverse-db` back to
`dev-openverse-db`. Otherwise, `dev-old-openverse-db` can be deleted using
the
[`RdsDeleteDbInstanceOperator`](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/operators/rds/index.html#airflow.providers.amazon.aws.operators.rds.RdsDeleteDbInstanceOperator).
Expand All @@ -92,7 +99,7 @@ production API database. This will be accomplished by:
until the database removal is complete.
- `SkipFinalSnapshot` should be set to `True` to avoid creating a final
snapshot of the database before deletion.
11. Report the success or failure of the DAG run to Slack.
12. Report the success or failure of the DAG run to Slack.

## Dependencies

Expand Down

0 comments on commit 22847a1

Please sign in to comment.