Implementation Plan: Update staging database #1154
Conversation
rfcs/search_relevancy_sandbox/20230406-implementation_plan_update_staging_database.md
I'm not an assigned reviewer but had some small notes for things that could be clarified to avoid easy mistakes during implementation.
Also: the implementation plan guide requests identifying the atomic blocks of work that can be split into individual PRs. This plan looks like it would probably go into a single PR, despite its complexity. Does that sound right? If so, can it be noted explicitly?
Looks great, very clear! And nice to see how many existing operators we'll be able to take advantage of.
> the
> [`RdsDeleteDbInstanceOperator`](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/operators/rds/index.html#airflow.providers.amazon.aws.operators.rds.RdsDeleteDbInstanceOperator)
> (the `wait_for_completion` should be left as `True` here so we don't need a
> follow-up sensor).
So the last couple steps are:
- Trigger renaming new database
- Wait for the renaming to complete (either success or fail)
- Branch on the status of the renaming:
- If successful, delete the old db
- If failed, rename the old db back to `dev-openverse-db`
I don't understand why those last operations don't require a follow-up sensor to wait for their completion 🤔
I think my wording may be ambiguous here - the `wait_for_completion` step means that we avoid the need for a sensor on the delete operation itself. The `wait_for_completion` parameter is a feature on the Airflow operators themselves; unfortunately, anything we do with the RDS client directly will require a follow-up sensor, since I don't think there's a similar mechanism available with the `boto3` API alone. I'll reword this!
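The rename-then-branch flow described in this thread could be sketched as a plain branching callable of the kind passed to Airflow's `BranchPythonOperator`. The task ids below are hypothetical, for illustration only:

```python
# Sketch only: hypothetical task ids illustrating the branch after the
# rename step. With a BranchPythonOperator, the callable returns the
# task_id of the downstream task to follow.
def choose_after_rename(rename_succeeded: bool) -> str:
    if rename_succeeded:
        # The delete is handled by RdsDeleteDbInstanceOperator with
        # wait_for_completion=True, so no follow-up sensor task is needed.
        return "delete_old_db"
    # Roll back by restoring the old instance's original identifier.
    return "rename_old_db_back"
```

Because `wait_for_completion=True` makes the delete operator poll RDS itself, neither branch needs a trailing sensor task.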
OH got it, that did totally confuse me -- I was thinking of `wait_for_completion` as the task id for a sensor. Got it!
> With this in mind, it seems much easier to handle the outage for staging rather
> than try and avoid it.
👍 +1 for this reasoning
> accomplish this would be to provide a wrapper around all the functions we will
> need for interacting with `boto3` that checks that `prod` is not in any of the
> references made to database assets (save for the initial snapshot acquisition
> step). If `prod` is found in any of the references, the DAG should fail.
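A guard like the one described above could be sketched as a decorator. Everything here (the exception name, the wrapped function, the identifiers) is hypothetical, for illustration only:

```python
import functools

class ProdReferenceError(ValueError):
    """Raised when a task would touch a production database asset."""

def forbid_prod_references(func):
    """Fail fast if any string argument mentions 'prod'."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for value in list(args) + list(kwargs.values()):
            if isinstance(value, str) and "prod" in value.lower():
                raise ProdReferenceError(
                    f"Refusing to run {func.__name__!r} against {value!r}"
                )
        return func(*args, **kwargs)
    return wrapper

@forbid_prod_references
def rename_db_instance(old_identifier: str, new_identifier: str) -> str:
    # In the real DAG this would call boto3's RDS client; stubbed here.
    return f"renamed {old_identifier} -> {new_identifier}"
```

The snapshot-acquisition step would be the one function deliberately left unwrapped, since it must read the production snapshot.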
Very cool idea!
> The final product of this plan will be a DAG (scheduled for `@monthly`) which
> will recreate the staging database from the most recent snapshot of the
> production API database. This will be accomplished by:
I was under the impression that the staging DB was meant to be a subset of the production data, although using the full snapshot would of course be great. I think I may have confused the api DB with the elasticsearch index, which will be a subset. Is that right?
Yes, that's correct! The database itself will have all data, but some of the indices we use may have a subset of that data (e.g. the proportional-by-provider index described in #1107). We have also made the staging database a subset of the production database in the past, but I don't think that's necessary to do going forward.
Thank you both for your feedback & questions! I will be answering the questions and revising the document over the next few days.
Full-stack documentation: https://docs.openverse.org/_preview/1154 Please note that GitHub pages takes a little time to deploy newly pushed code; if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments.
Yes, that's correct - my thought was that this would be a single DAG. I'll make this explicit.
@sarayourfriend and @stacimc - I believe I've applied all of the feedback & clarification from your notes, please take another look when you have a moment!
> <!-- Enumerate any references to other documents/pages, including milestones and other plans -->
>
> - [Project Thread](https://github.com/WordPress/openverse/issues/392)
> - [Project Proposal]() _TBD_
Suggested change: replace `- [Project Proposal]() _TBD_` with `- [Project Proposal](https://github.com/WordPress/openverse/pull/1107)`
Hey, just jumping in here to share (and elaborate on) an alternative approach I mentioned in the project proposal comments. This isn't meant to discredit the work here or suggest that it isn't a suitable solution. In considering this alternative solution though, I feel that it avoids some of the pitfalls of the proposed solution, providing that I understand the motivation of this project correctly.

If the staging database is just a copy of the production database, why do we even need a staging database? I came up with the following reasons, trying to ignore anything that wasn't DB specific:
These to me suggest that we'd want the Django-specific tables to be persistent, and not wiped out weekly by restoring from production backup. What we really want is to only sync the media tables. Additionally, I believe access to these media tables only needs to be read-only, as API consumers and the Django admin dashboard do not actually make changes to the media tables.

I can think of three alternative approaches to reading the media from the production database in the staging database. These approaches have some benefits over the proposal here, along with their own tradeoffs:

I'm curious what you think about these ideas! I only fell into this rabbit hole while reviewing the project proposal, but I'm excited to look at this problem space in a way I hadn't before.
I believe I have addressed all the feedback from folks! I'm going to be moving this back into the Decision Round, but since I'm going to be gone most of this week and we have a few folks AFK, we'll extend the timeline for response - please respond to this by May 3rd.
Looks good! The updates look to capture the feedback well, and thanks for creating #1874. Thanks for the very interesting discussion on this one.
> [`get_current_index` task](https://github.com/WordPress/openverse/blob/aedc9c16ce5ed11709e2b6f0b42bad77ea1eb19b/catalog/dags/data_refresh/data_refresh_task_factory.py#L167-L175)
> on the data refresh DAG)
> 2. Initiate the
> [`UPDATE_INDEX` action](https://github.com/WordPress/openverse/blob/7427bbd4a8178d05a27e6fef07d70905ec7ef16b/ingestion_server/ingestion_server/api.py#L107-L113)
Cool to be able to use this 👍
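For illustration, the `UPDATE_INDEX` request a DAG task might send to the ingestion server could look like the sketch below. The payload keys and their semantics are assumptions based on the linked `api.py`, not a verified description of that API:

```python
# Hypothetical sketch of how a DAG task might build a request payload to
# trigger the ingestion server's UPDATE_INDEX action. Keys are assumed,
# not verified against the linked api.py.
def build_update_index_request(model: str, index_suffix: str, since_date: str) -> dict:
    return {
        "model": model,              # e.g. "image" or "audio"
        "action": "UPDATE_INDEX",
        "index_suffix": index_suffix,
        "since_date": since_date,    # reindex only rows updated since this date
    }
```

In the actual DAG, a payload like this would presumably be POSTed to the ingestion server over HTTP.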
The plan is perfectly reasonable and appropriate in my view! Thanks for writing it and adding the suggestions, @AetherUnbound.
I would also like folks who intend to give feedback to explicitly say whether they're okay with data created in staging being destroyed on a regular basis, even with the option to prevent it at any time being available.
To me staging was always a laboratory to try changes, and I never saw it as a place where data persisted, so I'm okay with it being wiped out in the workflow described here. Having data persist in staging would be a new requirement, but I still don't see it as necessary.
> 1. The staging API must be deployed to apply any necessary migrations to the
> new copy of the database. In order to do this we must augment the existing
If the requirement is only to run migrations, it would be easier and faster to just run the Django command `python manage.py migrate` in the staging instance. Are there additional reasons that warrant a deployment?
To run a command in staging we'd need to enable AWS SSM on the AWS client installed in Airflow. ECS Exec only works if SSM is enabled on the client. I don't know how much work it would be to set that up for the catalog. Staging API deployments take <=3 minutes. My 2 cents is that it probably isn't worth the hassle for this particular use-case.
Yea agreed - I think a deployment would be the easiest thing to kick off given what we have.
Co-authored-by: Zack Krida <[email protected]>
Co-authored-by: Krystle Salazar <[email protected]>
Co-authored-by: Staci Mullins <[email protected]>
Coming back to this, I actually don't think that performing an
_(embedded snippet: openverse/ingestion_server/ingestion_server/indexer.py, lines 325 to 338 in ee90144)_

After #1987 is implemented we could add steps to initiate a full refresh of the indices that the IP describes once the restore is complete. If we wanted to use
Due date: 2023-05-03
Assigned reviewers
Description
This is an implementation plan for the first portion of #1107.
Current round
This discussion is following the Openverse decision-making process. Information about this process can be found on the Openverse documentation site. Requested reviewers or participants will be following this process. If you are being asked to give input on a specific detail, you do not need to familiarise yourself with the process and follow it.
This discussion is currently in the Decision round.
The deadline for review for this round is May 3rd.