Drop ORDER BY clause from copy step of image data refresh when adding a limit #4381

Closed
krysal opened this issue May 24, 2024 · 0 comments · Fixed by #4382
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: ingestion server Related to the ingestion/data refresh server

Comments


krysal commented May 24, 2024

Description

The recent image data refresh in the dev environment failed at the copy step. We realized that turning on a limit on the number of rows copied (done in WordPress/openverse-infrastructure#908) also applied an ordering clause, which is prohibitively expensive for a table with so many rows (over 700 million).

We still want a subset of the production data in dev, and it really doesn't need to be pseudo-random, so we can drop this piece of the clause generation and just let it select with a limit:

# The audioset view does not have identifiers associated with it
if upstream_table != "audioset_view":
    select_insert += d(
        """
        ORDER BY identifier"""
    )
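
For illustration, here is a minimal sketch of what the clause generation might look like once the ORDER BY piece is dropped. It is not the ingestion server's actual code: the function name `build_select_insert`, the `temp_import_` table prefix, and the column list are all hypothetical, and `d` is assumed to be an alias for `textwrap.dedent`.

```python
from textwrap import dedent as d  # assumption: `d` above is an alias like this


def build_select_insert(upstream_table, columns, limit=None):
    """Build a hypothetical INSERT ... SELECT statement for the copy step.

    All table and column names are illustrative placeholders.
    """
    column_list = ", ".join(columns)
    select_insert = d(
        f"""
        INSERT INTO temp_import_{upstream_table} ({column_list})
        SELECT {column_list} FROM {upstream_table}"""
    )
    # No ORDER BY is appended here: sorting the whole table just to take
    # the first N rows is the expensive part this issue proposes dropping.
    if limit is not None:
        select_insert += d(
            f"""
            LIMIT {int(limit)}"""
        )
    return select_insert + ";"


query = build_select_insert("image", ["identifier", "title"], limit=100_000)
print(query)
```

Without an ORDER BY, which rows the database returns under the limit is unspecified, so the subset is arbitrary rather than stable between runs. That matches the stated goal of having some production data in dev without needing a particular sample.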

Additional context

Related to #736, WordPress/openverse-api#474 (original PR adding the clause) and #3912 (because it's necessary for testing in staging).

@krysal krysal added 🟧 priority: high Stalls work on the project or its dependents 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository 🧱 stack: ingestion server Related to the ingestion/data refresh server labels May 24, 2024
@krysal krysal self-assigned this May 24, 2024