
Add filtered index and promotion steps to data refresh #4833

Merged
merged 28 commits into main from add/filtered-index-and-promotion-steps on Sep 16, 2024

Conversation

@stacimc (Contributor) commented Aug 29, 2024

Fixes

Fixes #4149 by @stacimc

Description

This PR adds the final steps to the data refresh DAG:

  • create and promote the filtered index
  • remap indices from the media table to the temp table
  • remap constraints from the media table to the temp table
  • promote the temp table & delete the original
  • promote the 'main' and filtered indices
[Screenshot, 2024-08-30: graph view of the new data refresh DAG steps]
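
For orientation, here is a rough sketch of what the promotion steps amount to. The helper, table, and alias names are illustrative assumptions, not the literal code in this PR:

# Illustrative sketch only -- the real DAG builds these operations from templates
# and runs them as Airflow tasks; all names here are assumptions.
from elasticsearch import Elasticsearch


def table_promotion_sql(media_type: str, temp_table: str) -> list[str]:
    """Promote the temp table over the live media table and delete the original."""
    return [
        f"ALTER TABLE {media_type} RENAME TO {media_type}_deleteme;",
        f"ALTER TABLE {temp_table} RENAME TO {media_type};",
        f"DROP TABLE {media_type}_deleteme CASCADE;",
    ]


def promote_index(es: Elasticsearch, alias: str, new_index: str, old_index: str) -> None:
    """Point the stable alias (e.g. 'image' or 'image-filtered') at the new index."""
    # elasticsearch-py 8.x signature; older clients take a single `body` argument.
    es.indices.update_aliases(
        actions=[
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    )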

This was much less straightforward than I imagined, and this PR is fairly large as a consequence (although much of the line count comes from tests and an es_mapping.py configuration file that is simply copied from the ingestion server). I can break it into smaller pieces if reviewers prefer; however, I'll point out that it's much easier to manually test in its current state, because you can actually run the entire data refresh (even multiple times in a row).

While working on this PR I also noticed and fixed several small issues. I have left comments at the relevant code changes to make these easier to spot:

  • I changed the create_index steps to create the new index from a hard-coded index config, rather than basing it on the config of the current index (see the inline comment for reasoning)
  • I fixed a bug where index calculations were off when multiple data refreshes ran concurrently
  • I fixed several bugs with error handling and dependency chains, where the failure of a task would not cause its downstream tasks to skip as expected. See commit messages
  • I fixed a bug in the ingestion server in the way constraints are mapped. See commit message

I've tried to make commit messages and comments in the code as descriptive as possible, but please ping me if clarification is needed anywhere.

Testing Instructions

Add the following to your catalog/.env:

AIRFLOW_CONN_SENSITIVE_TERMS="http://catalog_indexer_worker:8003/static/mock_sensitive_terms.txt"

Run ov just recreate to start with fresh sample data.

Run an image and an audio provider DAG for a few minutes to get some new records. Wait until you see that a couple hundred lines have been written in the pull_data logs, then mark the task as a success. I used Cleveland and Jamendo.

Establish a baseline

Go to https://app.elasticvue.com/cluster/0/indices and verify you see the four initial indices (image-init, image-init-filtered, audio-init, audio-init-filtered), and note the number of records in each. Now run ov just api/pgcli and run describe image; and describe audio;. Save the results somewhere.

Run the new data refreshes and compare

Now enable the staging_audio_data_refresh and staging_image_data_refresh DAGs. They should both complete successfully. Once done, check Elasticvue again. You should see that the -init indices have all been dropped and replaced with new indices named with a UUID (for example audio-f61ab0e757d244ab9abeca0a0613d9e6, audio-f61ab0e757d244ab9abeca0a0613d9e6-filtered, and so on). You should have more records in each of these new indices than in the initial ones. To be extra safe, you can check the logs of the provider DAGs you ran to see how many records were ingested; that is the number of extra records you should see in the non-filtered indices.

In pgcli, again run describe image; and describe audio;. Compare the results to what you saw earlier; pay particular attention to the indices and constraints. They should be identical, including the index names.
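
If you prefer checking counts from a script rather than Elasticvue, something like the sketch below works as a convenience check. It is not part of the PR, and it assumes your local Elasticsearch is reachable at http://localhost:9200 (adjust the URL for your setup):

# Convenience check, not part of the PR: compare per-index document counts
# before and after the data refresh runs.
import requests

ES_URL = "http://localhost:9200"  # assumption: adjust for your local cluster


def index_counts() -> dict[str, int]:
    indices = requests.get(f"{ES_URL}/_cat/indices?format=json", timeout=10).json()
    return {
        idx["index"]: int(idx["docs.count"])
        for idx in indices
        if idx["index"].startswith(("image", "audio"))
    }


before = index_counts()
# ... run the staging_audio_data_refresh and staging_image_data_refresh DAGs ...
after = index_counts()
print("new indices:", sorted(set(after) - set(before)))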

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (ov just catalog/generate-docs for catalog PRs) or the media properties generator (ov just catalog/generate-docs media-props for the catalog or ov just api/generate-docs for the API) where applicable.

Developer Certificate of Origin

Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@stacimc added the 🟨 priority: medium (Not blocking but should be addressed soon), 🌟 goal: addition (Addition of new feature), 💻 aspect: code (Concerns the software code in the repository), 🧱 stack: ingestion server (Related to the ingestion/data refresh server), and 🧱 stack: catalog (Related to the catalog and Airflow DAGs) labels on Aug 29, 2024
@stacimc stacimc self-assigned this Aug 29, 2024
as part of the Data Refresh.

TODO: We'll swap out the create and populate filtered index DAG to use this instead
of the one defined in the legacy_data_refresh.
@stacimc (Contributor, Author):

Noting that there is a separate issue for this (and moreover the filtered index is going away).

es.create_index(
    index_config={
        "index": temp_index_name,
        "body": index_settings(data_refresh_config.media_type),
@stacimc (Contributor, Author):

This change ensures that the data refresh always creates a new index based off the hard-coded index configuration in es_mapping.py, rather than by copying the configuration of the existing index.

This is how the ingestion server does it currently, and it was simply missed in the previous PR. It's necessary to ensure that temporary or experimental changes to the indices (for example, changes made using our create_new_es_index DAGs or similar) are not carried forward in the data refresh without making them permanent via a code change/PR.
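
For contrast, the behavior being replaced amounted to something like the sketch below (hypothetical and simplified, not the actual old code): copying the live index's configuration forward is exactly what let temporary, manually applied index changes survive a data refresh.

# Hypothetical sketch of the old "copy the current config" behavior, NOT the new code.
# A real version would also need to strip read-only settings (index.uuid,
# index.creation_date, index.provided_name) before reusing them.
current = es.indices.get(index=current_index_name)[current_index_name]
es.indices.create(
    index=temp_index_name,
    settings=current["settings"],
    mappings=current["mappings"],
)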

# only include Slack airflow connections
done < <(env | grep "^AIRFLOW_CONN_SLACK*")
# only include Slack airflow connections and the Sensitive Terms connection
done < <(env | grep "^AIRFLOW_CONN_SLACK*\|AIRFLOW_CONN_SENSITIVE_TERMS")
@stacimc (Contributor, Author):

This change is necessary to allow testing with the real sensitive terms list when desired, since it uses https.

@@ -116,6 +117,7 @@ def create_api():
_api.add_route("/healthcheck", HealthcheckResource())
_api.add_route("/task", IndexingJobResource(task_tracker))
_api.add_route("/task/{task_id}", TaskStatusResource(task_tracker))
_api.add_static_route("/static", (Path(".") / "static").absolute())
@stacimc (Contributor, Author):

The indexer worker serves the mock sensitive terms list, similar to how the ingestion server currently does. This is okay because in local development the indexer worker is always running, and in production we use the real sensitive terms list rather than the mock one. Serving the mock list this way is helpful because it makes the local filtered_index steps behave as similarly as possible to production, just using a different endpoint.

Collaborator:

If we wanted to do it a different way (to avoid adding this functionality to the indexer worker), we could put the file into minio, and still retrieve it over HTTP. According to this guide about using minio to host a website it shouldn't be too difficult to do later on.

For the sake of simplicity, though, would it be better if the DAG just swapped the location it reads the sensitive terms from based on an additional environment variable? Like:

if USE_MOCK_SENSITIVE_TERMS:
    return ["running", "water", "bird"]
else:
    ...  # retrieve the regular list

I know it's not as seamless as just swapping out the HTTP connection string in the environment file, but it avoids a bit of this hoop-jumping required to make the DAG code "seamless". I guess really we've just hidden the seam somewhere else by doing it this way 😛 (Also: because I wrote the original code for this in the ingestion server, I can say I think it was a bit cheeky/too clever to do it this way and wish I'd done this if/else there to begin with!)

@stacimc (Contributor, Author):

We couldn't use the HttpOperator if we wanted to do that; otherwise we'd have to add branching tasks to select the right task per environment. Not a big deal, but I'd prefer to keep the operators clean. We could use minio, doing something similar to what @AetherUnbound did in #4836, maybe 🤔

Collaborator:

We couldn't use the HttpOperator if we wanted to do that, or else we'd have to add branching tasks to get around selecting the right task per environment

Oh 😕 We can't call the HttpOperator conditionally in a Python @task operator? I figured something like this would be possible:

@task
def get_sensitive_terms(**context):
    if USE_MOCK_SENSITIVE_TERMS:
        return ["running", "water", "bird"]
    else:
        http_op = HttpOperator(...)
        return http_op.execute(context=context)

Collaborator:

We might be able to initialize the operator inside a @task, but it's probably easier to use the HttpHook inside a @task than to have an operator-in-operator situation with HttpOperator.

I think I'm fine going with the new static route approach suggested here, but @stacimc do you mind adding a comment about what the route is for, and noting that it's only intended to be used for local testing?
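
For reference, the HttpHook-in-a-@task version being described might look roughly like this; the connection id, the flag, and the response parsing are assumptions for illustration:

from airflow.decorators import task
from airflow.providers.http.hooks.http import HttpHook

USE_MOCK_SENSITIVE_TERMS = False  # hypothetical flag, per the suggestion above


@task
def get_sensitive_terms() -> list[str]:
    if USE_MOCK_SENSITIVE_TERMS:
        return ["running", "water", "bird"]
    # The hook reads scheme/host/port from the SENSITIVE_TERMS Airflow connection;
    # depending on how the connection is defined, the path may need to be passed
    # as the `endpoint` argument to run().
    response = HttpHook(method="GET", http_conn_id="sensitive_terms").run()
    return [line.strip() for line in response.text.splitlines() if line.strip()]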

Collaborator:

Ah, maybe I'm thinking of the hook.

All good, I was just suggesting alternatives if wanted, this approach obviously has worked fine for us 🙂

It will go away in #3336 anyway, which I didn't know when I wrote my original comment or I would have mentioned it!

@stacimc stacimc marked this pull request as ready for review August 30, 2024 23:58
@stacimc stacimc requested a review from a team as a code owner August 30, 2024 23:58
@stacimc stacimc requested review from krysal, AetherUnbound and sarayourfriend and removed request for a team August 30, 2024 23:58
@stacimc (Contributor, Author) commented Aug 31, 2024

@sarayourfriend I added you as an additional reviewer because you expressed interest in this PR (and your feedback is always appreciated 😄) but don't feel obligated!

Contributor:

This is such a great example of the fact that productivity should not be measured by line count :) A couple of changed lines needed so much debugging and explanation. Thank you for the great detailed notes, as always. It's great to have the local environment so close to production.

@sarayourfriend (Collaborator) left a comment:

Far out! I haven't tested this locally, just read over the code and process. Very cool! It's so exciting to have this laid out in the graph view, it really helps show the complexity of the data refresh, while also giving a clearer narrative to the different steps. I'm really glad we went forward with this project to remove the ingestion server.

If you'd like me to test this out locally, I'm happy to do so as well, just let me know. Otherwise, I'll leave it to the catalogue reviewers to do that part of it. For my part, this is very cool and exciting.

# Connect to sensitive terms list used for filtered index creation in the data refresh.
# For local development, the mock terms list is served by the catalog indexer worker. Change to the
# full production URL in order to use the production sensitive terms list.
AIRFLOW_CONN_SENSITIVE_TERMS="http://catalog_indexer_worker:8003/static/mock_sensitive_terms.txt"
Collaborator:

@stacimc will we need an infrastructure change to point this to the correct live location?

@stacimc (Contributor, Author):

It's not required, since we can set this through the Admin UI (as we have with several other connections/variables). We could make an infrastructure change to set it as an environment variable though; actually, I'm not clear how we've drawn the line between which variables should be configured where 🤔 What do you think?

Collaborator:

Using the Admin UI sounds good, I believe that's in line with decisions we've made in the past (https://github.com/WordPress/openverse-infrastructure/issues/215). I forgot that environment variables aren't necessary for configuring these. I realised that I didn't personally have a clear picture of environment variables vs Airflow variables/connections, and that some of our code muddies that distinction... and presents problems for advancements in our Airflow configuration, management, and execution. I've opened #4863 to address those inconsistencies.

I've started a discussion here to work towards an approach more in line with the rest of our applications (using infrastructure as code).

Collaborator:

I think this is fine for now! One disadvantage of declaring this in the Admin UI is that the password (if needed) for the services isn't codified in the same way in our infrastructure repo. Noting that for now; I'll add more thorough thoughts to that discussion when I get a chance to look at it.

catalog/dags/data_refresh/distributed_reindex.py (review thread: outdated, resolved)
logger = logging.getLogger(__name__)


class TableIndex(NamedTuple):
Collaborator:

Question: is the usage of NamedTuple instead of dataclasses due to a limitation in Airflow? I seem to recall limitations in task return types. Otherwise, do you have a preference for NamedTuples over dataclasses? And if so, I'd be curious to know the reason 🙂

@stacimc (Contributor, Author):

I believe anything JSON-serializable can be passed through XComs; if I'm not mistaken, a dataclass would work. No strong preference beyond following convention in the project, given that we don't require anything more complex. I don't think the pros/cons of either option are particularly significant in this use case.

Collaborator:

+1 to what Staci said, according to the documentation:

They can have any (serializable) value

I'm assuming "serializable" in this case means JSON serialization. I was able to return a NamedTuple in the add_rekognition_labels DAG and Airflow hydrated it back into the NamedTuple just fine. In practice (via the UI), it looks like it serializes it to a list:

[Screenshots: the XCom value rendered as a list in the Airflow UI]

Dataclasses, although they have more functionality, are a little heavier-weight. If we essentially just need an object to attach a bunch of data to and reference it more easily/programmatically/in a typed way than a dictionary, NamedTuple is preferred. That said, I did just find this documentation on dataclasses!! 🤩:

As mentioned TaskFlow uses XCom to pass variables to each task. This requires that variables that are used as arguments need to be able to be serialized. Airflow out of the box supports all built-in types (like int or str) and it supports objects that are decorated with @dataclass or @attr.define.
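
A minimal sketch of the pattern under discussion (field names are assumptions, not the PR's actual fields). Whether the downstream task sees the NamedTuple itself or its list form can depend on the Airflow version and XCom serializer, so unpacking by position is the safe move:

from typing import NamedTuple

from airflow.decorators import task


class TableIndex(NamedTuple):
    index_name: str
    index_def: str


@task
def get_index() -> TableIndex:
    # Stored in XCom as JSON; a NamedTuple round-trips as a list/tuple of its fields.
    return TableIndex(index_name="image_provider_fid_idx", index_def="CREATE INDEX ...")


@task
def use_index(table_index) -> None:
    index_name, index_def = table_index  # works whether a tuple or a list comes back
    print(index_name, index_def)


# Inside a DAG definition:
# use_index(get_index())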

Collaborator:

Update: I made an upstream PR to make that clearer here!
apache/airflow#42045


@obulat (Contributor) commented Sep 5, 2024

I ran the steps from the testing instructions.

The audio refresh ran first. The counts in ES (seen in Elasticvue) increased, but get_after_record_count returned the old counts.
Then the image refresh ran. The counts in ES did not change at all, even though the DAG appeared to finish successfully.

wait_for_reindexing_task finished successfully, but logs an error at the end stating that a 'module' object cannot be pickled. This SO answer suggests that something is wrong with the kwargs object.

The error messages (repeated deepcopy frames trimmed for readability):

[2024-09-05, 15:17:29 UTC] {taskinstance.py:3916} ERROR - Error scheduling downstream tasks. Skipping it as this is entirely optional optimisation. There might be various reasons for it, please take a look at the stack trace to figure out if the root cause can be diagnosed and fixed. See the issue https://github.com/apache/***/issues/39717 for details and an example problem. If you would like to get help in solving root cause, open discussion with all details with your managed service support or in Airflow repository.
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 3912, in schedule_downstream_tasks
    return TaskInstance._schedule_downstream_tasks(
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/session.py", line 94, in wrapper
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 3861, in _schedule_downstream_tasks
    partial_dag = task.dag.partial_subset(
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/dag.py", line 2646, in partial_subset
    t.task_id: _deepcopy_task(t)
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/dag.py", line 2643, in _deepcopy_task
    return copy.deepcopy(t, memo)
  [... many repeated frames alternating between copy.deepcopy (copy.py) and BaseOperator.__deepcopy__ (baseoperator.py, line 1388) trimmed ...]
  File "/usr/local/lib/python3.12/copy.py", line 151, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'module' object

There are also two warnings at the beginning of this task's logs:

[2024-09-05, 15:17:29 UTC] {baseoperator.py:405} WARNING - TemplatedConnectionHttpSensor.execute cannot be called outside TaskInstance!
[2024-09-05, 15:17:29 UTC] {logging_mixin.py:190} WARNING - /home/***/.local/lib/python3.12/site-packages/***/task/task_runner/standard_task_runner.py:70 DeprecationWarning: This process (pid=27862) is multi-threaded, use of fork() may lead to deadlocks in the child.

…ting index

The index configuration should be hard-coded here to match the behavior of the ingestion server. This is also important because we require permanent changes to the index configuration to go through the code review process. Other DAGs can be used to rapidly iterate on index configuration, and changes can be persisted here only if wanted.
This isn't possible because dynamically mapping tasks within a dynamically mapped taskgroup is not supported in our version of Airflow.
There was an error in the task dependencies caused by the fact that we were returning the output of transform_index_defs from the remap_indices taskgroup. Normally, the last task in a TaskGroup is the task that is set upstream of any downstream dependencies (in this case, the remap_constraints group). Apparently if you add a return statement to a TaskGroup (which we were doing in order to make the index_mapping the XCOM result for the entire taskgroup), this borks the dependency chain.

As a result, transform_index_defs was being set as the upstream task to get_all_existing_constraints. This meant that if there was an error in create_table_indices, it would NOT stop the remap_constraints tasks from running. By updating the create_table_indices task to return what we want from XCOMs as well, we can avoid this issue.
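
A sketch of the fix described above, with simplified task bodies (names follow the commit message; the bodies are placeholders). The key point is that the value returned from the @task_group should come from its final task (create_table_indices), so that anything chained after the group depends on that task rather than on transform_index_defs:

from airflow.decorators import task, task_group


@task
def transform_index_defs() -> dict:
    return {"old_index_name": "new_index_name"}


@task
def create_table_indices(index_mapping: dict) -> dict:
    # ... create the indices on the temp table ...
    # Returning the mapping from the group's last task keeps it as the XCom
    # result for the whole group AND keeps the dependency chain intact.
    return index_mapping


@task
def get_all_existing_constraints(index_mapping: dict) -> None:
    print(index_mapping)


@task_group
def remap_indices() -> dict:
    # Returning transform_index_defs() directly from here made it the upstream
    # of anything chained after the group, so a create_table_indices failure
    # did not skip the downstream constraint tasks.
    return create_table_indices(transform_index_defs())


# Inside a DAG definition:
# get_all_existing_constraints(remap_indices())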
@stacimc (Contributor, Author) commented Sep 5, 2024

Thanks @obulat! Good catch on the record count -- it was getting the counts for the old index rather than the promoted one, there's a quick fix which I'll push shortly!

I cannot reproduce your issues with the image data refresh, though I'll keep trying some more things. Are you positive that the provider DAG had succeeded and finished loading additional records into the catalog DB before the refresh started? 🤔

The error/warning you mentioned seems related to the way I've overridden templated fields in the Sensor, but the sensor is working, so that's strange. I'll play around with it to see if I can get the warnings cleared.

@stacimc stacimc force-pushed the add/filtered-index-and-promotion-steps branch from e12db4a to bd57214 Compare September 5, 2024 22:48
@AetherUnbound (Collaborator) left a comment:

This is wonderful! I'm sorry this last piece took more than expected but it's so nice to finally be there 🥳

I'm not quite done with my review yet (still about 10 more files to go), but here's what I have so far.

One thing I wanted to note too: we send Slack notifications for nearly every step of the data refresh in its current implementation (that all just happens on the to-be-removed ingestion server). It might make sense to have an on_success_callback for the data refresh which does something similar? That way we don't have to add it to every step, but still get some reporting. Just something to think about; if you like the idea or didn't already have a plan for Slack updates, we can make an issue for it!

catalog/dags/common/sql.py (review thread: resolved)
catalog/dags/common/sql.py (review thread: outdated, resolved)
catalog/dags/data_refresh/distributed_reindex.py (review thread: outdated, resolved)
catalog/dags/data_refresh/promote_table.py (review thread: outdated, resolved)

catalog/dags/data_refresh/remap_table_constraints.py (review thread: outdated, resolved)
Comment on lines +26 to +31
def fetch_all_tuples(cursor):
    try:
        rows = cursor.fetchall()
        return [ConstraintInfo(row[0], row[1], row[2]) for row in rows]
    except Exception as e:
        raise ValueError("Unable to extract expected row data from cursor") from e
Collaborator:

NEAT

catalog/dags/data_refresh/remap_table_constraints.py (review thread: outdated, resolved)
catalog/dags/data_refresh/remap_table_constraints.py (review thread: outdated, resolved)
Comment on lines +50 to +51
The constraint to remap may be a constraint directly on the live media table, in
which case it is dropped and then added directly to the temp table. Alternatively,
Collaborator:

Just to be clear, does that mean the live tables won't have those constraints for some time, even while they're still live? That's probably fine, but if something fails partway through, could the constraints be missing on the live table the next time we try to move them over?

@stacimc (Contributor, Author):

Correct. This does mirror the current implementation, so it's not a regression, and the (minor) improvement here is that we'd have immediate insight into where a failure happened and whether those steps need to be manually undone.

I'll make an issue about this, because it's not simple to work around. We could do something similar to the way we remap the indices, where we give the new constraints a temporary name and then map the names back over in the promote step (if that's even necessary 🤔). But the delete_orphans step and remapping foreign key constraints to the temp table can't be easily postponed or undone.

I don't think we should be too worried about this because it's the very last step before promotion, anyway -- but definitely good to call out, and something we'd need to be aware of if for some reason we wanted to start the data refresh over from the beginning after a failure between this step and promotion.

Collaborator:

Thanks for the explanation, I think that sounds good! Not blocking in this case because it mirrors the previous implementation.

@openverse-bot (Collaborator):
Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@krysal
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend[1] days, this PR was ready for review 4 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)[2].

@stacimc, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

@obulat (Contributor) left a comment:

I re-ran the testing instructions, and the image refresh ran well and saved 10,000 images to the main image index.

catalog/dags/data_refresh/copy_data.py (review thread: resolved)
catalog/dags/data_refresh/remap_table_indices.py (review thread: outdated, resolved)
@@ -84,7 +86,7 @@ def response_check_wait_for_completion(response: Response) -> bool:

@task
def get_worker_params(
estimated_record_count: int,
id_range: tuple[int, int],
Contributor:

Much clearer variable name!

@AetherUnbound (Collaborator) left a comment:

Okay, here's the rest of my feedback!

    constraint_statement: str, constraint_table: str, temp_table_name: str
):
    """
    Parse a foreign key constraint and generate a `DELETE` statement to
Collaborator:

Thanks for this great documentation!

Comment on lines +203 to +208
def apply_constraints_to_table(
    postgres_conn_id: str, table_name: str, constraints: list[str]
):
    logger.info(f"Applying constraints for `{table_name}`.")
    for constraint in constraints:
        run_sql.function(postgres_conn_id=postgres_conn_id, sql_template=constraint)
Collaborator:

Thinking about failure retrying and idempotency here... should these maybe each be separate tasks that are run serially? They can probably still be mapped (by both table name and constraint order) 🤔 Not something that needs to be done now, but I'm thinking about the case where this fails halfway through for some reason. We could maybe address and fix the issue, then restart from that line rather than having to restart things further upstream. Just a thought, no action needed!

@stacimc (Contributor, Author):

You can't dynamically map tasks within a dynamically mapped taskgroup, unfortunately (at least not yet). I suppose we could have generate_constraints_for_table generate constraints for all tables, and then map over table + constraint together when applying, although I don't like that control flow as much... 🤔 I think we could take this into consideration as part of the same issue I'll be creating for the other constraint concerns, particularly as neither is a regression from existing behavior.
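
For the record, the alternative flow floated here might look roughly like the sketch below. This is a hedged sketch with assumed names, not what this PR implements: flatten the per-table constraints into (table, constraint) pairs in one task, then dynamically map a single apply task over the pairs so each statement can be retried on its own.

from airflow.decorators import task


@task
def flatten_constraint_pairs(constraints_by_table: dict[str, list[str]]) -> list[dict]:
    # {"image": ["ALTER TABLE ...", ...], "audio": [...]} ->
    # [{"table_name": "image", "constraint_sql": "ALTER TABLE ..."}, ...]
    return [
        {"table_name": table, "constraint_sql": constraint}
        for table, constraints in constraints_by_table.items()
        for constraint in constraints
    ]


@task
def apply_constraint(table_name: str, constraint_sql: str, postgres_conn_id: str) -> None:
    # Run the single statement; a failure here only fails this mapped instance.
    print(f"Applying to {table_name}: {constraint_sql}")


# Inside a DAG definition (the connection id is an assumption):
# pairs = flatten_constraint_pairs(all_constraints)
# apply_constraint.partial(postgres_conn_id="postgres_openverse").expand_kwargs(pairs)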

catalog/dags/data_refresh/remap_table_indices.py (review thread: outdated, resolved)
catalog/dags/data_refresh/remap_table_indices.py (review thread: outdated, resolved)


@stacimc (Contributor, Author) commented Sep 7, 2024

@AetherUnbound I've run out of time today, but I've written myself a TODO to create issues for the constraint error-handling concerns you raised, as well as for Slack reporting. I'm not sure what Slack reporting we'll want yet, since unlike with the ingestion server we'll already have much easier visibility into what's happening through Airflow -- but I think we ought to have reporting at least when the copy data and reindexing steps finish, for sure.

@AetherUnbound (Collaborator) left a comment:

Fantastic work on this, looks great to me! I didn't get a chance to test it but given that other reviewers have, I feel comfortable approving it 😄

@stacimc stacimc merged commit 8b2f037 into main Sep 16, 2024
51 checks passed
@stacimc stacimc deleted the add/filtered-index-and-promotion-steps branch September 16, 2024 18:34
@stacimc stacimc removed the 🧱 stack: ingestion server Related to the ingestion/data refresh server label Sep 23, 2024
Successfully merging this pull request may close: Add remaining steps to the Data refresh DAGs

5 participants