-
Notifications
You must be signed in to change notification settings - Fork 51
Use the database category value for Elasticsearch model #829
Conversation
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
API Developer Docs Preview: Ready https://wordpress.github.io/openverse-api/_preview/829 Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments. |
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
Signed-off-by: Olga Bulat <[email protected]>
It is related only insofar as both steps are run during the data refresh process, and both PRs would simplify and speed it up. However, the big difference is that #839 is blocked because the upstream database needs to be cleaned up before removing the API clean up process, but this PR can be merged as is. The computed category information was mostly unused by the API, and upstream has a lot of the computed category data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look very valid to me.
@@ -42,7 +42,7 @@ class Image(ImageFileMixin, AbstractMedia): | |||
""" | |||
Inherited fields | |||
================ | |||
category: eg. photograph, digitised artwork & illustration | |||
category: eg. photograph, digitized_artwork & illustration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My UK-based spellchecker gets me into a lot of trouble.
Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR: @krysal Excluding weekend1 days, this PR was updated 4 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2. @obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes |
Fixes
Fixes #710 by @obulat
Description
On the first implementation of filtering by category, catalog database did not have the
category
field.Then, the API implemented a computation of the
category
. However, because of the order of code computation, the computed value was (anyways) overwritten by the value in the upstream database, even if it was None.This PR removes the
categorize
module for category computation because it is done by the catalog.Testing Instructions
Note: I could not run
just init
from this branch. The only way I could test this branch is by using the changes toload_sample_data
script from #830.Testing this PR
To test that the code uses
category
from the database, changesample_image.csv
items. You would need to replace the second last item in the rows with eitherdigitized_artwork
orillustration
. Then, runjust init
and go tohttp://localhost:8000/v1/images/?category=digitized_artwork,illustration
. You should see two results.Testing that category computation does not work
You can see that the current code always replaces the computed category (including
illustration
for.svg
items, and default for provider, such asdigitized_artwork
for clevelandmuseum).main
branch.sample_image.csv
items. In the first item, we setcategory
toNone
(by deleting the second from last item in the row) and replace theprovider
andsource
withclevelandmuseum
. If the code was implemented correctly, the image category would be set todigitized_artwork
as the default category forclevelandmuseum
. In the second item, set the URL extension to.svg
, and delete thecategory
value. The category for items withsvg
extension were supposed to be set toillustration
. You can simply the first 2 lines insample_image.csv
with the copy below:If you run
just init
on main with these lines, and search fordigitized_artwork
items (http://localhost:8000/v1/images/?category=digitized_artwork
), you will not get any results.And if you replaced the last line here:
openverse-api/ingestion_server/ingestion_server/elasticsearch_models.py
Lines 232 to 234 in 9eb73b2
with
category = row[schema["category"]] or category
you would see that the category falls back to the computed value if the database category is
NULL
.Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin