Investigate dead links from auckland_museum DAG #3658
Labels
💻 aspect: code
Concerns the software code in the repository
✨ goal: improvement
Improvement to an existing user-facing feature
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: catalog
Related to the catalog and Airflow DAGs
Problem
As noted in this comment, many of the results that we ingest from Auckland War Memorial Museum Tāmaki Paenga Hira show either "Server error" or "Online image not available" for the main image file.
These examples were found by @obulat:
"Online image not available" placeholder
https://api.aucklandmuseum.com/id/media/v/3191
https://api.aucklandmuseum.com/id/media/v/2882
Internal server error
https://api.aucklandmuseum.com/id/media/v/861840
https://api.aucklandmuseum.com/id/media/v/528116
https://api.aucklandmuseum.com/id/media/v/828322
https://api.aucklandmuseum.com/id/media/v/229298
No image on the landing page
https://www.aucklandmuseum.com/collections-research/collections/record/am_humanhistory-object-9923
Placeholder image on the landing page
https://www.aucklandmuseum.com/collections-research/collections/record/am_naturalsciences-object-368805
Description
We should investigate the URLs being returned and ensure that we do not add a large number of dead links to the catalog. We should also consider contacting the museum to clarify which results reflect potentially temporary access issues and which have had public access intentionally removed for cultural sensitivity reasons (results we would therefore not want to index).
Once this has been determined, the DAG should be updated to exclude the results we don't want to index. Note that the DAG will not be enabled until this issue is resolved.
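As a possible starting point for the investigation, here is a minimal Python sketch that sorts media URLs into the failure modes listed above. Everything in it is hypothetical and not part of the existing DAG: the names `LinkStatus`, `classify`, `check_url`, `filter_live` and `PLACEHOLDER_DIGESTS`, and the choice to treat 4xx responses as unreachable. Detecting the "Online image not available" placeholder by hash assumes the museum serves a byte-identical placeholder image with a successful status code. That has not been confirmed, and the digest set is left empty because no real digests are known yet.

```python
import hashlib
import urllib.error
import urllib.request
from enum import Enum


class LinkStatus(Enum):
    OK = "ok"
    SERVER_ERROR = "server_error"
    PLACEHOLDER = "placeholder"
    UNREACHABLE = "unreachable"


# Hypothetical: SHA-256 digests of known placeholder images (for example, the
# "Online image not available" image). These would need to be collected by hand
# during the investigation; none are known, so the set is empty.
PLACEHOLDER_DIGESTS = frozenset()


def classify(status, body, placeholder_digests=PLACEHOLDER_DIGESTS):
    """Map an HTTP status code and response body to a LinkStatus."""
    if status >= 500:
        return LinkStatus.SERVER_ERROR
    if status >= 400:
        # Assumption: treat client errors (e.g. 404) as unreachable.
        return LinkStatus.UNREACHABLE
    # A placeholder may come back with status 200, so compare the content itself.
    if hashlib.sha256(body).hexdigest() in placeholder_digests:
        return LinkStatus.PLACEHOLDER
    return LinkStatus.OK


def check_url(url, timeout=10.0):
    """Fetch a URL (following redirects) and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses.
        return classify(err.code, b"")
    except OSError:
        # Covers URLError (DNS failures, refused connections) and timeouts.
        return LinkStatus.UNREACHABLE


def filter_live(urls, checker=check_url):
    """Keep only the URLs whose check returns LinkStatus.OK."""
    return [url for url in urls if checker(url) is LinkStatus.OK]
```

Running `check_url` over a sample of ingested media URLs, such as the ones listed above, would give a rough count of each failure mode. `filter_live` accepts any `checker` callable, so the classification logic can be tested without network access. It does not cover records whose landing page has no image at all, which would need a separate check.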
Additional context
Read the discussion on #3258, starting around this comment, for more context.