Investigate dead links from auckland_museum DAG #3658

Open
stacimc opened this issue Jan 12, 2024 · 0 comments
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs

Comments


stacimc commented Jan 12, 2024

Problem

As noted in this comment, many of the results that we ingest from Auckland War Memorial Museum Tāmaki Paenga Hira show either "Server error" or "Online image not available" for the main image file.

These examples were found by @obulat:

"Online image not available" placeholder

https://api.aucklandmuseum.com/id/media/v/3191
https://api.aucklandmuseum.com/id/media/v/2882

Internal server error

https://api.aucklandmuseum.com/id/media/v/861840
https://api.aucklandmuseum.com/id/media/v/528116
https://api.aucklandmuseum.com/id/media/v/828322
https://api.aucklandmuseum.com/id/media/v/229298

No image on the landing page

https://www.aucklandmuseum.com/collections-research/collections/record/am_humanhistory-object-9923

Placeholder image on the landing page

https://www.aucklandmuseum.com/collections-research/collections/record/am_naturalsciences-object-368805

Description

We should investigate the URLs being returned and ensure that we do not add a large number of dead links to the catalog. We should also consider getting in touch with the museum to clarify the intention behind these results: some may be temporary access issues, while others may be records whose public access has been intentionally removed for cultural sensitivity reasons, which we therefore do not want to index.

The DAG should be updated to exclude results we don't want to index, once this has been determined. Note that the DAG is not being enabled until this issue is resolved.
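
As a starting point for the investigation, here is a minimal sketch of how fetched responses for the media URLs could be sorted into the categories listed above. Everything in it is hypothetical: `LinkStatus`, `classify_media_response`, and the `placeholder_hashes` parameter are illustrative names and not part of the existing DAG. It also assumes that the "Online image not available" placeholder is served with a 200 status as a fixed image file, so that it can be recognised by its hash; that assumption has not been verified.

```python
import hashlib
from enum import Enum


class LinkStatus(Enum):
    OK = "ok"
    SERVER_ERROR = "server_error"
    PLACEHOLDER = "placeholder"
    NOT_AN_IMAGE = "not_an_image"
    OTHER_ERROR = "other_error"


def classify_media_response(status_code, content_type, body, placeholder_hashes=frozenset()):
    """Classify one already-fetched response for a media URL.

    placeholder_hashes is a hypothetical set of SHA-256 hex digests of
    known placeholder images; it would need to be collected by hand.
    """
    if status_code >= 500:
        # e.g. the "Internal server error" examples
        return LinkStatus.SERVER_ERROR
    if status_code != 200:
        return LinkStatus.OTHER_ERROR
    if not (content_type or "").lower().startswith("image/"):
        return LinkStatus.NOT_AN_IMAGE
    # Assumption: the placeholder is byte-identical every time it is served.
    if hashlib.sha256(body).hexdigest() in placeholder_hashes:
        return LinkStatus.PLACEHOLDER
    return LinkStatus.OK
```

Running something like this over a sample of ingested records would give a rough count per category, which could inform both the conversation with the museum and the eventual exclusion rule in the DAG. It does not distinguish temporary outages from intentionally removed records; that still requires clarification from the museum.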

Additional context

Read the discussion on #3258, starting around this comment, for more context.

@stacimc stacimc added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Jan 12, 2024
@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog Jan 12, 2024