Remove tag filtering steps during ingestion #4542

Open
AetherUnbound opened this issue Jun 21, 2024 · 0 comments
Labels
💻 aspect: code (Concerns the software code in the repository)
🗄️ aspect: data (Concerns the data in our catalog and/or databases)
✨ goal: improvement (Improvement to an existing user-facing feature)
🟨 priority: medium (Not blocking but should be addressed soon)
🧱 stack: catalog (Related to the catalog and Airflow DAGs)
⛔ status: blocked (Blocked & therefore, not ready for work)


Description

Blocked by #4541 and, more broadly, by #3925.

Per #4465 and related discussions elsewhere in the project, we are moving towards the catalog database serving as a "data warehouse". Operationally, this means we intend to store as much information in it as we can and filter out low-quality or inaccurate data during the data refresh process. As such, once the ingestion server is removed and data filtering is in place, we should remove the provider ingestion steps that strip denylisted tags.

Specifically, we can remove the tag-filtering step here (the _tag_denylisted function):

    return [
        self._format_raw_tag(tag)
        for tag in raw_tags
        if not self._tag_denylisted(tag)
    ]

The filtering of denylisted tags will happen entirely in the data refresh process instead.
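As a rough illustration of the change, the sketch below contrasts the current comprehension with the proposed one, and shows the same denylist applied later instead. Only `_format_raw_tag` and `_tag_denylisted` come from the snippet above; the `TagFormatter` class, the denylist contents, the tag dictionary shape, and the `filter_for_refresh` helper are all hypothetical stand-ins, not the actual catalog or data refresh code.

```python
# Hypothetical denylist contents, for illustration only.
TAG_DENYLIST = {"no person", "squareformat"}


class TagFormatter:
    """Hypothetical stand-in for the ingestion class that handles tags."""

    def __init__(self, provider):
        self.provider = provider

    def _format_raw_tag(self, tag):
        # Assumed output shape; the real method may differ.
        return {"name": tag, "provider": self.provider}

    def _tag_denylisted(self, tag):
        # Assumed matching rule: case-insensitive exact match.
        return tag.lower() in TAG_DENYLIST

    def enrich_tags_current(self, raw_tags):
        # Current behavior: denylisted tags never reach the catalog.
        return [
            self._format_raw_tag(tag)
            for tag in raw_tags
            if not self._tag_denylisted(tag)
        ]

    def enrich_tags_proposed(self, raw_tags):
        # Proposed behavior: every tag is stored in the catalog.
        return [self._format_raw_tag(tag) for tag in raw_tags]


def filter_for_refresh(tags, denylist=TAG_DENYLIST):
    # Hypothetical refresh-time filter applied to stored tags.
    return [t for t in tags if t["name"].lower() not in denylist]


formatter = TagFormatter("example_provider")
stored = formatter.enrich_tags_proposed(["cat", "No Person"])
served = filter_for_refresh(stored)
```

Under these assumptions the catalog keeps both tags, while only the non-denylisted one survives the refresh-time filter, so the denylist can change later without re-ingesting provider data.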

Additional context

See the discussion that prompted this in #4464.
