Add Elasticsearch Airflow Provider #2370

Closed
AetherUnbound opened this issue Jun 9, 2023 · 2 comments · Fixed by #3366
Labels
💻 aspect: code Concerns the software code in the repository 🌟 goal: addition Addition of new feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs

Comments

@AetherUnbound
Collaborator

Description

Add the Elasticsearch Airflow provider to the catalog dependencies as described in the Implementation Plan for this work: https://docs.openverse.org/projects/proposals/search_relevancy_sandbox/20230518-implementation_plan_staging_index_rapid_iteration.html#elasticsearch-provider

Additional context

This is part of #392

@AetherUnbound AetherUnbound added 🌟 goal: addition Addition of new feature 💻 aspect: code Concerns the software code in the repository 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Jun 9, 2023
@AetherUnbound AetherUnbound self-assigned this Jun 30, 2023
@AetherUnbound AetherUnbound removed their assignment Oct 4, 2023
@sarayourfriend
Collaborator

Do we still want to take this approach? We've introduced the ability to interact directly with the Elasticsearch REST API. Does the provider have significant benefits over directly interacting with the REST API? Just wondering if we can cut out this and #2371 and reduce the scope of the project.

@AetherUnbound
Collaborator Author

I think that because the operations we're planning to perform in Airflow are more complex than what exists currently, it makes sense to use the provider package to bring in a known-compatible version of the Elasticsearch Python package. That will let us use an API that we're already familiar with and use elsewhere (like the API and ingestion server), rather than maintaining separate definitions for interacting directly with the Elasticsearch REST API.
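
To make the trade-off above concrete, here is a minimal sketch contrasting the two styles. It is an illustration under stated assumptions, not the catalog's actual code: the function names, the `FakeClient` stand-in, the `image` index name, and the localhost URL are all hypothetical. `count_hits` assumes a client whose `search` method accepts `index=` and `query=` keyword arguments, as recent versions of the `elasticsearch` Python package do, and it assumes the response reports totals under `hits.total.value`.

```python
from __future__ import annotations

import json
from typing import Any, Protocol


class SearchClient(Protocol):
    """Anything exposing an Elasticsearch-client-style ``search`` method."""

    def search(self, *, index: str, query: dict[str, Any]) -> dict[str, Any]: ...


def build_rest_search(host: str, index: str, query: dict[str, Any]) -> tuple[str, bytes]:
    """Direct REST style: hand-build the URL and JSON body for POST /<index>/_search.

    Every operation needs its own definition like this one (hypothetical helper).
    """
    url = f"{host.rstrip('/')}/{index}/_search"
    body = json.dumps({"query": query}).encode("utf-8")
    return url, body


def count_hits(client: SearchClient, index: str, query: dict[str, Any]) -> int:
    """Client style: reuse the same call shape other services already use."""
    response = client.search(index=index, query=query)
    return response["hits"]["total"]["value"]


class FakeClient:
    """Stand-in for a real client so the sketch runs without a cluster."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def search(self, *, index: str, query: dict[str, Any]) -> dict[str, Any]:
        # Record the call and return a canned response in the assumed shape.
        self.calls.append((index, query))
        return {"hits": {"total": {"value": 2, "relation": "eq"}, "hits": []}}


if __name__ == "__main__":
    query = {"match": {"title": "cat"}}
    print(build_rest_search("http://localhost:9200", "image", query)[0])
    print(count_hits(FakeClient(), "image", query))
```

In a real DAG, the `FakeClient` would presumably be replaced by a client object supplied through the provider; the point of the sketch is only that the client style avoids maintaining per-operation URL and payload builders.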
