[FEATURE] Support existing index as an ingest data source #513

Open
kamingleung opened this issue Dec 6, 2024 · 2 comments
Labels
enhancement New feature or request v2.20.0

Comments


kamingleung commented Dec 6, 2024

Is your feature request related to a problem?

Currently, when users build an ingest pipeline in the UI, they can only import data in JSON array format (e.g. by copying and pasting, uploading a JSON file, or copying a sample of the data from an index):
image

There is an additional use case where users may want to use data from an existing index and enrich/transform that data back into the same index.
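
Outside the UI, one way to get this round trip is an update-by-query call that sends each existing document through an ingest pipeline and writes it back in place. The following is a minimal sketch, assuming the pipeline already exists. The index name `my-index` and pipeline name `my-enrich-pipeline` are placeholders, and `build_update_by_query` is a hypothetical helper that only builds the request and does not send it:

```python
import json
from urllib.parse import quote, urlencode


def build_update_by_query(index, pipeline):
    """Return (method, path, body) for an in-place enrichment request.

    `index` and `pipeline` are placeholder names supplied by the caller.
    """
    # The `pipeline` query parameter routes each matched document through
    # the named ingest pipeline before it is written back to the same index.
    path = "/{}/_update_by_query?{}".format(
        quote(index, safe=""), urlencode({"pipeline": pipeline})
    )
    # match_all selects every document; narrow the query to enrich a subset.
    body = {"query": {"match_all": {}}}
    return "POST", path, body


method, path, body = build_update_by_query("my-index", "my-enrich-pipeline")
print(method, path)
print(json.dumps(body))
```

The UI flow requested here would hide this step behind a data source choice, so users would not need to assemble the request themselves.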

What solution would you like?

Besides importing data, we may consider offering users more ways to choose the source data.
Note: these mockups are conceptual; they are NOT ready for implementation.

For example:
On the first step of the ingest pipeline, users can select an existing index from a local or remote cluster:
image

On the last step of the ingest pipeline, users can configure whether they want to ingest the enriched data back into the same index:
image
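
For discussion, the source choice in these mockups could be captured in a small model that maps onto a reindex request. This is a sketch under assumptions, not a proposed implementation: `IndexSource` and `build_reindex_body` are hypothetical names, and the host, index, and pipeline values are placeholders. The `source.remote.host` and `dest.pipeline` fields are the ones the reindex API accepts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexSource:
    """Hypothetical model of the 'existing index' choice in the first step."""
    index: str
    # Placeholder such as "https://remote-host:9200"; None means the local cluster.
    remote_host: Optional[str] = None


def build_reindex_body(source, dest_index, pipeline):
    """Build a _reindex request body that runs documents through `pipeline`."""
    if source.remote_host is None and source.index == dest_index:
        # A local reindex needs a destination that differs from its source;
        # writing back in place would need another mechanism.
        raise ValueError("source and destination index must differ")
    src = {"index": source.index}
    if source.remote_host is not None:
        src["remote"] = {"host": source.remote_host}
    return {"source": src, "dest": {"index": dest_index, "pipeline": pipeline}}


local = build_reindex_body(
    IndexSource("products"), "products-enriched", "my-enrich-pipeline"
)
remote = build_reindex_body(
    IndexSource("products", "https://remote-host:9200"),
    "products",
    "my-enrich-pipeline",
)
print(local)
print(remote)
```

The guard in `build_reindex_body` shows why the "same index" option on the last step is a separate design question: a plain local reindex does not cover it.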


More future ideas

We may consider supporting different types of data sources, so users can easily pull external data into their workflow.
For example, I may select DynamoDB or Glue tables as my data source -> enrich the data -> ingest the data into an index:
image

What alternatives have you considered?

These mockups are conceptual; they are NOT ready for implementation.
Open to discussing alternative ideas.

kamingleung added the enhancement (New feature or request) and untriaged labels Dec 6, 2024
@kamingleung
Author

@dylan-tong-aws Can you add in details and use cases?


dylan-tong-aws commented Dec 6, 2024

Here are a couple of relatively common scenarios that we don't currently support:

  1. Building flows for an existing index. In this case, one could build an ingest pipeline with the intent to re-index the existing index. For instance, I may want to add sparse or dense vector fields to an existing index that is only enabled for BM25. I may want to create an ingestion pipeline with ML processors to generate the vectors, map the new vector fields, and re-index (e.g. [FEATURE] Enable easy migration from BM25 to Neural Index with Reindex Step flow-framework#617).

  2. In the future, we may also need to support data connectors to services like Amazon S3. There's an initiative to enable better OOTB semantic search, and bringing in data directly from sources like Amazon S3, or supporting DataPrepper/OSI, will facilitate that initiative.
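
To make the first scenario concrete, here is a rough sketch of two request bodies it implies: an ingest pipeline with a `text_embedding` processor, and a mapping for the new vector field. All values are assumptions for illustration: the model ID is a placeholder, the field names `text` and `text_embedding` are made up, and the dimension of 768 must be replaced with whatever the chosen model outputs. The helpers are hypothetical and only build dictionaries; they do not call a cluster.

```python
import json


def build_embedding_pipeline(model_id, source_field, vector_field):
    """Ingest pipeline body with a single text_embedding processor."""
    return {
        "description": "Generate embeddings for an existing BM25 index",
        "processors": [
            {
                "text_embedding": {
                    "model_id": model_id,
                    # Maps the input text field to the output vector field.
                    "field_map": {source_field: vector_field},
                }
            }
        ],
    }


def build_vector_mapping(vector_field, dimension):
    """Mapping properties for the new dense vector field."""
    return {
        "properties": {
            vector_field: {"type": "knn_vector", "dimension": dimension}
        }
    }


pipeline_body = build_embedding_pipeline(
    "<model-id>", "text", "text_embedding"  # placeholder model ID
)
mapping_body = build_vector_mapping("text_embedding", 768)  # assumed dimension
print(json.dumps(pipeline_body, indent=2))
print(json.dumps(mapping_body, indent=2))
```

The destination index typically also needs `index.knn: true` in its settings, which is one more step a guided flow could handle for the user.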

ohltyler added the v2.20.0 label and removed the untriaged label Dec 6, 2024