Is your feature request related to a problem?
Currently, when users build an ingest pipeline in the UI, they can only import data in JSON array format (e.g., by copying and pasting, uploading a JSON file, or copying a sample of the data from an index):
There is an additional use case where users may want to use data from an existing index and enrich/transform that data back into the same index.
What solution would you like?
Besides importing data, we may consider offering additional ways for users to choose the source data. Note: these mockups are conceptual; they are NOT ready for implementation.
For example:
On the first step of the ingest pipeline, users can select an existing index from a local or remote cluster:
On the last step of the ingest pipeline, users can configure whether they want to ingest the enriched data back into the same index:
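One possible mechanism for the "write back into the same index" step (a sketch only, not part of the mockups; the index and pipeline names are placeholders) is OpenSearch's existing `_update_by_query` API, which can run every document in an index through an ingest pipeline and write the result back in place:

```json
PUT _ingest/pipeline/my-enrichment-pipeline
{
  "description": "Hypothetical enrichment pipeline built in the UI",
  "processors": [
    {
      "set": {
        "field": "enriched",
        "value": true
      }
    }
  ]
}

POST my-index/_update_by_query?pipeline=my-enrichment-pipeline
```

Unlike `_reindex`, this avoids needing a separate destination index, which may be a natural fit for the "same index" option in the last step of the UI.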
More future ideas
We may consider supporting different types of data sources, so users can easily pull external data into their workflow:
For example, I may select DynamoDB or Glue tables as my data source -> enrich the data -> ingest the data into an index:
What alternatives have you considered?
These mockups are conceptual, they are NOT ready for implementation.
Open to discuss alternative ideas.
Here are a couple of scenarios that we don't currently support but that are relatively common:
Building flows for an existing index. In this case, one could build an ingest pipeline with the intent to re-index the existing index. For instance, I may want to add sparse or dense vector fields to an existing index that is only enabled for BM25. I may want to create an ingestion pipeline with ML processors to generate the vectors, map the new vector fields, and re-index (e.g., [FEATURE] Enable easy migration from BM25 to Neural Index with Reindex Step flow-framework#617)
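As a sketch of that migration flow (all index, pipeline, and model identifiers below are placeholders, and the model is assumed to already be deployed), one could define an ingest pipeline using the neural-search plugin's `text_embedding` processor and then reindex the BM25 index into a new vector-enabled index through that pipeline:

```json
PUT _ingest/pipeline/nlp-embedding-pipeline
{
  "processors": [
    {
      "text_embedding": {
        "model_id": "<deployed-model-id>",
        "field_map": {
          "text": "text_embedding"
        }
      }
    }
  ]
}

POST _reindex
{
  "source": { "index": "bm25-index" },
  "dest": {
    "index": "neural-index",
    "pipeline": "nlp-embedding-pipeline"
  }
}
```

The destination index would need a `knn_vector` mapping for the generated `text_embedding` field; the UI flow described in this issue could generate both the pipeline and the reindex step.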
In the future, we may also need to support data connectors to services like Amazon S3. There's an initiative to enable better OOTB semantic search, and being able to bring in data directly from sources like Amazon S3, or support for Data Prepper/OSI, would facilitate this initiative.