Fix bug where ingestion failed for input document containing list of nested objects #1040

Merged 3 commits, Jan 3, 2025

Conversation

yizheliu-amazon
Contributor

Description

Fix bug where ingestion failed for input document containing list of nested objects

Related Issues

Resolves #1024

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@heemin32
Collaborator

Can we have IT test for this?

@heemin32 added the backport 2.x label on Dec 24, 2024
@yizheliu-amazon
Contributor Author

yizheliu-amazon commented Dec 26, 2024

Can we have IT test for this?

Thanks for the review. I tried adding an integration test (IT) for this, but ran into a separate issue, #1042, which occurs when the ingested document contains a list of nested objects with multiple dots (.). The ingest pipeline example in issue #1042 actually comes from the config file of an existing IT, so with the existing IT pipeline config in the code, a new IT for this change would fail. That failure is not related to this bug fix PR; it comes from the case described in #1042. The existing ITs pass only because they do not cover that case.

To work around it, we can either:

  1. Fix the current bug in this PR, then fix issue #1042 ([BUG] Fail to generate embedding for ingest document with nested field defined in field map). In the PR for #1042, I can add an IT for ingesting a doc with a list of nested objects.
  2. Create a new pipeline configuration for the IT, like the one below, which works with this PR. This may be unnecessary, though, because the new pipeline is very similar to the existing one: if the IT for this PR can pass with the existing pipeline, it can also pass with the pipeline below.
{
  "description": "text embedding pipeline for hybrid",
  "processors": [
    {
      "text_embedding": {
        "model_id": "%s",
        "field_map": {
          "title": "title_knn",
          "favor_list": "favor_list_knn",
          "favorites": {
            "game": "game_knn",
            "movie": "movie_knn"
          },
          "nested_passages": "level_1_embedding"
        }
      }
    }
  ]
}

I prefer option 1, since option 2 seems unnecessary to me.
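
For readers following along, here is a minimal Python sketch of what a document with a list of nested objects could look like against the option 2 pipeline above. It is only an illustration under stated assumptions: the sample document, the model ID passed in, and the helpers `build_pipeline`, `sample_document`, and `collect_texts` are hypothetical and are not taken from the plugin or its tests. In particular, treating `favorites` as the list of objects is a guess, and `collect_texts` is not the processor's actual logic; it just shows which source values the field map would cover.

```python
import json

# Pipeline body copied from the option 2 configuration; "%s" is the model ID placeholder.
PIPELINE_TEMPLATE = """
{
  "description": "text embedding pipeline for hybrid",
  "processors": [
    {
      "text_embedding": {
        "model_id": "%s",
        "field_map": {
          "title": "title_knn",
          "favor_list": "favor_list_knn",
          "favorites": {"game": "game_knn", "movie": "movie_knn"},
          "nested_passages": "level_1_embedding"
        }
      }
    }
  ]
}
"""


def build_pipeline(model_id):
    """Fill in the model ID placeholder and parse the pipeline body."""
    return json.loads(PIPELINE_TEMPLATE.replace("%s", model_id))


def sample_document():
    """Hypothetical input document; `favorites` is a list of nested objects."""
    return {
        "title": "sample title",
        "favor_list": ["first favorite", "second favorite"],
        "favorites": [
            {"game": "chess", "movie": "a movie"},
            {"game": "go", "movie": "another movie"},
        ],
        "nested_passages": "hello world",
    }


def collect_texts(field_map, source, prefix=""):
    """Return (path, text) pairs for every source value the field map covers."""
    found = []
    for key, target in field_map.items():
        if key not in source:
            continue
        value = source[key]
        path = prefix + key
        if isinstance(target, dict):
            # Nested mapping: the value may be a single object or a list of objects.
            is_list = isinstance(value, list)
            items = value if is_list else [value]
            for index, item in enumerate(items):
                if not isinstance(item, dict):
                    continue
                item_prefix = f"{path}[{index}]." if is_list else f"{path}."
                found.extend(collect_texts(target, item, item_prefix))
        elif isinstance(value, list):
            # Flat mapping over a list of strings.
            found.extend((f"{path}[{i}]", text) for i, text in enumerate(value))
        else:
            found.append((path, value))
    return found


if __name__ == "__main__":
    pipeline = build_pipeline("demo-model-id")  # hypothetical model ID
    field_map = pipeline["processors"][0]["text_embedding"]["field_map"]
    for path, text in collect_texts(field_map, sample_document()):
        print(path, "->", text)
```

Running the script prints one line per source value that would need an embedding, including one entry per object in the `favorites` list.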

@heemin32
Collaborator

@yizheliu-amazon Thanks for the detailed explanation. I will leave it to you to decide the next step between the two options. Thanks!

@heemin32 heemin32 merged commit 90df6c9 into opensearch-project:main Jan 3, 2025
41 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 3, 2025
…nested objects (#1040)

* Fix bug where ingestion failed for input document containing list of nested objects

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments to use better method name/implementation

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments: modify the test case to have doc with various fields

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
(cherry picked from commit 90df6c9)
martin-gaievski pushed a commit that referenced this pull request Jan 3, 2025
…nested objects (#1040) (#1053)

* Fix bug where ingestion failed for input document containing list of nested objects

Signed-off-by: Yizhe Liu <[email protected]>
(cherry picked from commit 90df6c9)

Co-authored-by: Yizhe Liu <[email protected]>
heemin32 pushed a commit to heemin32/neural-search that referenced this pull request Jan 9, 2025
…nested objects (opensearch-project#1040) (opensearch-project#1053)

* Fix bug where ingestion failed for input document containing list of nested objects

Signed-off-by: Yizhe Liu <[email protected]>
(cherry picked from commit 90df6c9)

Co-authored-by: Yizhe Liu <[email protected]>
martin-gaievski pushed a commit that referenced this pull request Jan 10, 2025
…nested objects (#1040)

* Fix bug where ingestion failed for input document containing list of nested objects

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments to use better method name/implementation

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments: modify the test case to have doc with various fields

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
martin-gaievski pushed a commit that referenced this pull request Jan 13, 2025
…nested objects (#1040)

* Fix bug where ingestion failed for input document containing list of nested objects

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments to use better method name/implementation

Signed-off-by: Yizhe Liu <[email protected]>

* Address comments: modify the test case to have doc with various fields

Signed-off-by: Yizhe Liu <[email protected]>

---------

Signed-off-by: Yizhe Liu <[email protected]>
Labels
backport 2.x (Label will add auto workflow to backport PR to 2.x branch), bug (Something isn't working), v2.19.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Fail to ingest document with nested list into text_embedding processor
3 participants