fix(langchain): use correct class names for pinecone vectorstore check #9759

Merged: Yun-Kim merged 22 commits into main from yunkim/langchain-pinecone-vectorstore-fix on Jul 19, 2024

@Yun-Kim Yun-Kim commented Jul 8, 2024

This PR fixes the class names used to check for Pinecone vectorstore instances in the langchain integration. Previously, the check used the base class shared by all vectorstores, so every vectorstore instance passed it, not only Pinecone ones.
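
The failure mode described above can be illustrated with a minimal sketch. The classes and function names here (`VectorStore`, `Pinecone`, `FAISS`, `is_pinecone_buggy`, `is_pinecone_fixed`) are hypothetical stand-ins for illustration only; they are not the langchain classes or the code in `ddtrace/contrib/langchain/patch.py`.

```python
class VectorStore:
    """Stand-in for the base class shared by all vectorstores."""


class Pinecone(VectorStore):
    """Stand-in for a Pinecone vectorstore."""


class FAISS(VectorStore):
    """Stand-in for some other, non-Pinecone vectorstore."""


def is_pinecone_buggy(instance):
    # Checking against the shared base class matches every subclass,
    # so any vectorstore is reported as a Pinecone vectorstore.
    return isinstance(instance, VectorStore)


def is_pinecone_fixed(instance):
    # Checking against the specific class only matches Pinecone instances.
    return isinstance(instance, Pinecone)
```

With these stand-ins, `is_pinecone_buggy(FAISS())` is true even though the instance is not a Pinecone vectorstore, while `is_pinecone_fixed(FAISS())` is false.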

This PR also marks the langchain community llmobs tests as flaky because of the vcrpy issue described in #9768.

Notes

Langchain's versioning is tricky, with many deprecations and removals (even between minor versions):

  • langchain<0.1: Uses the base langchain module to access langchain.vectorstores.Pinecone (note that this indirectly imports from langchain_community).
  • langchain>=0.1: Uses the langchain_community module to access langchain_community.vectorstores.Pinecone.
  • langchain>=0.1 with langchain-pinecone installed: Uses the langchain_pinecone module to access langchain_pinecone.vectorstores.Pinecone (deprecated, subclass of PineconeVectorStore) or langchain_pinecone.vectorstores.PineconeVectorStore.

We use the module and class mapping above as the logic for checking whether an instance is a Pinecone vectorstore.
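
One way to turn the mapping above into a check is sketched below. This is an assumption-laden illustration, not the code merged in this PR: the helper names `resolve_pinecone_classes` and `is_pinecone_vectorstore` and the `PINECONE_CANDIDATES` tuple are hypothetical, and the sketch simply tries each module path listed in the notes and skips any that cannot be imported in the installed langchain version.

```python
import importlib

# (module path, class name) pairs taken from the notes above.
PINECONE_CANDIDATES = (
    ("langchain.vectorstores", "Pinecone"),
    ("langchain_community.vectorstores", "Pinecone"),
    ("langchain_pinecone.vectorstores", "Pinecone"),
    ("langchain_pinecone.vectorstores", "PineconeVectorStore"),
)


def resolve_pinecone_classes(candidates=PINECONE_CANDIDATES):
    """Return the candidate classes that exist in the current environment."""
    classes = []
    for module_path, class_name in candidates:
        try:
            module = importlib.import_module(module_path)
            cls = getattr(module, class_name)
        except Exception:
            # Module or attribute is missing (or failed to import) for this
            # langchain version, so this candidate does not apply.
            continue
        if isinstance(cls, type) and cls not in classes:
            classes.append(cls)
    return tuple(classes)


def is_pinecone_vectorstore(instance, candidates=PINECONE_CANDIDATES):
    """True only if instance is an instance of one of the resolved classes."""
    # isinstance() with an empty tuple is always False, so an environment
    # with no Pinecone classes installed never reports a match.
    return isinstance(instance, resolve_pinecone_classes(candidates))
```

Resolving the classes by import, rather than comparing against the shared vectorstore base class, keeps the check specific to Pinecone across the three install layouts. A real integration would likely resolve the classes once instead of on every call.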

Checklist

  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@Yun-Kim Yun-Kim requested review from a team as code owners July 8, 2024 21:38

datadog-dd-trace-py-rkomorn bot commented Jul 8, 2024

Datadog Report

Branch report: yunkim/langchain-pinecone-vectorstore-fix
Commit report: 20653e4
Test service: dd-trace-py

✅ 0 Failed, 117508 Passed, 59424 Skipped, 3h 46m 11.81s Total duration (5h 43m 26.16s time saved)

codecov-commenter commented Jul 8, 2024

Codecov Report

Attention: Patch coverage is 0% with 19 lines in your changes missing coverage. Please review.

Project coverage is 10.53%. Comparing base (c728c68) to head (20653e4).
Report is 34 commits behind head on main.

Files                                                    Patch %   Lines
...ests/contrib/langchain/test_langchain_community.py    0.00%     13 Missing ⚠️
ddtrace/contrib/langchain/patch.py                       0.00%     3 Missing ⚠️
tests/contrib/langchain/test_langchain_llmobs.py         0.00%     3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #9759       +/-   ##
===========================================
- Coverage   74.30%   10.53%   -63.77%     
===========================================
  Files        1398     1367       -31     
  Lines      129930   127822     -2108     
===========================================
- Hits        96541    13466    -83075     
- Misses      33389   114356    +80967     


pr-commenter bot commented Jul 8, 2024

Benchmarks

Benchmark execution time: 2024-07-17 21:07:08

Comparing candidate commit 02c34b2 in PR branch yunkim/langchain-pinecone-vectorstore-fix with baseline commit 677fef9 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 214 metrics, 2 unstable metrics.

@Yun-Kim Yun-Kim enabled auto-merge (squash) July 9, 2024 16:57
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-pinecone-vectorstore-fix branch 2 times, most recently from 808f930 to 6b42456 Compare July 11, 2024 19:32
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-pinecone-vectorstore-fix branch from 6b42456 to f7527b6 Compare July 11, 2024 19:32
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-pinecone-vectorstore-fix branch from c9caf8f to ace857f Compare July 11, 2024 21:26
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-pinecone-vectorstore-fix branch from 756838d to ff73f6f Compare July 13, 2024 17:08
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-pinecone-vectorstore-fix branch from ff73f6f to 0c4acea Compare July 13, 2024 17:20

@emmettbutler emmettbutler left a comment

Deferring to other reviewers

Yun-Kim commented Jul 15, 2024

Note: there is a cassette file issue with the dependency updates; I will look into this.

github-actions bot commented Jul 16, 2024

CODEOWNERS have been resolved as:

releasenotes/notes/fix-langchain-pinecone-vectorstore-082c023ccae268e1.yaml  @DataDog/apm-python
tests/contrib/langchain/cassettes/langchain_community/openai_retrieval_embedding.yaml  @DataDog/ml-observability
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_faiss_vectorstore_retrieval.json  @DataDog/apm-python
.riot/requirements/1598e9b.txt                                          @DataDog/apm-python
.riot/requirements/1810353.txt                                          @DataDog/apm-python
.riot/requirements/1dca1e6.txt                                          @DataDog/apm-python
.riot/requirements/7aeeb05.txt                                          @DataDog/apm-python
.riot/requirements/8fdfb07.txt                                          @DataDog/apm-python
.riot/requirements/b26ea62.txt                                          @DataDog/apm-python
.riot/requirements/b5852df.txt                                          @DataDog/apm-python
.riot/requirements/ccc7691.txt                                          @DataDog/apm-python
.riot/requirements/fd7ae89.txt                                          @DataDog/apm-python
ddtrace/contrib/langchain/patch.py                                      @DataDog/ml-observability
riotfile.py                                                             @DataDog/apm-python
tests/contrib/langchain/test_langchain_community.py                     @DataDog/ml-observability
tests/contrib/langchain/test_langchain_llmobs.py                        @DataDog/ml-observability
.riot/requirements/150eea5.txt                                          @DataDog/apm-python
.riot/requirements/9946322.txt                                          @DataDog/apm-python
.riot/requirements/9b67887.txt                                          @DataDog/apm-python

@Yun-Kim Yun-Kim merged commit ab3d2ce into main Jul 19, 2024
173 checks passed
@Yun-Kim Yun-Kim deleted the yunkim/langchain-pinecone-vectorstore-fix branch July 19, 2024 10:43

The backport to 2.9 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.9 2.9
# Navigate to the new working tree
cd .worktrees/backport-2.9
# Create a new branch
git switch --create backport-9759-to-2.9
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ab3d2ce4fcb10e6e18646656c046ad37f67f85b4
# Push it to GitHub
git push --set-upstream origin backport-9759-to-2.9
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.9

Then, create a pull request where the base branch is 2.9 and the compare/head branch is backport-9759-to-2.9.

The backport to 2.10 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.10 2.10
# Navigate to the new working tree
cd .worktrees/backport-2.10
# Create a new branch
git switch --create backport-9759-to-2.10
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ab3d2ce4fcb10e6e18646656c046ad37f67f85b4
# Push it to GitHub
git push --set-upstream origin backport-9759-to-2.10
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.10

Then, create a pull request where the base branch is 2.10 and the compare/head branch is backport-9759-to-2.10.

Yun-Kim added a commit that referenced this pull request Jul 19, 2024
#9759)

Yun-Kim added a commit that referenced this pull request Jul 25, 2024
#9759)

Yun-Kim added a commit that referenced this pull request Jul 25, 2024
#9759)

Yun-Kim added a commit that referenced this pull request Jul 25, 2024
#9875)

Backports #9759 to 2.10.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
Yun-Kim added a commit that referenced this pull request Jul 25, 2024
…k [backport 2.9] (#9934)

Backport #9759 to 2.9.
