chore(langchain): fix flaky cassette tests, skipping logic #9768

Yun-Kim · 2024-07-09T17:47:01Z

This PR attempts to address the flakiness and general unreliability of the LangChain test suite, and removes the flaky test markers from most langchain tests. No functionality has been changed; this PR focuses only on making our test suite less flaky and a better signal of our langchain library coverage.

Problems with LangChain test suite:

Tests flake at random because cassettes are read incorrectly (see related issue).
All tests except the patch tests are skipped entirely. The cause appears to be our skipping logic, which checks langchain versions and Python versions (we skip a few test cases on Python 3.9 to avoid needing cassette files that would be required only for 3.9).

This PR solves the above problems by:

Pinning vcrpy to version 5.1.0, which is the version prior to the one that introduced the linked issue.
Rewriting the langchain version skipping logic to rely specifically on the LangChain module version, instead of reusing the PATCH_LANGCHAIN_V0 constant from the patch file, which did not appear to be reliable.
Changing the Python version skip checks to use only major/minor versions rather than checking against (3, 10, 0), as the latter may result in some edge cases. Python 3.9 testing is no longer skipped for test_langchain_community.py or for the community tests in test_langchain_llmobs.py.
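The major/minor gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `major_minor` and `is_langchain_v0` and the flag `PY_LT_310` are hypothetical, and the `< (0, 1)` boundary for "langchain v0" is an assumption. The two assertions at the end show one kind of edge case that mixed-length tuple comparisons can produce; whether that is what affected CI is not established here.

```python
import re
import sys
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '0.1.20'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_langchain_v0(langchain_version: str) -> bool:
    """True when the installed langchain is older than 0.1 (assumed boundary)."""
    return major_minor(langchain_version) < (0, 1)


# Python gate that looks only at major/minor instead of comparing
# the full version_info against (3, 10, 0).
PY_LT_310 = sys.version_info[:2] < (3, 10)

# Tuple comparison is lexicographic, and a tuple that is a strict prefix of a
# longer one sorts first, so mixing lengths can flip a gate unexpectedly.
assert (3, 10) < (3, 10, 0)
assert not ((3, 10) < (3, 10))
```

In a test module, flags like these would typically be passed to `pytest.mark.skipif(condition, reason="...")`, with the langchain version string read from the installed module.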

We take an approximate approach to handling Python 3.9 and langchain v0:

Python 3.9: Runs nearly all tests, except for a handful of langchain v0 tests and one langchain v1 test whose cassette files differ between Python 3.9 and Python 3.10+. To avoid generating many additional cassette files, we skip only these few tests on Python 3.9.
LangChain v0 tests: We have two sets of tests, one for langchain (v0) and one for langchain_community (v1+), and we run each set only in environments with the corresponding langchain version.
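The resulting skip policy can be summarized as a small decision function. This is only a sketch under stated assumptions: the function `skip_reason`, the suite labels "v0" and "community", and the `py39_cassette_differs` flag are invented for illustration, and the real tests express these conditions as pytest skip markers rather than a function like this.

```python
from typing import Optional, Tuple


def skip_reason(
    suite: str,
    langchain_version: Tuple[int, int],
    python_version: Tuple[int, int],
    py39_cassette_differs: bool = False,
) -> Optional[str]:
    """Return a skip reason, or None if the test should run.

    suite is "v0" for the langchain v0 tests and "community" for the
    langchain_community tests (labels chosen for this sketch).
    """
    is_v0_env = langchain_version < (0, 1)  # assumed v0 boundary
    if suite == "v0" and not is_v0_env:
        return "langchain v0 tests only run with langchain<0.1"
    if suite == "community" and is_v0_env:
        return "langchain_community tests only run with langchain>=0.1"
    # Only the handful of tests whose cassettes differ are skipped on 3.9.
    if py39_cassette_differs and python_version == (3, 9):
        return "cassette differs between Python 3.9 and 3.10+"
    return None
```

For example, `skip_reason("community", (0, 2), (3, 9))` returns `None`, reflecting that community tests are no longer skipped wholesale on Python 3.9.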

Checklist

The PR description includes an overview of the change
The PR description articulates the motivation for the change
The change includes tests OR the PR description describes a testing strategy
The PR description notes risks associated with the change, if any
Newly-added code is easy to change
The change follows the library release note guidelines
The change includes or references documentation updates if necessary
Backport labels are set (if applicable)

Reviewer Checklist

Title is accurate
All changes are related to the pull request's stated goal
Avoids breaking API changes
Testing strategy adequately addresses listed risks
Newly-added code is easy to change
Release note makes sense to a user of the library
If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
Backport labels are set in a manner that is consistent with the release branch maintenance policy

datadog-dd-trace-py-rkomorn · 2024-07-09T17:53:50Z

Datadog Report

Branch report: yunkim/langchain-fix-flaky
Commit report: a1503ab
Test service: dd-trace-py

✅ 0 Failed, 137477 Passed, 40702 Skipped, 7h 46m 59.24s Total duration (3m 2.1s time saved)

pr-commenter · 2024-07-09T18:52:40Z

Benchmarks

Benchmark execution time: 2024-07-29 21:54:17

Comparing candidate commit 3b416e1 in PR branch yunkim/langchain-fix-flaky with baseline commit d1db200 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 214 metrics, 2 unstable metrics.

codecov-commenter · 2024-07-11T21:22:04Z

Codecov Report

Attention: Patch coverage is 0% with 45 lines in your changes missing coverage. Please review.

Project coverage is 10.42%. Comparing base (2022a3b) to head (d16172a).
Report is 17 commits behind head on main.

Files	Patch %	Lines
...ests/contrib/langchain/test_langchain_community.py	0.00%	29 Missing ⚠️
tests/contrib/langchain/test_langchain.py	0.00%	15 Missing ⚠️
ddtrace/contrib/langchain/patch.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #9768       +/-   ##
===========================================
- Coverage   74.48%   10.42%   -64.07%     
===========================================
  Files        1391     1358       -33     
  Lines      128824   126595     -2229     
===========================================
- Hits        95959    13193    -82766     
- Misses      32865   113402    +80537


github-actions · 2024-07-17T14:35:59Z

CODEOWNERS have been resolved as:

riotfile.py                                                             @DataDog/apm-python
tests/contrib/langchain/cassettes/langchain_community/ai21_completion_sync.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/anthropic_chat_completion_sync.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/cohere_completion_sync.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_acall.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_batch.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_call.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_call_complicated.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_nested.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_schema_io.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_with_tools_anthropic.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_with_tools_openai.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_chat_completion_async_call.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_chat_completion_async_generate.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_chat_completion_image_input_sync_generate.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_chat_completion_sync_call.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_chat_completion_sync_generate.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_completion_async.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_completion_error.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_completion_sync.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_completion_sync_multi_prompt.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_paraphrase.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_pinecone_vectorstore_retrieval_chain.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_sequential_paraphrase_and_rhyme_async.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_sequential_paraphrase_and_rhyme_sync.yaml  @DataDog/ml-observability
tests/contrib/langchain/test_langchain.py                               @DataDog/ml-observability
tests/contrib/langchain/test_langchain_community.py                     @DataDog/ml-observability
tests/contrib/langchain/test_langchain_llmobs.py                        @DataDog/ml-observability
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_ai21_llm_sync.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_chain_invoke.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_cohere_llm_sync.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_chain_complicated.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_chain_nested.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_chain_simple_async.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_with_tools_anthropic.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_chat_model_async_call.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_chat_model_async_generate.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_chat_model_sync_call_langchain_openai.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_chat_model_sync_generate.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_chat_model_vision_generate.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_integration.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_llm_sync_multiple_prompts.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_sequential_chain.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_sequential_chain_with_multiple_llm_sync.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[None-None].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[None-v0].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[None-v1].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[mysvc-None].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[mysvc-v0].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[mysvc-v1].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_pinecone_vectorstore_retrieval_chain.json  @DataDog/apm-python
.riot/requirements/11063bf.txt                                          @DataDog/apm-python
.riot/requirements/16c3b9f.txt                                          @DataDog/apm-python
.riot/requirements/1761cfc.txt                                          @DataDog/apm-python
.riot/requirements/18bc2ac.txt                                          @DataDog/apm-python
.riot/requirements/19f2225.txt                                          @DataDog/apm-python
.riot/requirements/1ec1dbf.txt                                          @DataDog/apm-python
.riot/requirements/457db9b.txt                                          @DataDog/apm-python
.riot/requirements/55a4977.txt                                          @DataDog/apm-python
.riot/requirements/585e779.txt                                          @DataDog/apm-python
.riot/requirements/a311bc2.txt                                          @DataDog/apm-python
.riot/requirements/aa1fe5c.txt                                          @DataDog/apm-python
.riot/requirements/cbbb0eb.txt                                          @DataDog/apm-python
.riot/requirements/cf9bdda.txt                                          @DataDog/apm-python
.riot/requirements/d39d3de.txt                                          @DataDog/apm-python
tests/contrib/langchain/cassettes/langchain_community/openai_math_chain.yaml  @DataDog/ml-observability
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_math_chain.json  @DataDog/apm-python

…e for CI (#9997)

Even after #9768 we're still seeing some tests be skipped on CI, specifically `langchain-community` tests even when `langchain>=0.1`. My suspicion is that the `sys.version_info < (0, 1, 0)` conditional that is used for the test skipping is too precise for the versions of Langchain being run on CI (not exactly sure what the issue is, but I've narrowed it down to that area). This PR only checks the major and minor version of Langchain being run.

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

Remove flaky markers

c0d998a

Yun-Kim added the changelog/no-changelog label (A changelog entry is not required for this PR.) Jul 9, 2024

tmp change to langchain integration to only trigger langchain tests

c139f43

Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from ddcf4a1 to c139f43 Compare July 9, 2024 17:53

Yun-Kim added 2 commits July 9, 2024 16:26

Change pytest skip markers?

5c4fdb6

small cleanup of test file

f6d7302

Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from 33f1650 to f6d7302 Compare July 10, 2024 21:57

Add verbose option to pytest run to see which tests are being skipped

d16172a

Pin vcrpy, regenerate some cassettes

812dc5d

Yun-Kim mentioned this pull request Jul 18, 2024

fix(langchain): use correct class names for pinecone vectorstore check #9759

Merged

16 tasks

Merge branch 'main' into yunkim/langchain-fix-flaky

785e252

Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from cc78aad to 489597a Compare July 25, 2024 20:24

Merge branch 'main' into yunkim/langchain-fix-flaky

1b35dc4

Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from 489597a to 1b35dc4 Compare July 25, 2024 20:33

Yun-Kim added 5 commits July 25, 2024 17:12

Remove flaky

232b8f7

Fix langchain version check in test_langchain_llmobs.py

1bde40d

Investigate sys.version_info issue on 3.9

bd81aff

Fix bedrock trace struct tests

e74c9f8

Use langchain and py39 version gating correctly

d35e50e

Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from 5e0023c to d35e50e Compare July 27, 2024 02:19

Revert commenting openai test api key

6ea5288

Yun-Kim mentioned this pull request Jul 27, 2024

feat(llmobs): submit langchain embedding spans #9850

Merged

2 tasks

Yun-Kim changed the title from "wip(langchain): fix flaky cassette tests" to "chore(langchain): fix flaky cassette tests, skipping logic" Jul 27, 2024

Yun-Kim added 3 commits July 26, 2024 22:44

Remove streaming tests, fix skip check

90fa6d0

Regenerate remaining cassettes (minus pinecone) with pinned vcrpy

13dfb24

Fix fmt, test anthropic api key

2d47347

Yun-Kim added 7 commits July 27, 2024 12:04

Merge branch 'main' into yunkim/langchain-fix-flaky

77f6b4c

Regenerate lcel tool cassettes with pinned vcrpy

da22b73

Regenerate pinecone test cassette

d4a3e7b

Skip langchain 0.1.20 tests for ai21 and cohere, pin vcrpy to 6.0.1 for langchain v0 tests

da574bc

Update riot lockfiles

fc9277a

Revert pytest-asyncio dep to main branch

43af5a1

Change py39 skip marker

c061dce

Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from 1ddb905 to 128dd49 Compare July 29, 2024 17:13

Remove skip marker for langchain community tmp

2c5b38c

Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from 128dd49 to 2c5b38c Compare July 29, 2024 17:31

Skip ai21 tests for python3.9

6ef4e9a

Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from e7c7e0a to 6ef4e9a Compare July 29, 2024 18:55

Fix snapshot test

69cf692

Yun-Kim marked this pull request as ready for review July 29, 2024 20:39

Yun-Kim requested review from a team as code owners July 29, 2024 20:39

Yun-Kim requested review from ZStriker19 and lievan July 29, 2024 20:39

Yun-Kim added 2 commits July 29, 2024 16:41

Merge branch 'main' into yunkim/langchain-fix-flaky

3b416e1

Merge branch 'main' into yunkim/langchain-fix-flaky

a1503ab

sabrenner approved these changes Jul 30, 2024

lievan approved these changes Jul 30, 2024

Yun-Kim enabled auto-merge (squash) July 30, 2024 17:51

Yun-Kim merged commit 7934297 into main Jul 30, 2024
153 checks passed

Yun-Kim deleted the yunkim/langchain-fix-flaky branch July 30, 2024 17:58

Yun-Kim mentioned this pull request Jul 30, 2024

chore(langchain): make langchain version skip conditional less precise for CI #9997

Merged

2 tasks
