chore(langchain): fix flaky cassette tests, skipping logic #9768

Merged
Yun-Kim merged 29 commits into main from yunkim/langchain-fix-flaky on Jul 30, 2024

Conversation

Yun-Kim
Contributor

@Yun-Kim Yun-Kim commented Jul 9, 2024

This PR attempts to address the flakiness and general unreliability of the LangChain test suite, and removes the flaky test markers from most langchain tests. No functionality has changed; the PR focuses only on making the test suite less flaky and a better signal of our langchain library coverage.

Problems with LangChain test suite:

  1. Tests flake randomly because cassettes are read incorrectly (see related issue).
  2. All tests except the patch tests are skipped entirely. This appears to be caused by our skipping logic, which checks langchain versions and Python versions (we skip a few test cases on Python 3.9 to avoid adding cassette files that would be needed only for 3.9).

This PR solves the above problems by:

  1. Pinning vcrpy to version 5.1.0, the version released before the linked issue was introduced.
  2. Rewriting the langchain version skipping logic to rely specifically on the LangChain module version (instead of reusing the PATCH_LANGCHAIN_V0 constant from the patch file, which appeared to be unreliable).
  3. Changing the Python version skip checks to use only major/minor versions rather than comparing against (3, 10, 0), which may hit edge cases. Python 3.9 testing is no longer skipped for test_langchain_community.py or for the community tests in test_langchain_llmobs.py.
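
The version checks described in items 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the helper names `major_minor`, `is_langchain_v0`, and `is_python_39` are hypothetical, and the version string is assumed to come from the installed langchain module (for example its `__version__` attribute).

```python
import re
import sys


def major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as "0.1.20".

    Only the leading "X.Y" is read, so patch numbers and suffixes
    such as "rc1" do not influence the comparison.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_langchain_v0(version: str) -> bool:
    """True for langchain releases older than 0.1 (hypothetical helper)."""
    return major_minor(version) < (0, 1)


def is_python_39(version_info=sys.version_info) -> bool:
    """Compare only major/minor instead of a full (3, 10, 0) tuple."""
    return tuple(version_info[:2]) == (3, 9)
```

In a test module, results like these could feed the condition of a `pytest.mark.skipif(...)` marker, so each skip decision depends only on the major and minor components.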

We have taken a somewhat approximate approach to handling Python 3.9 and langchain v0:

  • Python 3.9: Runs nearly all tests, except for a handful of langchain v0 tests and one langchain v1 test whose cassette files differ between Python 3.9 and Python 3.10+. To avoid generating additional cassette files, these tests are skipped only on Python 3.9.
  • LangChain v0 tests: We have two sets of tests, one for langchain (v0) and one for langchain_community (v1+), and run each set only in environments with the corresponding langchain version.

Checklist

  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@Yun-Kim Yun-Kim added the changelog/no-changelog A changelog entry is not required for this PR. label Jul 9, 2024
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from ddcf4a1 to c139f43 Compare July 9, 2024 17:53
@datadog-dd-trace-py-rkomorn

datadog-dd-trace-py-rkomorn bot commented Jul 9, 2024

Datadog Report

Branch report: yunkim/langchain-fix-flaky
Commit report: a1503ab
Test service: dd-trace-py

✅ 0 Failed, 137477 Passed, 40702 Skipped, 7h 46m 59.24s Total duration (3m 2.1s time saved)

@pr-commenter

pr-commenter bot commented Jul 9, 2024

Benchmarks

Benchmark execution time: 2024-07-29 21:54:17

Comparing candidate commit 3b416e1 in PR branch yunkim/langchain-fix-flaky with baseline commit d1db200 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 214 metrics, 2 unstable metrics.

@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from 33f1650 to f6d7302 Compare July 10, 2024 21:57
@codecov-commenter

Codecov Report

Attention: Patch coverage is 0% with 45 lines in your changes missing coverage. Please review.

Project coverage is 10.42%. Comparing base (2022a3b) to head (d16172a).
Report is 17 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| ...ests/contrib/langchain/test_langchain_community.py | 0.00% | 29 Missing ⚠️ |
| tests/contrib/langchain/test_langchain.py | 0.00% | 15 Missing ⚠️ |
| ddtrace/contrib/langchain/patch.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #9768       +/-   ##
===========================================
- Coverage   74.48%   10.42%   -64.07%     
===========================================
  Files        1391     1358       -33     
  Lines      128824   126595     -2229     
===========================================
- Hits        95959    13193    -82766     
- Misses      32865   113402    +80537     


Contributor

github-actions bot commented Jul 17, 2024

CODEOWNERS have been resolved as:

riotfile.py                                                             @DataDog/apm-python
tests/contrib/langchain/cassettes/langchain_community/ai21_completion_sync.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/anthropic_chat_completion_sync.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/cohere_completion_sync.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_acall.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_batch.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_call.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_call_complicated.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_nested.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_openai_chain_schema_io.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_with_tools_anthropic.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/lcel_with_tools_openai.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_chat_completion_async_call.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_chat_completion_async_generate.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_chat_completion_image_input_sync_generate.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_chat_completion_sync_call.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_chat_completion_sync_generate.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_completion_async.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_completion_error.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_completion_sync.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_completion_sync_multi_prompt.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_paraphrase.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_pinecone_vectorstore_retrieval_chain.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_sequential_paraphrase_and_rhyme_async.yaml  @DataDog/ml-observability
tests/contrib/langchain/cassettes/langchain_community/openai_sequential_paraphrase_and_rhyme_sync.yaml  @DataDog/ml-observability
tests/contrib/langchain/test_langchain.py                               @DataDog/ml-observability
tests/contrib/langchain/test_langchain_community.py                     @DataDog/ml-observability
tests/contrib/langchain/test_langchain_llmobs.py                        @DataDog/ml-observability
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_ai21_llm_sync.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_chain_invoke.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_cohere_llm_sync.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_chain_complicated.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_chain_nested.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_chain_simple_async.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_lcel_with_tools_anthropic.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_chat_model_async_call.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_chat_model_async_generate.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_chat_model_sync_call_langchain_openai.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_chat_model_sync_generate.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_chat_model_vision_generate.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_integration.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_llm_sync_multiple_prompts.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_sequential_chain.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_sequential_chain_with_multiple_llm_sync.json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[None-None].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[None-v0].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[None-v1].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[mysvc-None].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[mysvc-v0].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_service_name[mysvc-v1].json  @DataDog/apm-python
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_pinecone_vectorstore_retrieval_chain.json  @DataDog/apm-python
.riot/requirements/11063bf.txt                                          @DataDog/apm-python
.riot/requirements/16c3b9f.txt                                          @DataDog/apm-python
.riot/requirements/1761cfc.txt                                          @DataDog/apm-python
.riot/requirements/18bc2ac.txt                                          @DataDog/apm-python
.riot/requirements/19f2225.txt                                          @DataDog/apm-python
.riot/requirements/1ec1dbf.txt                                          @DataDog/apm-python
.riot/requirements/457db9b.txt                                          @DataDog/apm-python
.riot/requirements/55a4977.txt                                          @DataDog/apm-python
.riot/requirements/585e779.txt                                          @DataDog/apm-python
.riot/requirements/a311bc2.txt                                          @DataDog/apm-python
.riot/requirements/aa1fe5c.txt                                          @DataDog/apm-python
.riot/requirements/cbbb0eb.txt                                          @DataDog/apm-python
.riot/requirements/cf9bdda.txt                                          @DataDog/apm-python
.riot/requirements/d39d3de.txt                                          @DataDog/apm-python
tests/contrib/langchain/cassettes/langchain_community/openai_math_chain.yaml  @DataDog/ml-observability
tests/snapshots/tests.contrib.langchain.test_langchain_community.test_openai_math_chain.json  @DataDog/apm-python

@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from cc78aad to 489597a Compare July 25, 2024 20:24
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from 489597a to 1b35dc4 Compare July 25, 2024 20:33
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from 5e0023c to d35e50e Compare July 27, 2024 02:19
@Yun-Kim Yun-Kim changed the title wip(langchain): fix flaky cassette tests chore(langchain): fix flaky cassette tests, skipping logic Jul 27, 2024
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from 1ddb905 to 128dd49 Compare July 29, 2024 17:13
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from 128dd49 to 2c5b38c Compare July 29, 2024 17:31
@Yun-Kim Yun-Kim force-pushed the yunkim/langchain-fix-flaky branch from e7c7e0a to 6ef4e9a Compare July 29, 2024 18:55
@Yun-Kim Yun-Kim marked this pull request as ready for review July 29, 2024 20:39
@Yun-Kim Yun-Kim requested review from a team as code owners July 29, 2024 20:39
@Yun-Kim Yun-Kim requested review from ZStriker19 and lievan July 29, 2024 20:39
@Yun-Kim Yun-Kim enabled auto-merge (squash) July 30, 2024 17:51
@Yun-Kim Yun-Kim merged commit 7934297 into main Jul 30, 2024
153 checks passed
@Yun-Kim Yun-Kim deleted the yunkim/langchain-fix-flaky branch July 30, 2024 17:58
Yun-Kim added a commit that referenced this pull request Jul 31, 2024
…e for CI (#9997)

Even after #9768 we're still seeing some tests be skipped on CI,
specifically `langchain-community` tests even when `langchain>=0.1`. My
suspicion is that the `sys.version_info < (0, 1, 0)` conditional that is
used for the test skipping is too precise for the versions of Langchain
being run on CI (not exactly sure what the issue is, but I've narrowed
it down to that area). This PR only checks the major and minor version
of Langchain being run.
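
A minimal sketch of checking only the major and minor version of the installed package, assuming the version is read from distribution metadata. The helper names `installed_major_minor` and `should_skip_community_tests` are hypothetical and are not taken from the commit:

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_major_minor(dist_name: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) of an installed distribution, or None.

    None is returned when the distribution is not installed or its
    version string does not start with "X.Y".
    """
    try:
        raw = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    if not raw:
        return None
    match = re.match(r"(\d+)\.(\d+)", raw)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def should_skip_community_tests(dist_name: str = "langchain") -> bool:
    """Skip when the package is missing or older than 0.1 (assumed policy)."""
    version = installed_major_minor(dist_name)
    return version is None or version < (0, 1)
```

Because the comparison uses a two-element tuple, the patch number and any release suffix are ignored when deciding whether to skip.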

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
Labels
changelog/no-changelog A changelog entry is not required for this PR.
4 participants