# Add ref #172

Merged · 2 commits · Apr 15, 2024
`docs/evaluation/faq/unit-testing.mdx` (141 additions, 0 deletions)
@@ -38,6 +38,9 @@ from my_app.main import generate_sql

```python
def test_sql_generation_select_all():
    user_query = "Get all users from the customers table"
    sql = generate_sql(user_query)
    # LangSmith logs any exception raised by `assert` / `pytest.fail` / `raise` / etc.
    # as a test failure
    # highlight-next-line
    assert sql == "SELECT * FROM customers"
```

@@ -181,3 +184,141 @@ With caching enabled, you can iterate quickly on your tests using `watch` mode w
```bash
pip install pytest-watch
LANGCHAIN_TEST_CACHE=tests/cassettes ptw tests/my_llm_tests
```

## Explanations

The `@unit` test decorator converts any unit test into a parametrized LangSmith example. By default, all unit tests within a given file will be grouped as a single "test suite" with a corresponding dataset.
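
For example, two decorated tests in the same file would be logged as two examples in one suite (a minimal sketch; `generate_sql` stands in for your application code):

```python
from langsmith import unit

from my_app.main import generate_sql  # assumed application code


@unit
def test_select_all_customers():
    assert generate_sql("Get all users from the customers table") == "SELECT * FROM customers"


@unit
def test_select_all_orders():
    # Expected SQL here is illustrative for the hypothetical app
    assert generate_sql("Get all rows from the orders table") == "SELECT * FROM orders"
```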

The following metrics are available off-the-shelf:

| Feedback             | Description                                                      | Example                                                                                                               |
| -------------------- | ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
| `pass`               | Binary pass/fail score; 1 for pass, 0 for fail                    | `assert False  # Fails`                                                                                                 |
| `expectation`        | Binary expectation score; 1 if the expectation is met, 0 if not   | `expect(prediction).against(lambda x: re.search(r"\b[a-f\d]{8}-[a-f\d]{4}-[a-f\d]{4}-[a-f\d]{4}-[a-f\d]{12}\b", x))`   |
| `embedding_distance` | Cosine distance between two embeddings                            | `expect.embedding_distance(prediction=prediction, expectation=expectation)`                                            |
| `edit_distance`      | Edit distance between two strings                                 | `expect.edit_distance(prediction=prediction, expectation=expectation)`                                                 |

You can also manually log any arbitrary feedback within a unit test using the `client`.

```python
from langsmith import unit, Client
from langsmith.run_helpers import get_current_run_tree

client = Client()

@unit
def test_foo():
    run_tree = get_current_run_tree()
    client.create_feedback(run_id=run_tree.id, key="my_custom_feedback", score=1)
```

## Reference

### `expect`

`expect` makes it easy to write approximate assertions on test results and log scores to LangSmith.
Off the shelf, it can compute and compare embedding distances and edit distances, and it supports custom assertions on arbitrary values.

#### `expect.embedding_distance(prediction, reference, *, config=None)`

Compute the embedding distance between the prediction and reference.

This logs the embedding distance to LangSmith and returns a [`Matcher`](#matcher) instance for making assertions on the distance value.

By default, this uses the OpenAI API for computing embeddings.

**Parameters**

- `prediction` (str): The predicted string to compare.
- `reference` (str): The reference string to compare against.
- `config` (Optional[EmbeddingConfig]): Optional configuration for the embedding distance evaluator. Supported options:
- `encoder`: A custom encoder function to encode the list of input strings to embeddings. Defaults to the OpenAI API.
- `metric`: The distance metric to use for comparison. Supported values: "cosine", "euclidean", "manhattan", "chebyshev", "hamming".

**Returns**

A [`Matcher`](#matcher) instance for the embedding distance value.
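
A minimal sketch of how this might be used inside a test (assumes an `OPENAI_API_KEY` is available for the default encoder; the strings and threshold are illustrative):

```python
from langsmith import expect, unit


@unit
def test_paraphrase_is_semantically_close():
    prediction = "The capital of France is Paris."
    reference = "Paris is the capital of France."
    # Logs the embedding distance to LangSmith, then asserts it is below the threshold.
    expect.embedding_distance(prediction, reference).to_be_less_than(0.5)
```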

#### `expect.edit_distance(prediction, reference, *, config=None)`

Compute the string distance between the prediction and reference.

This logs the string distance (Damerau-Levenshtein) to LangSmith and returns a [`Matcher`](#matcher) instance for making assertions on the distance value.

This depends on the `rapidfuzz` package for string distance computation.

**Parameters**

- `prediction` (str): The predicted string to compare.
- `reference` (str): The reference string to compare against.
- `config` (Optional[EditDistanceConfig]): Optional configuration for the string distance evaluator. Supported options:
- `metric`: The distance metric to use for comparison. Supported values: "damerau_levenshtein", "levenshtein", "jaro", "jaro_winkler", "hamming", "indel".
- `normalize_score`: Whether to normalize the score between 0 and 1.

**Returns**

A [`Matcher`](#matcher) instance for the string distance value.
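
A short sketch, assuming `rapidfuzz` is installed (the strings and threshold are illustrative):

```python
from langsmith import expect, unit


@unit
def test_generated_sql_is_near_reference():
    prediction = "SELECT * FROM customers;"
    reference = "SELECT * FROM customers"
    # Logs the Damerau-Levenshtein distance (the default metric) to LangSmith,
    # then asserts it is small; the threshold is chosen loosely for illustration.
    expect.edit_distance(prediction, reference).to_be_less_than(3)
```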

#### `expect.value(value)`

Create a [`Matcher`](#matcher) instance for making assertions on the given value.

**Parameters**

- `value` (Any): The value to make assertions on.

**Returns**

A [`Matcher`](#matcher) instance for the given value.
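
For instance (a sketch; `generate_sql` refers to the hypothetical application code used earlier on this page):

```python
from langsmith import expect, unit

from my_app.main import generate_sql  # assumed application code


@unit
def test_sql_mentions_table():
    sql = generate_sql("Get all users from the customers table")
    # Make an assertion on the raw value and record the result in LangSmith.
    expect.value(sql).to_contain("customers")
```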

#### `Matcher`

A class for making assertions on expectation values.

**`to_be_less_than(value)`**

Assert that the expectation value is less than the given value.

**`to_be_greater_than(value)`**

Assert that the expectation value is greater than the given value.

**`to_be_between(min_value, max_value)`**

Assert that the expectation value is between the given min and max values.

**`to_be_approximately(value, precision=2)`**

Assert that the expectation value is approximately equal to the given value.

**`to_equal(value)`**

Assert that the expectation value equals the given value.

**`to_contain(value)`**

Assert that the expectation value contains the given value.

**`against(func)`**

Assert the expectation value against a custom function.
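
As a loose illustration of combining these matchers inside a test (the value under test is hypothetical):

```python
import re

from langsmith import expect, unit


@unit
def test_generated_id_looks_like_a_uuid():
    generated_id = "a1b2c3d4-1234-5678-9abc-def012345678"  # stand-in for real output
    expect.value(len(generated_id)).to_be_between(30, 40)
    expect.value(generated_id).to_contain("-")
    # Custom predicate: the value must match a UUID-like pattern.
    expect.value(generated_id).against(
        lambda x: re.fullmatch(r"[a-f\d]{8}-(?:[a-f\d]{4}-){3}[a-f\d]{12}", x)
    )
```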

### `unit` API

The `unit` decorator is used to mark a function as a test case for LangSmith. It ensures that the necessary example data is created and associated with the test function. The decorated function will be executed as a test case, and the results will be recorded and reported by LangSmith.

#### `@unit(id=None, output_keys=None, client=None, test_suite_name=None)`

Create a unit test case in LangSmith.

**Parameters**

- `id` (Optional[uuid.UUID]): A unique identifier for the test case. If not provided, an ID will be generated based on the test function's module and name.
- `output_keys` (Optional[Sequence[str]]): A list of keys to be considered as the output keys for the test case. These keys will be extracted from the test function's inputs and stored as the expected outputs.
- `client` (Optional[ls_client.Client]): An instance of the LangSmith client to be used for communication with the LangSmith service. If not provided, a default client will be used.
- `test_suite_name` (Optional[str]): The name of the test suite to which the test case belongs. If not provided, the test suite name will be determined based on the environment or the package name.
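
A hedged sketch of these parameters in use; here inputs are supplied through default argument values, and `expected_sql` is treated as the expected output (the suite name, key names, and `generate_sql` are illustrative):

```python
from langsmith import unit

from my_app.main import generate_sql  # assumed application code


@unit(output_keys=["expected_sql"], test_suite_name="SQL generation")
def test_generate_sql_select_all(
    user_query: str = "Get all users from the customers table",
    expected_sql: str = "SELECT * FROM customers",
):
    # `expected_sql` is stored as the expected output of the LangSmith example;
    # the remaining arguments are stored as its inputs.
    assert generate_sql(user_query) == expected_sql
```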

**Environment Variables**

- `LANGSMITH_TEST_CACHE`: If set, API calls will be cached to disk to save time and costs during testing. Recommended to commit the cache files to your repository for faster CI/CD runs. Requires the 'langsmith[vcr]' package to be installed.
- `LANGSMITH_TEST_TRACKING`: Set this variable to the path of a directory to enable caching of test results. This is useful for re-running tests without re-executing the code. Requires the 'langsmith[vcr]' package.