NO-JIRA: Automate and update Open API yaml for the documentation app (#…
andrescrz authored Nov 22, 2024
1 parent 399b5cc commit 111267a
Showing 11 changed files with 407 additions and 93 deletions.
@@ -5,7 +5,7 @@ toc_max_heading_level: 4

# Custom Metric

Opik allows you to define your own metrics. This is useful if you have a specific metric that is not already implemented.

If you want to write an LLM as a Judge metric, you can use either the [G-Eval metric](/evaluation/metrics/g_eval.md) or create your own from scratch.

@@ -60,7 +60,6 @@ You can also return a list of `ScoreResult` objects as part of your custom metric.

This metric can now be used in the `evaluate` function as explained here: [Evaluating LLMs](/evaluation/evaluate_your_llm.md).


#### Example: Creating a metric with OpenAI model

You can implement your own custom metric by creating a class that subclasses the `BaseMetric` class and implements the `score` method.
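
For orientation, a rough sketch of that pattern might look like the following (assuming `BaseMetric` and `ScoreResult` are exposed under `opik.evaluation.metrics` and that an OpenAI API key is configured; the prompt and scoring scheme are purely illustrative, not the documented example):

```python
from opik.evaluation.metrics import base_metric, score_result
from openai import OpenAI


class LLMJudgeMetric(base_metric.BaseMetric):
    """Hypothetical judge-style metric that asks an OpenAI model for a score."""

    def __init__(self, name: str = "llm_judge", model: str = "gpt-4o"):
        super().__init__(name=name)
        self.model = model
        self.client = OpenAI()

    def score(self, output: str, **ignored_kwargs) -> score_result.ScoreResult:
        # Ask the judge model for a score between 0.0 and 1.0, reply as a bare number.
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Rate the politeness of the following answer between 0.0 and 1.0. "
                        f"Reply with the number only.\n\nAnswer: {output}"
                    ),
                }
            ],
        )
        value = float(response.choices[0].message.content.strip())
        return score_result.ScoreResult(name=self.name, value=value)
```
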
@@ -7,6 +7,7 @@ sidebar_label: G-Eval
G-Eval is a task agnostic LLM as a Judge metric that allows you to specify a set of criteria for your metric. It uses a Chain of Thought prompting technique to create evaluation steps and return a score. You can learn more about G-Eval in the [original paper](https://arxiv.org/abs/2303.16634).

To use G-Eval, you need to specify just two pieces of information:

1. A task introduction: This describes the task you want to evaluate.
2. Evaluation criteria: This is a list of criteria that the LLM will use to evaluate the task.

@@ -31,7 +32,6 @@ The way the G-Eval metric works is by first using the task introduction and evaluation criteria to generate a set of evaluation steps.

By default, the `gpt-4o` model is used to generate the final score, but you can change this to any model supported by [LiteLLM](https://docs.litellm.ai/docs/providers) by setting the `model_name` parameter.
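
Putting this together, a minimal usage sketch might look like the following (assuming `GEval` can be imported from `opik.evaluation.metrics` like the other LLM as a Judge metrics; the task introduction and criteria are just examples):

```python
from opik.evaluation.metrics import GEval

metric = GEval(
    task_introduction="You are an expert judge tasked with evaluating the factual accuracy of an AI-generated answer.",
    evaluation_criteria="The answer must be factually consistent with the provided context and must not contradict it.",
)

# The returned ScoreResult contains the score and the judge's reasoning.
score = metric.score(output="The capital of France is Paris.")
print(score.value, score.reason)
```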


The evaluation steps are generated using the following prompt:

```
@@ -54,6 +54,7 @@ SCORE VALUE MUST BE AN INTEGER.
```

The final score is generated by combining the evaluation steps returned by the prompt above with the task introduction and evaluation criteria:

```
*** TASK INTRODUCTION:
{task_introduction}
@@ -21,15 +21,14 @@ Opik provides the following built-in evaluation metrics:
| IsJson | Heuristic | Checks if the output is a valid JSON object | [IsJson](/evaluation/metrics/heuristic_metrics#isjson) |
| Levenshtein | Heuristic | Calculates the Levenshtein distance between the output and an expected string | [Levenshtein](/evaluation/metrics/heuristic_metrics#levenshteinratio) |
| Hallucination | LLM as a Judge | Check if the output contains any hallucinations | [Hallucination](/evaluation/metrics/hallucination) |
| G-Eval | LLM as a Judge | Task agnostic LLM as a Judge metric | [G-Eval](/evaluation/metrics/g_eval) |
| Moderation | LLM as a Judge | Check if the output contains any harmful content | [Moderation](/evaluation/metrics/moderation) |
| AnswerRelevance | LLM as a Judge | Check if the output is relevant to the question | [AnswerRelevance](/evaluation/metrics/answer_relevance) |
| ContextRecall | LLM as a Judge | Measures how well the output recalls the information in the provided context | [ContextRecall](/evaluation/metrics/context_recall) |
| ContextPrecision | LLM as a Judge | Measures how precisely the output uses the provided context | [ContextPrecision](/evaluation/metrics/context_precision) |

You can also create your own custom metric; learn more in the [Custom Metric](/evaluation/metrics/custom_metric) section.


## Customizing LLM as a Judge metrics

By default, Opik uses GPT-4o from OpenAI as the LLM to evaluate the output of other LLMs. However, you can easily switch to another LLM provider by specifying a different model name in the `model_name` parameter of each LLM as a Judge metric.
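
As a sketch, switching the judge model for a single metric might look like this (assuming the parameter is named `model_name` as described above and accepts any LiteLLM model string; the Anthropic model name is just an example, so check your SDK version's signature):

```python
from opik.evaluation.metrics import Hallucination

# Use an Anthropic model via LiteLLM instead of the default GPT-4o judge.
# The `model_name` parameter name follows the text above; verify it against
# your installed SDK.
metric = Hallucination(model_name="anthropic/claude-3-5-sonnet-20241022")

score = metric.score(
    input="What is the capital of France?",
    output="The capital of France is Lyon.",
    context=["Paris is the capital and largest city of France."],
)
print(score.value, score.reason)
```
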
1 change: 0 additions & 1 deletion apps/opik-documentation/documentation/docs/home.md
@@ -38,7 +38,6 @@ Evaluating the output of your LLM calls is critical to ensure that your applicat
2. [Store evaluation datasets](/evaluation/manage_datasets.md) in the platform and [run evaluations](/evaluation/evaluate_your_llm.md)
3. Use our [pytest integration](/testing/pytest_integration.md) to track unit test results and compare results between runs


## Getting Started

[Comet](https://www.comet.com/site) provides a managed Cloud offering for Opik; simply [create an account](https://www.comet.com/signup?from=llm) to get started.
7 changes: 4 additions & 3 deletions apps/opik-documentation/documentation/docs/quickstart.mdx
@@ -54,7 +54,6 @@ If you are self-hosting the platform, simply use the `opik configure --use_local

## Adding Opik observability to your codebase


### Logging LLM calls

The first step in integrating Opik with your codebase is to track your LLM calls. If you are using OpenAI or any LLM provider that is supported by LiteLLM, you
@@ -87,6 +86,7 @@ litellm.callbacks = [opik_logger]
```

All LiteLLM calls made using the `litellm` client will now be logged to Opik.
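
For instance, a regular completion call like the sketch below would then appear as a trace in Opik (assuming your provider API key, e.g. `OPENAI_API_KEY`, is already set):

```python
import litellm

# With the Opik callback registered above, this call is logged to Opik.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Why is tracking LLM calls important?"}],
)
print(response.choices[0].message.content)
```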

</TabItem>

<TabItem value="Any other provider" title="Any other provider">
@@ -109,6 +109,7 @@ call_llm(client, [{"role": "user", "content": "Why is tracking and evaluation of
The `@track` decorator will automatically log the input and output of the decorated function, allowing you to track the user messages and the LLM responses in Opik. If you want to log more than just the input and output, you can use the `update_current_span` function as described in the [Traces / Logging Additional Data section](/tracing/log_traces.mdx#logging-additional-data).
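
A minimal sketch of that pattern (the function name and return value are placeholders for your own LLM call):

```python
from opik import track


@track
def answer_question(question: str) -> str:
    # Call your LLM provider of choice here; the function's inputs and the
    # returned value are logged to Opik automatically.
    return "Tracking lets you debug, monitor and evaluate your LLM application."


answer_question("Why is tracking and evaluation of LLMs important?")
```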

</TabItem>

</Tabs>
@@ -121,15 +122,13 @@ or by writing custom python code.

Opik makes it easy for you to log your chains no matter how you implement them:


<Tabs>
<TabItem value="Custom Python Code" title="Custom Python Code">

If you are not using any frameworks to build your chains, you can use the `@track` decorator to log them. When a
function is decorated with `@track`, its input and output will be logged to Opik. This works well even for deeply
nested chains:


```python
from opik import track
from opik.integrations.openai import track_openai
from openai import OpenAI
@@ -204,6 +203,7 @@ llm_chain = LLMChain(llm=llm, prompt=prompt_template)
# Generate the translations
llm_chain.run("Hello, how are you?", callbacks=[opik_tracer])
```

</TabItem>

<TabItem value="LLamaIndex" title="LLamaIndex">
@@ -218,6 +218,7 @@ opik_callback_handler = global_handler
```

Your LlamaIndex calls from that point forward will be logged to Opik. You can learn more about the LlamaIndex integration in the [LlamaIndex integration docs](/tracing/integrations/llama_index.md).
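
As a sketch, a small index queried after the handler is set would be traced (assuming the `llama_index.core` package layout and an `OPENAI_API_KEY` in the environment, since LlamaIndex defaults to OpenAI models):

```python
from llama_index.core import Document, VectorStoreIndex

# Build a tiny index and run a query; with the Opik global handler set above,
# the resulting LlamaIndex calls are logged to Opik.
index = VectorStoreIndex.from_documents(
    [Document(text="Opik is an open-source LLM evaluation platform.")]
)
response = index.as_query_engine().query("What is Opik?")
print(response)
```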

</TabItem>

</Tabs>
@@ -35,10 +35,10 @@ If you’re unable to use our LiteLLM integration with watsonx, please [open an

:::


### Configuring watsonx

To configure watsonx, you will need:

- The endpoint URL: Documentation for this parameter can be found [here](https://cloud.ibm.com/apidocs/watsonx-ai#endpoint-url)
- Watsonx API Key: Documentation for this parameter can be found [here](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui)
- Watsonx Token: Documentation for this parameter can be found [here](https://cloud.ibm.com/docs/account?topic=account-iamtoken_from_apikey#iamtoken_from_apikey)
@@ -49,7 +49,7 @@ Once you have these, you can set them as environment variables:
```python
import os

os.environ["WATSONX_ENDPOINT_URL"] = "" # Base URL of your WatsonX instance
os.environ["WATSONX_ENDPOINT_URL"] = "" # Base URL of your WatsonX instance
os.environ["WATSONX_API_KEY"] = "" # IBM cloud API key
os.environ["WATSONX_TOKEN"] = "" # IAM auth token

@@ -31,6 +31,7 @@ opik configure

</TabItem>
<TabItem value="Self-hosting" title="Self-hosting">

If you are self-hosting the platform, you can configure the SDK by running:

```python
@@ -46,13 +47,15 @@ opik configure --use_local
```

</TabItem>

</Tabs>

The `configure` methods will prompt you for the necessary information and save it to a configuration file (`~/.opik.config`).

## Advanced usage

In addition to the `configure` method, you can also configure the Python SDK in a couple of different ways:

1. Using a configuration file
2. Using environment variables
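
For the environment variable option, a minimal sketch might look like the following (assuming the SDK reads `OPIK_`-prefixed variables that mirror the configuration keys shown below; check the Configuration values section for the exact names):

```python
import os

# Set these before creating an Opik client or calling any tracked function.
os.environ["OPIK_API_KEY"] = "<your API key>"
os.environ["OPIK_WORKSPACE"] = "default"
os.environ["OPIK_URL_OVERRIDE"] = "https://www.comet.com/opik/api"
```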

Expand All @@ -74,14 +77,16 @@ api_key = <API Key>

</TabItem>
<TabItem value="Self-hosting" title="Self-hosting">

```toml
[opik]
url_override = http://localhost:5173/api
workspace = default
```

</TabItem>

</Tabs>

You can find a full list of the configuration options in the [Configuration values section](/tracing/sdk_configuration#configuration-values) below.
