Commit

fix: implemented requested code review changes in bleu.py
kadamrahul18 committed Jan 11, 2025
1 parent 69b57e5 commit 84b7669
Showing 5 changed files with 288 additions and 262 deletions.
43 changes: 6 additions & 37 deletions apps/opik-documentation/documentation/docs/cookbook/dspy.ipynb
@@ -37,17 +37,9 @@
},
{
"cell_type": "code",
-"execution_count": 1,
+"execution_count": null,
"metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"OPIK: Opik is already configured. You can check the settings by viewing the config file at /Users/jacquesverre/.opik.config\n"
-]
-}
-],
+"outputs": [],
"source": [
"import opik\n",
"\n",
@@ -56,7 +48,7 @@
},
{
"cell_type": "code",
-"execution_count": 2,
+"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -78,7 +70,7 @@
},
{
"cell_type": "code",
-"execution_count": 3,
+"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -95,31 +87,9 @@
},
{
"cell_type": "code",
-"execution_count": 4,
+"execution_count": null,
"metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"WARNING:langfuse:Langfuse client is disabled since no public_key was provided as a parameter or environment variable 'LANGFUSE_PUBLIC_KEY'. See our docs: https://langfuse.com/docs/sdk/python/low-level-sdk#initialize-client\n",
-"OPIK: Started logging traces to the \"DSPY\" project at https://www.comet.com/opik/jacques-comet/redirect/projects?name=DSPY.\n"
-]
-},
-{
-"data": {
-"text/plain": [
-"Prediction(\n",
-" reasoning='The meaning of life is a philosophical question that has been contemplated by humans for centuries. Different cultures, religions, and individuals have proposed various interpretations. Some suggest that the meaning of life is to seek happiness, fulfillment, and personal growth, while others believe it is about serving a higher purpose or contributing to the well-being of others. Ultimately, the meaning of life may vary from person to person, shaped by personal experiences, beliefs, and values.',\n",
-" answer=\"The meaning of life is subjective and can vary greatly among individuals. It may involve seeking happiness, personal growth, and contributing to the well-being of others, or fulfilling a higher purpose, depending on one's beliefs and experiences.\"\n",
-")"
-]
-},
-"execution_count": 4,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
+"outputs": [],
"source": [
"cot = dspy.ChainOfThought(\"question -> answer\")\n",
"cot(question=\"What is the meaning of life?\")"
@@ -157,4 +127,3 @@
"nbformat": 4,
"nbformat_minor": 4
}

@@ -101,43 +101,61 @@ print(score)

### BLEU

-The `BLEU` metric can be used to check if the output of an LLM is a valid translation of a reference text. `score()` computes the sentence-level BLEU score for a single candidate against one or more reference translations. It can be used in the following way:
+The BLEU metric calculates how close the LLM output is to one or more reference translations. This single metric class can compute:
+- Single-sentence BLEU: Pass a single output string and one or more reference strings.
+- Corpus-level BLEU: Pass a list of output strings and a parallel list of reference strings (or lists of references).

+Single-Sentence BLEU
+
```python
from opik.evaluation.metrics import BLEU

-metric = BLEU()
+bleu_metric = BLEU()

-score = metric.score(output="Hello world!", reference="Hello world")
-print(score)
+score = bleu_metric.score(
+    output="Hello world!",
+    reference="Hello world"
+)
+print(score.value, score.reason)
+
+score = bleu_metric.score(
+    output="Hello world!",
+    reference=["Hello planet", "Hello world"]
+)
+print(score.value, score.reason)
```
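For intuition about what this score measures: sentence-level BLEU is essentially the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. The following stdlib-only sketch is an editorial illustration, not the Opik implementation — the function name and the simple add-one smoothing stand-in are assumptions:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(output, references, max_n=4):
    # Hypothetical helper, not the Opik API: geometric mean of clipped
    # n-gram precisions (uniform weights) times a brevity penalty.
    out_tokens = output.split()
    ref_token_lists = [ref.split() for ref in references]
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        out_counts = Counter(ngrams(out_tokens, n))
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref_counts = Counter()
        for ref in ref_token_lists:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram])
                      for gram, count in out_counts.items())
        total = max(sum(out_counts.values()), 1)
        # Add-one smoothing keeps log() defined when nothing matches.
        precision = clipped / total if clipped else 1 / (total + 1)
        log_precision_sum += math.log(precision) / max_n
    # Brevity penalty against the closest reference length.
    out_len = len(out_tokens)
    ref_len = min((len(ref) for ref in ref_token_lists),
                  key=lambda length: (abs(length - out_len), length))
    brevity = 1.0 if out_len > ref_len else math.exp(1 - ref_len / max(out_len, 1))
    return brevity * math.exp(log_precision_sum)

print(sentence_bleu("Hello world !", ["Hello world"]))  # partial match, below 1
print(sentence_bleu("The cat sat on the mat",
                    ["The cat sat on the mat"]))        # exact match, 1.0
```

With multiple references, each candidate n-gram only needs to appear in one of them, which is why passing a list of reference strings can only raise the score.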

-You can also configure the `BLEU` metric when instantiating it:
+Corpus-Level BLEU

```python
from opik.evaluation.metrics import BLEU

-metric = BLEU(n_grams=4, smoothing_method="method1", epsilon=0.1, alpha=5.0, k=5.0)
+bleu_metric = BLEU()

-score = metric.score(output="Hello world !", reference="Hello world")
-print(score)
+outputs = ["Hello there", "This is a test."]
+references = [
+    ["Hello world", "Hello there"],
+    "This is a test."
+]
+
+result = bleu_metric.score(output=outputs, reference=references)
+print(result.value, result.reason)
```

-`score_corpus()` computes the corpus-level BLEU score for multiple candidate sentences and their corresponding references. It can be used in the following way:
+You can also customize n-grams, smoothing methods, or weights:

```python
from opik.evaluation.metrics import BLEU

-bleu_metric = BLEU()
-
-outputs = ["This is a test.", "Another test sentence."]
-
-references_list = [
-    ["This is a test.", "This is also a test."],
-    ["Another test sentence.", "Yet another test sentence."],
-]
-
-result = bleu_metric.score_corpus(outputs, references_list)
-
-print(f"Corpus BLEU score: {result.value:.4f}, Reason: {result.reason}")
+metric = BLEU(
+    n_grams=4,
+    smoothing_method="method1",
+    weights=[0.25, 0.25, 0.25, 0.25]
+)
+
+score = metric.score(
+    output="The cat sat on the mat",
+    reference=["The cat is on the mat", "A cat sat here on the mat"]
+)
+print(score.value, score.reason)
```
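One detail worth keeping in mind when reading corpus-level results: corpus BLEU pools clipped n-gram counts across all sentence pairs before taking the geometric mean, so it is generally not the average of per-sentence scores. A stdlib-only sketch of that aggregation (hypothetical helper names, not the Opik implementation; references here are lists of strings, and this sketch uses no smoothing):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(outputs, references, max_n=4):
    # Pool clipped and total n-gram counts over the whole corpus,
    # then apply one geometric mean and one brevity penalty.
    clipped_totals = [0] * max_n
    candidate_totals = [0] * max_n
    out_len_sum = 0
    ref_len_sum = 0
    for output, refs in zip(outputs, references):
        out_tokens = output.split()
        ref_token_lists = [ref.split() for ref in refs]
        out_len_sum += len(out_tokens)
        # The closest reference length feeds the corpus brevity penalty.
        ref_len_sum += min((len(ref) for ref in ref_token_lists),
                           key=lambda length: (abs(length - len(out_tokens)), length))
        for n in range(1, max_n + 1):
            out_counts = Counter(ngrams(out_tokens, n))
            max_ref_counts = Counter()
            for ref in ref_token_lists:
                for gram, count in Counter(ngrams(ref, n)).items():
                    max_ref_counts[gram] = max(max_ref_counts[gram], count)
            clipped_totals[n - 1] += sum(min(count, max_ref_counts[gram])
                                         for gram, count in out_counts.items())
            candidate_totals[n - 1] += sum(out_counts.values())
    log_sum = 0.0
    for clipped, total in zip(clipped_totals, candidate_totals):
        if clipped == 0:
            return 0.0  # unsmoothed in this sketch
        log_sum += math.log(clipped / total) / max_n
    brevity = 1.0 if out_len_sum > ref_len_sum else math.exp(1 - ref_len_sum / max(out_len_sum, 1))
    return brevity * math.exp(log_sum)

outputs = ["the cat sat on the mat", "hello world again today"]
references = [["the cat sat on the mat"], ["hello world again today"]]
print(corpus_bleu(outputs, references))  # identical corpus, 1.0
```

Because the counts are pooled, one long well-matched sentence can outweigh several short poorly-matched ones, which is the behavior the corpus-level API is designed to capture.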
