
Evaluation engine criteria causes errors when intermediate evaluation_steps generation returns poorly formatted JSON/dict #1578

Closed
griptapeOsipa opened this issue Jan 14, 2025 · 1 comment · Fixed by #1519
@griptapeOsipa

Description:

When "criteria" is used in the evaluation engine, during the automagic generation of evaluation_steps, the intermediate JSON/dict is sometimes improperly formatted, leading to runtime errors.

Steps to Reproduce:
Running this script as-is works, because it passes evaluation_steps directly. Commenting out "evaluation_steps" and uncommenting "criteria" reproduces the problem:

from griptape.structures import Pipeline
from griptape.engines import EvalEngine
from griptape.tasks import PromptTask
from griptape.rules import Rule

from dotenv import load_dotenv
load_dotenv() # Load the environment variables

rules = [
    "Answer with a json object, with no additional markup.",
    "Talk like a pirate.",
]

pipeline = Pipeline(
    tasks=[
        PromptTask(
            "Respond to this user: '{{ args[0] }}'"
            "{% if args[1] %}Use this feedback when answering.{{ args[1] }}{% endif %}"
        ),
    ],
    rules=[Rule(rule) for rule in rules],
)

engine = EvalEngine(
    # criteria=[
    evaluation_steps=[
        f"Determine whether the following rules have been met: {rules}",
    ]
)

pipeline.run("Hi there")
score, reason = engine.evaluate(
    input=pipeline.tasks[0].input.value,
    actual_output=pipeline.output.value,
)

Expected Behavior: The intermediate JSON/dict for evaluation_steps should always include a "steps" key and be properly formatted for use.

Environment:

Griptape version: 1.1.1
Python version: 3.10-3.12
OS: OSX

Debug text pulled to inspect the JSON directly:

('{"type": "object", "properties": {"steps": {"type": "array", "items": '
 '{"type": "string"}}}, "required": ["steps"], "additionalProperties": false, '
 '"$id": "Output Format", "$schema": '
 '"http://json-schema.org/draft-07/schema#"}')
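Parsing that captured string shows what went wrong: the intermediate result is the output *schema* itself (a draft-07 JSON Schema requiring a "steps" key), not an object that conforms to it. A minimal sketch, using only the debug text above:

```python
import json

# The raw debug string captured from the engine (verbatim from the issue).
debug_text = (
    '{"type": "object", "properties": {"steps": {"type": "array", "items": '
    '{"type": "string"}}}, "required": ["steps"], "additionalProperties": false, '
    '"$id": "Output Format", "$schema": '
    '"http://json-schema.org/draft-07/schema#"}'
)

parsed = json.loads(debug_text)
print("steps" in parsed)    # False -- the key the engine indexes is absent
print(sorted(parsed))       # only schema keywords: $id, $schema, properties, ...
```

So `parsed_result["steps"]` in `_generate_steps` raises KeyError whenever the model echoes the schema instead of filling it in.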

How the error reads:

Traceback (most recent call last):
  File ".../GitHub/griptape-TS-1_1/evalEngine.py", line 33, in <module>
    score, reason = engine.evaluate(
  File ".../GitHub/griptape-TS-1_1/.venv/lib/python3.10/site-packages/griptape/engines/eval/eval_engine.py", line 86, in evaluate
    self.evaluation_steps = self._generate_steps(evaluation_params)
  File ".../GitHub/griptape-TS-1_1/.venv/lib/python3.10/site-packages/griptape/engines/eval/eval_engine.py", line 115, in _generate_steps
    return parsed_result["steps"]
KeyError: 'steps'
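Until the fix in #1519 lands, one defensive sketch is to validate the parsed dict before indexing, so a malformed intermediate result fails with a clear error (or can be retried) instead of a bare KeyError. The helper name below is hypothetical, not part of griptape's API:

```python
def extract_steps(parsed_result: dict) -> list[str]:
    # Hypothetical guard around the lookup that raises KeyError in
    # eval_engine.py's _generate_steps: accept the result only if it is a
    # well-formed {"steps": [...]} object with string entries.
    steps = parsed_result.get("steps")
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        raise ValueError(
            "Expected a JSON object with a 'steps' list of strings; "
            f"got keys: {sorted(parsed_result)}"
        )
    return steps
```

A caller could catch the ValueError and re-prompt the model, which is friendlier than crashing mid-evaluation.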
@collindutter
Member

Fixed via #1519

@collindutter collindutter added this to the 1.2 milestone Jan 21, 2025