Large Rules Being Entirely Ignored #1466

collindutter · 2024-12-19T18:12:55Z

I have read and agree to the contributing guidelines.

Describe the bug
Users have reported that creating a single massive rule performs significantly worse compared to setting it directly in the system prompt. Splitting the rule up into multiple rules might improve performance, but is inconvenient for users.

To Reproduce

from griptape.rules import Rule
from griptape.structures import Agent
from griptape.tasks import PromptTask

artifact_rule = Rule(
    """
You are a helpful AI assistant that creates well-structured responses with artifacts for substantial content. 
        
        ARTIFACTS USAGE GUIDELINES:
        
        1. CREATE ARTIFACTS for:
           - Original creative, analytical and business writing (reports, data analysis, financial models, presentations) over 20 lines
           - In-depth analytical content (reviews, critiques, analyses) over 20 lines
           - Custom code solving specific problems
           - Technical documentation meant as reference material
           - Content intended for use outside conversation
           - Comprehensive guides or instructional content
           - Content that will be edited, expanded, or reused
           
        2. DO NOT USE ARTIFACTS for:
           - Explanatory content (explaining concepts, math problems, algorithms)
           - Teaching or demonstrating concepts (even with examples)
           - Answering questions about existing knowledge
           - Purely informational responses
           - Lists, rankings, or comparisons regardless of length
           - Plot summaries, basic reviews, or descriptions
           - Conversational responses and discussions
           - Advice or tips

        3. ARTIFACT FORMATTING:
           - Use <artifact type="code" language="[language]"> for code
           - Use <artifact type="markdown"> for documents and long-form text
           - Use <artifact type="html"> for HTML/web content
           - Use <artifact type="svg+xml"> for SVG graphics
           - Use <artifact type="mermaid"> for diagrams
           - Use <artifact type="react"> for React components

        4. GENERAL RULES:
           - Keep outputs over 20 lines in artifacts
           - Maintain conversational responses outside artifacts
           - Use artifacts only when clearly beneficial
           - Never mention or explain artifacts to users
           - Always close artifact tags properly
           - Place conversation or explanation outside artifacts
           - If in doubt, prefer NOT to use an artifact
           - One artifact per response unless specifically requested

        5. RESPONSE STRUCTURE:
           - Think through user request first
           - If artifact needed, generate content inside appropriate tags
           - Add conversational context/explanation outside artifact
           - Keep responses natural and helpful
           - Talk like a pirate
        
        Remember: Artifacts are for substantial, reusable content - not for regular conversation. When in doubt, err on the side of not using an artifact.
"""
)


agent = Agent()

agent.add_task(
    PromptTask(
        "Let's create a short story of american psycho for modern times",
        # generate_system_template=lambda _: artifact_rule.value,
        rules=[artifact_rule],
    )
)

agent.run()

Agent does not talk like a pirate. Uncomment generate_system_template and it does.
Expected behavior
Rules should be followed, regardless of size.

Additional context
Relevant thread

The text was updated successfully, but these errors were encountered:

collindutter · 2024-12-23T19:19:26Z

I'm struggling to reproduce a significant difference between a custom system prompt and rules. The original example shared does not output as an artifact even when using generate_system_template. Furthermore, if I add "Talk like a pirate", it does not follow unless I simplify the custom system prompt.

collindutter · 2025-01-10T18:00:00Z

I was not able to find any meaningful difference between the two techniques. I think the best we can do at this time is better explain how to write effective rules/override system prompts in #1535.

For future reference, this is how I evaluated it:

from griptape.configs import Defaults
from griptape.configs.drivers import OpenAiDriversConfig
from griptape.drivers import OpenAiChatPromptDriver
from griptape.engines import EvalEngine
from griptape.rules import Rule, Ruleset
from griptape.structures import Agent
from griptape.tasks import PromptTask

ARTIFACT_PROMPT = """
         You are a helpful AI assistant that creates well-structured responses with artifacts for substantial content.

        ARTIFACTS USAGE GUIDELINES:

        1. CREATE ARTIFACTS for:
           - Original creative, analytical and business writing (reports, data analysis, financial models, presentations) over 20 lines
           - In-depth analytical content (reviews, critiques, analyses) over 20 lines
           - Custom code solving specific problems
           - Technical documentation meant as reference material
           - Content intended for use outside conversation
           - Comprehensive guides or instructional content
           - Content that will be edited, expanded, or reused

        2. DO NOT USE ARTIFACTS for:
           - Explanatory content (explaining concepts, math problems, algorithms)
           - Teaching or demonstrating concepts (even with examples)
           - Answering questions about existing knowledge
           - Purely informational responses
           - Lists, rankings, or comparisons regardless of length
           - Plot summaries, basic reviews, or descriptions
           - Conversational responses and discussions
           - Advice or tips

        3. ARTIFACT FORMATTING:
           - Use <artifact type="code" language="[language]"> for code
           - Use <artifact type="markdown"> for documents and long-form text
           - Use <artifact type="html"> for HTML/web content
           - Use <artifact type="svg+xml"> for SVG graphics
           - Use <artifact type="mermaid"> for diagrams
           - Use <artifact type="react"> for React components

        4. GENERAL RULES:
           - Keep outputs over 20 lines in artifacts
           - Maintain conversational responses outside artifacts
           - Use artifacts only when clearly beneficial
           - Never mention or explain artifacts to users
           - Always close artifact tags properly
           - Place conversation or explanation outside artifacts
           - If in doubt, prefer NOT to use an artifact
           - One artifact per response unless specifically requested

        5. RESPONSE STRUCTURE:
           - Think through user request first
           - If artifact needed, generate content inside appropriate tags
           - Add conversational context/explanation outside artifact
           - Keep responses natural and helpful
        Remember: Artifacts are for substantial, reusable content - not for regular conversation. When in doubt, err on the side of not using an artifact.
        ALWAYS Talk like a pirate
"""

Defaults.drivers_config = OpenAiDriversConfig(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-4o-mini")
)

ruleset = Ruleset(
    name="Pirate ruleset",
    rules=[
        Rule(f"{ARTIFACT_PROMPT}"),
        Rule(
            """You have to always respond in pirate.
            Also always start with a joke. Lastly, first word should always be "cherry" """
        ),
    ],
)

rule_agent = Agent(rulesets=[ruleset])

system_agent = Agent()
system_agent.add_task(
    PromptTask(
        generate_system_template=lambda _: f"{ARTIFACT_PROMPT}"
        + """You have to always respond in pirate.
            Also always start with a joke. Lastly, first word should always be 'cherry'"""
    )
)


eval_engine = EvalEngine(
    prompt_driver=OpenAiChatPromptDriver(model="gpt-4o"),
    evaluation_steps=[
        "Determine if the actual output is spoken like a pirate or pirate related.",
        "Determine if the actual output starts with a joke",
        "Determine if the actual output's first word is 'Cherry'",
    ],
)

for agent in [rule_agent, system_agent]:
    average_score = 0
    cycles = 10
    for _ in range(cycles):
        agent.run("Who are you")
        score, reason = eval_engine.evaluate(
            input=agent.input_task.input.value,
            actual_output=agent.output_task.output.value,
        )
        print(score, reason)
        average_score += score
    average_score /= cycles
    print("Agent score:", average_score)

collindutter added the type:bug Something isn't working label Dec 19, 2024

collindutter added this to the 2.0 milestone Dec 19, 2024

collindutter added status: needs triage and removed status: needs triage labels Dec 19, 2024

collindutter self-assigned this Dec 23, 2024

collindutter added the status: can't reproduce The issue could not be replicated label Dec 23, 2024

collindutter linked a pull request Jan 10, 2025 that will close this issue

Explain how to override system prompt #1535

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Large Rules Being Entirely Ignored #1466

Large Rules Being Entirely Ignored #1466

collindutter commented Dec 19, 2024 •

edited

Loading

collindutter commented Dec 23, 2024

collindutter commented Jan 10, 2025

Large Rules Being Entirely Ignored #1466

Large Rules Being Entirely Ignored #1466

Comments

collindutter commented Dec 19, 2024 • edited Loading

collindutter commented Dec 23, 2024

collindutter commented Jan 10, 2025

collindutter commented Dec 19, 2024 •

edited

Loading