
TypeError while evaluating a test case for Role Adherence metric using CustomLlama3_8B. #1105

Qca-2023 opened this issue Oct 21, 2024

Describe the bug
A TypeError occurs when I try to evaluate a test case for the Role Adherence metric using the CustomLlama3_8B model from the documentation.

To Reproduce
Steps to reproduce the behavior:

  1. Install the necessary libraries.

!pip install deepeval --upgrade
!pip install -U bitsandbytes
!pip install accelerate
!pip install lm-format-enforcer
!git config --global credential.helper store

  2. Use the code below to create the custom LLM.

import json
import transformers
import torch
from transformers import BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from pydantic import BaseModel
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import (
    build_transformers_prefix_allowed_tokens_fn,
)

from deepeval.models import DeepEvalBaseLLM

class CustomLlama3_8B(DeepEvalBaseLLM):
    def __init__(self):
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
        )

        model_4bit = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Meta-Llama-3-8B-Instruct",
            device_map="auto",
            quantization_config=quantization_config,
        )
        tokenizer = AutoTokenizer.from_pretrained(
            "meta-llama/Meta-Llama-3-8B-Instruct"
        )

        self.model = model_4bit
        self.tokenizer = tokenizer

    def load_model(self):
        return self.model

    def generate(self, prompt: str, schema: BaseModel) -> BaseModel:
        # Same as the previous example above
        model = self.load_model()
        pipeline = transformers.pipeline(
            "text-generation",
            model=model,
            tokenizer=self.tokenizer,
            use_cache=True,
            device_map="auto",
            max_length=2500,
            do_sample=True,
            top_k=5,
            num_return_sequences=1,
            eos_token_id=self.tokenizer.eos_token_id,
            pad_token_id=self.tokenizer.eos_token_id,
        )

        # Create parser required for JSON confinement using lmformatenforcer
        parser = JsonSchemaParser(schema.schema())
        prefix_function = build_transformers_prefix_allowed_tokens_fn(
            pipeline.tokenizer, parser
        )

        # Output and load valid JSON
        output_dict = pipeline(prompt, prefix_allowed_tokens_fn=prefix_function)
        output = output_dict[0]["generated_text"][len(prompt):]
        json_result = json.loads(output)

        # Return valid JSON object according to the schema DeepEval supplied
        return schema(**json_result)

    async def a_generate(self, prompt: str, schema: BaseModel) -> BaseModel:
        return self.generate(prompt, schema)

    def get_model_name(self):
        return "Llama-3 8B"
  3. Log in to Hugging Face using your API key.

    !huggingface-cli login

  4. Create an instance of your custom LLM.

custom_llm = CustomLlama3_8B()

  5. Evaluate the test case using the Role Adherence metric.

from deepeval.metrics import RoleAdherenceMetric
from deepeval.test_case import ConversationalTestCase, LLMTestCase

convo_test_case = ConversationalTestCase(
    chatbot_role="You are a human marketing virtual assistant for question-answering tasks.",
    turns=[
        LLMTestCase(
            input="What services does BizTech Analytics provide?",
            actual_output="BizTech Analytics provides tailored AI and data services designed to fit an organization’s needs, driving efficiency, growth, and innovation through deep industry insights and advanced technologies.",
        )
    ],
)
metric = RoleAdherenceMetric(threshold=0.5, model=custom_llm)

metric.measure(convo_test_case)
print(metric.score)
print(metric.reason)
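For context on where a parsing failure could enter, the JSON round-trip inside `generate` (slicing the echoed prompt off the pipeline output, then `json.loads` on the remainder) can be exercised in isolation with the standard library only; the sample strings below are made up:

```python
import json

prompt = "Evaluate role adherence:"
# text-generation pipelines echo the prompt, so generate() slices it off
# before parsing the remainder as JSON.
generated_text = prompt + ' {"score": 0.8, "reason": "stays in role"}'

output = generated_text[len(prompt):]
json_result = json.loads(output)
print(json_result["score"])  # the dict is then passed to schema(**json_result)
```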

Expected behavior
The result should be the score and reason output from the metric.
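For reference, the pass/fail decision a DeepEval metric derives from that score is a simple threshold comparison; a stdlib-only sketch with a made-up score and the `threshold=0.5` used in the repro above:

```python
def passes_threshold(score: float, threshold: float = 0.5) -> bool:
    # A metric succeeds when its computed score meets the threshold.
    return score >= threshold

print(passes_threshold(0.8))  # True
print(passes_threshold(0.3))  # False
```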

Screenshots
(screenshot of the TypeError traceback attached to the issue)

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: Chrome
  • Version: 129.0.6668.101 (Official Build) (64-bit) (cohort: Stable)

Additional context
I ran all of this in Google Colab, using the code from the deepeval documentation:
https://docs.confident-ai.com/docs/guides-using-custom-llms
