
TypeError while evaluating a test case for Role Adherence metric using CustomLlama3_8B. #1105

Qca-2023 opened this issue Oct 21, 2024

Describe the bug
A TypeError occurs when I try to evaluate a test case for the Role Adherence metric using the CustomLlama3_8B model from the documentation.

To Reproduce
Steps to reproduce the behavior:

  1. Install the necessary libraries.

!pip install deepeval --upgrade
!pip install -U bitsandbytes
!pip install accelerate
!pip install lm-format-enforcer
!git config --global credential.helper store

  2. Use the code below to create the custom LLM.

import json
import transformers
import torch
from transformers import BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from pydantic import BaseModel
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import (
    build_transformers_prefix_allowed_tokens_fn,
)

from deepeval.models import DeepEvalBaseLLM

class CustomLlama3_8B(DeepEvalBaseLLM):
    def __init__(self):
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
        )

        model_4bit = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Meta-Llama-3-8B-Instruct",
            device_map="auto",
            quantization_config=quantization_config,
        )
        tokenizer = AutoTokenizer.from_pretrained(
            "meta-llama/Meta-Llama-3-8B-Instruct"
        )

        self.model = model_4bit
        self.tokenizer = tokenizer

    def load_model(self):
        return self.model

    def generate(self, prompt: str, schema: BaseModel) -> BaseModel:
        # Same as the previous example above
        model = self.load_model()
        pipeline = transformers.pipeline(
            "text-generation",
            model=model,
            tokenizer=self.tokenizer,
            use_cache=True,
            device_map="auto",
            max_length=2500,
            do_sample=True,
            top_k=5,
            num_return_sequences=1,
            eos_token_id=self.tokenizer.eos_token_id,
            pad_token_id=self.tokenizer.eos_token_id,
        )

        # Create parser required for JSON confinement using lmformatenforcer
        parser = JsonSchemaParser(schema.schema())
        prefix_function = build_transformers_prefix_allowed_tokens_fn(
            pipeline.tokenizer, parser
        )

        # Output and load valid JSON
        output_dict = pipeline(prompt, prefix_allowed_tokens_fn=prefix_function)
        output = output_dict[0]["generated_text"][len(prompt):]
        json_result = json.loads(output)

        # Return valid JSON object according to the schema DeepEval supplied
        return schema(**json_result)

    async def a_generate(self, prompt: str, schema: BaseModel) -> BaseModel:
        return self.generate(prompt, schema)

    def get_model_name(self):
        return "Llama-3 8B"
  3. Log in to Hugging Face using your API key.

    !huggingface-cli login

  4. Create an instance of your custom LLM.

custom_llm = CustomLlama3_8B()

  5. Evaluate the test case using the Role Adherence metric.

from deepeval.metrics import RoleAdherenceMetric
from deepeval.test_case import ConversationalTestCase, LLMTestCase

convo_test_case = ConversationalTestCase(
    chatbot_role="You are a human marketing virtual assistant for question-answering tasks.",
    turns=[
        LLMTestCase(
            input="What services does BizTech Analytics provide?",
            actual_output="BizTech Analytics provides tailored AI and data services designed to fit an organization’s needs, driving efficiency, growth, and innovation through deep industry insights and advanced technologies.",
        )
    ],
)
metric = RoleAdherenceMetric(threshold=0.5, model=custom_llm)

metric.measure(convo_test_case)
print(metric.score)
print(metric.reason)
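For context on where a parsing failure could enter, the JSON round-trip inside `generate` (slicing the echoed prompt off the pipeline output, then `json.loads` on the remainder) can be exercised in isolation with the standard library only; the sample strings below are made up:

```python
import json

prompt = "Evaluate role adherence:"
# text-generation pipelines echo the prompt, so generate() slices it off
# before parsing the remainder as JSON.
generated_text = prompt + ' {"score": 0.8, "reason": "stays in role"}'

output = generated_text[len(prompt):]
json_result = json.loads(output)
print(json_result["score"])  # the dict is then passed to schema(**json_result)
```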

Expected behavior
The result should be the score and reason output from the metric.
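For reference, the pass/fail decision a DeepEval metric derives from that score is a simple threshold comparison; a stdlib-only sketch with a made-up score and the `threshold=0.5` used in the repro above:

```python
def passes_threshold(score: float, threshold: float = 0.5) -> bool:
    # A metric succeeds when its computed score meets the threshold.
    return score >= threshold

print(passes_threshold(0.8))  # True
print(passes_threshold(0.3))  # False
```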

Screenshots
(screenshot of the TypeError traceback attached to the issue)

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: Chrome
  • Version: 129.0.6668.101 (Official Build) (64-bit) (cohort: Stable)

Additional context
I ran all of this in Google Colab, using the code from the deepeval documentation:
https://docs.confident-ai.com/docs/guides-using-custom-llms
