[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
pre-commit-ci[bot] committed Jan 3, 2025
1 parent d833b96 commit 8e15109
Showing 4 changed files with 41 additions and 40 deletions.
60 changes: 31 additions & 29 deletions comps/guardrails/hallucination_detection/README.md

Hallucination in AI, particularly in large language models (LLMs), spans a wide …

### Forms of Hallucination

- **Factual Errors**: The AI generates responses containing incorrect or fabricated facts. _Example_: Claiming a historical event occurred when it did not.

- **Logical Inconsistencies**: Outputs that fail to follow logical reasoning or contradict themselves. _Example_: Stating that a person is alive in one sentence and deceased in another.

- **Context Misalignment**: Responses that diverge from the input prompt or fail to address the intended context. _Example_: Providing irrelevant information or deviating from the topic.

- **Fabricated References**: Creating citations, statistics, or other details that appear authentic but lack real-world grounding. _Example_: Inventing a study or paper that doesn't exist.

### Importance of Hallucination Detection
The importance of hallucination detection cannot be overstated. Ensuring the factual correctness and contextual fidelity of AI-generated content is essential for:

- **Building Trust**: Reducing hallucinations fosters user confidence in AI systems.
- **Ensuring Compliance**: Meeting legal and ethical standards in regulated industries.
- **Enhancing Reliability**: Improving the overall robustness and performance of AI applications.

### Define the Scope of Our Hallucination Detection
Tackling every form of hallucination is beyond our immediate scope. Training datasets inherently lag behind question-and-answer needs due to their static nature. Moreover, Retrieval-Augmented Generation (RAG) is emerging as a preferred approach for LLMs: model outputs are grounded in retrieved context to enhance accuracy and relevance, relying on the integration of Document-Question-Answer triplets.

Therefore, we focus on detecting contextualized hallucinations with the following strategies:

- Using LLM-as-a-judge to evaluate hallucinations.
- Detecting whether a Context-Question-Answer triplet contains hallucinations, as sketched below.
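
Concretely, each check templates a Document-Question-Answer triplet into a judge prompt and asks the LLM for a PASS/FAIL verdict. A placeholder-only sketch of the inputs and the expected verdict shape follows; the actual judge prompt used by this microservice appears in the curl examples later in this README.

```bash
# Placeholder triplet; replace with real retrieved context, user question, and model answer
DOCUMENT="<retrieved context the answer must stay faithful to>"
QUESTION="<user question>"
ANSWER="<LLM answer under evaluation>"

# The judge LLM is expected to return a verdict shaped like:
# {"REASONING": [<bullet points>], "SCORE": PASS}
```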

## 🚀1. Start Microservice based on vLLM endpoint on Intel Gaudi Accelerator

### 1.1 Define Environment Variables

```bash
export your_ip=<your ip>
export port_number=9008
export HUGGINGFACEHUB_API_TOKEN=<token>
export vLLM_ENDPOINT="http://${your_ip}:${port_number}"
export LLM_MODEL="PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct"
```
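
The values above are only placeholders; a quick way to confirm they are set as intended before launching anything:

```bash
# Print the derived endpoint and model name to catch typos early
echo "vLLM endpoint: ${vLLM_ENDPOINT}"
echo "Model:         ${LLM_MODEL}"
```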
For gated models such as `LLAMA-2`, you will have to pass the `HUGGINGFACEHUB_API_TOKEN` environment variable. Please follow this [huggingface token](https://huggingface.co/docs/hub/security-tokens) guide to get an access token and export it as `HUGGINGFACEHUB_API_TOKEN`.

### 1.2 Launch vLLM Service on Gaudi Accelerator

#### Launch vLLM service on a single node

```bash
bash ./launch_vllm_service.sh ${port_number} ${LLM_MODEL} hpu 1
```
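
Before wrapping the endpoint into the microservice, it can help to sanity-check that the vLLM server is answering; the routes below are the standard ones exposed by vLLM's OpenAI-compatible server, so treat them as assumptions if your deployment differs:

```bash
# Lightweight liveness probe; prints the HTTP status code
curl -s -o /dev/null -w "%{http_code}\n" ${vLLM_ENDPOINT}/health

# Should list the served model once the server is ready
curl -s ${vLLM_ENDPOINT}/v1/models
```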

## 2. Set up Hallucination Microservice

Then we wrap the vLLM service into the Hallucination Microservice.

### 2.1 Build Docker

```bash
bash build_docker_hallucination_microservice.sh
```
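
To confirm the image was built, you can list it. The image name below is the one referenced by this repository's test script (`opea/guardrails-halluc-detection:comps`); adjust it if your build script tags the image differently:

```bash
docker images | grep guardrails-halluc-detection
```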

### 2.2 Launch Hallucination Microservice

```bash
bash launch_hallucination_microservice.sh
```


## 🚀3. Get Status of Hallucination Microservice

```bash
docker container logs -f halluc-detection
```
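
If you prefer a one-shot status check over following the logs, a standard Docker query against the same container name works as well:

```bash
docker ps --filter "name=halluc-detection" --format "{{.Names}}: {{.Status}}"
```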

## 🚀4. Consume Guardrail Microservice Post-LLM

Once the microservice starts, users can use the examples (bash or python) below to apply hallucination detection to the LLM's response (Post-LLM).

**Bash:**

<span style="font-size:20px">_Case without Hallucination (Valid Output)_</span>

```bash
DOCUMENT=".......An important part of CDC’s role during a public health emergency is to develop a test for the pathogen and equip state and local public health labs with testing capacity. CDC developed an rRT-PCR test to diagnose COVID-19. As of the evening of March 17, 89 state and local public health labs in 50 states......"

QUESTION="What kind of test can diagnose COVID-19?"

ANSWER=" rRT-PCR test"

DATA='{"messages":[{"role": "user", "content": "Given the following QUESTION, DOCUMENT and ANSWER you must analyze the provided answer and determine whether it is faithful to the contents of the DOCUMENT. The ANSWER must not offer new information beyond the context provided in the DOCUMENT. The ANSWER also must not contradict information provided in the DOCUMENT. Output your final verdict by strictly following this format: \"PASS\" if the answer is faithful to the DOCUMENT and \"FAIL\" if the answer is not faithful to the DOCUMENT. Show your reasoning.\n\n--\nQUESTION (THIS DOES NOT COUNT AS BACKGROUND INFORMATION):\n{question}\n\n--\nDOCUMENT:\n{document}\n\n--\nANSWER:\n{answer}\n\n--\n\n Your output should be in JSON FORMAT with the keys \"REASONING\" and \"SCORE\":\n{{\"REASONING\": <your reasoning as bullet points>, \"SCORE\": <your final score>}}"}], "max_tokens":600,"model": "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct" }'

DATA=$(echo $DATA | sed "s/{question}/$QUESTION/g; s/{document}/$DOCUMENT/g; s/{answer}/$ANSWER/g")
```
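
The templated payload can then be sent with a minimal curl call; the exact flags below are an assumption rather than part of this repository, and the same shape applies to the second example further down:

```bash
# Assumed invocation: POST the templated judge prompt to the detection endpoint
curl http://localhost:9080/v1/halluc_detection \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json'
```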

Example Output:

```bash
{"REASONING": ['The CONTEXT mentions that the CDC developed an rRT-PCR test to diagnose COVID-19.', 'The CONTEXT does not describe what rRT-PCR stands for or how the test works.', 'The ANSWER simply states that the test is an rRT-PCR test.', 'The ANSWER does not provide additional information about the test, such as its full form or methodology.', 'Given the QUESTION about what kind of test can diagnose COVID-19, the ANSWER is faithful to the CONTEXT because it correctly identifies the type of test developed by the CDC, even though it lacks detailed explanation.'], "SCORE": PASS}
```

<span style="font-size:20px">_Case with Hallucination (Invalid or Inconsistent Output)_</span>

```bash
DOCUMENT="750 Seventh Avenue is a 615 ft (187m) tall Class-A office skyscraper in New York City. 101 Park Avenue is a 629 ft tall skyscraper in New York City, New York."

QUESTION=" 750 7th Avenue and 101 Park Avenue, are located in which city?"

ANSWER="750 7th Avenue and 101 Park Avenue are located in Albany, New York"

DATA='{"messages":[{"role": "user", "content": "Given the following QUESTION, DOCUMENT and ANSWER you must analyze the provided answer and determine whether it is faithful to the contents of the DOCUMENT. The ANSWER must not offer new information beyond the context provided in the DOCUMENT. The ANSWER also must not contradict information provided in the DOCUMENT. Output your final verdict by strictly following this format: \"PASS\" if the answer is faithful to the DOCUMENT and \"FAIL\" if the answer is not faithful to the DOCUMENT. Show your reasoning.\n\n--\nQUESTION (THIS DOES NOT COUNT AS BACKGROUND INFORMATION):\n{question}\n\n--\nDOCUMENT:\n{document}\n\n--\nANSWER:\n{answer}\n\n--\n\n Your output should be in JSON FORMAT with the keys \"REASONING\" and \"SCORE\":\n{{\"REASONING\": <your reasoning as bullet points>, \"SCORE\": <your final score>}}"}], "max_tokens":600,"model": "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct" }'

DATA=$(echo $DATA | sed "s/{question}/$QUESTION/g; s/{document}/$DOCUMENT/g; s/{answer}/$ANSWER/g")

curl http://localhost:9080/v1/halluc_detection \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json'
```

Example Output:

```bash
{"REASONING": ['The CONTEXT specifies that 750 Seventh Avenue and 101 Park Avenue are located in New York City.', 'The ANSWER incorrectly states that these locations are in Albany, New York.', 'The QUESTION asks for the city where these addresses are located.', 'The correct answer should be New York City, not Albany.'], "SCORE": FAIL}
```
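
If a downstream guard needs to branch on the verdict, a rough way to pull out the `SCORE` field is sketched below. It assumes the response body looks like the example outputs above (which are not strict JSON, so a simple pattern match is used instead of `jq`), and it reuses the assumed curl flags from the earlier sketch:

```bash
RESPONSE=$(curl -s http://localhost:9080/v1/halluc_detection \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json')

# Extract PASS or FAIL from a body shaped like: {"REASONING": [...], "SCORE": FAIL}
VERDICT=$(echo "$RESPONSE" | grep -oE '"SCORE": *[A-Z]+' | grep -oE '[A-Z]+$')
echo "Verdict: ${VERDICT:-UNKNOWN}"
```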

**Python Script:**

```python
import requests

proxies = {"http": ""}
url = "http://localhost:9080/v1/halluc_detection"
data = {
    "messages": [
        {
            "role": "user",
            "content": 'Given the following QUESTION, DOCUMENT and ANSWER you must analyze the provided answer and determine whether it is faithful to the contents of the DOCUMENT. The ANSWER must not offer new information beyond the context provided in the DOCUMENT. The ANSWER also must not contradict information provided in the DOCUMENT. Output your final verdict by strictly following this format: "PASS" if the answer is faithful to the DOCUMENT and "FAIL" if the answer is not faithful to the DOCUMENT. Show your reasoning.\n\n--\nQUESTION (THIS DOES NOT COUNT AS BACKGROUND INFORMATION):\n 750 7th Avenue and 101 Park Avenue, are located in which city?\n\n--\nDOCUMENT:\n750 Seventh Avenue is a 615 ft (187m) tall Class-A office skyscraper in New York City. 101 Park Avenue is a 629 ft tall skyscraper in New York City, New York.\n\n--\nANSWER:\n750 7th Avenue and 101 Park Avenue are located in Albany, New York\n\n--\n\n Your output should be in JSON FORMAT with the keys "REASONING" and "SCORE":\n{{"REASONING": <your reasoning as bullet points>, "SCORE": <your final score>}}',
        }
    ],
    "max_tokens": 600,
    "model": "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct",
}

try:
    # Send the judge prompt to the hallucination detection microservice and print the verdict
    response = requests.post(url, json=data, proxies=proxies)
    response.raise_for_status()
    print(response.text)
except requests.exceptions.RequestException as e:
    print("An error occurred:", e)
```

1 change: 0 additions & 1 deletion comps/guardrails/hallucination_detection/entrypoint.sh

```bash
pip --no-cache-dir install -r requirements-runtime.txt

python hallucination_detection.py
```
14 changes: 7 additions & 7 deletions comps/guardrails/hallucination_detection/hallucination_detection.py

```python
import os
from typing import Union

import requests
from fastapi.responses import StreamingResponse
from langchain.schema import HumanMessage, SystemMessage
from langchain_community.llms import VLLMOpenAI
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from template import ChatTemplate

from comps import (
    CustomLogger,
```

```python
        payload["messages"] = input.messages
        payload["max_tokens"] = input.max_tokens
        payload["model"] = input.model
        response = requests.post(llm_endpoint + "/v1/chat/completions", json=payload, headers=headers)

        if logflag:
            logger.info(response.text)

        return GeneratedDoc(text=response.json()["choices"][0]["message"]["content"], prompt="")

    else:
        if logflag:
            logger.info("[ UNKNOWN ] input from user")
```
6 changes: 3 additions & 3 deletions tests/guardrails/test_guardrails_halluc_detection.sh

```bash
  --trust-remote-code
  until curl -s http://localhost:$port_number/health > /dev/null; do
    echo "Waiting for vllm serving to start..."
    sleep 5m
  done
  echo "vllm serving started"
```

```bash
    -e LLM_MODEL=$LLM_MODEL \
    -e LOGFLAG=$LOGFLAG \
    opea/guardrails-halluc-detection:comps
  sleep 10
  echo "Microservice started"
}

function validate_microservice() {
  echo "Validate microservice started"
  DATA='{"messages":[{"role": "user", "content": "Given the following QUESTION, DOCUMENT and ANSWER you must analyze the provided answer and determine whether it is faithful to the contents of the DOCUMENT. The ANSWER must not offer new information beyond the context provided in the DOCUMENT. The ANSWER also must not contradict information provided in the DOCUMENT. Output your final verdict by strictly following this format: \"PASS\" if the answer is faithful to the DOCUMENT and \"FAIL\" if the answer is not faithful to the DOCUMENT. Show your reasoning.\n\n--\nQUESTION (THIS DOES NOT COUNT AS BACKGROUND INFORMATION):\n{question}\n\n--\nDOCUMENT:\n{document}\n\n--\nANSWER:\n{answer}\n\n--\n\n Your output should be in JSON FORMAT with the keys \"REASONING\" and \"SCORE\":\n{{\"REASONING\": <your reasoning as bullet points>, \"SCORE\": <your final score>}}"}], "max_tokens":600,"model": "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct" }'

  echo "test 1 - Case with Hallucination (Invalid or Inconsistent Output)"
  DOCUMENT="750 Seventh Avenue is a 615 ft (187m) tall Class-A office skyscraper in New York City. 101 Park Avenue is a 629 ft tall skyscraper in New York City, New York."
```
