[CAI-185] Chatbot/docker compose with Redis and DynamoDB for local development #1193

Merged on Oct 21, 2024 (29 commits):
ac9b39a
feat(chatbot): session GSI
batdevis Oct 8, 2024
36c89af
feat(chatbot): docker compose
batdevis Oct 9, 2024
a2c16c7
fix(chatbot): dynamodb and redis for local development with docker co…
batdevis Oct 9, 2024
c633df6
Merge branch 'chatbot/docker-compose-complete' into chatbot/sessions-…
batdevis Oct 9, 2024
08a7fec
chore(chatbot):remove duplicate imports
batdevis Oct 9, 2024
7fe9101
chore(chatbot): linting
batdevis Oct 9, 2024
9b2eb61
fix(chatbot):create index in docker
batdevis Oct 9, 2024
b251837
chore(chatbot): llamaindex index id
batdevis Oct 10, 2024
4ecd4b3
fix(chatbot): create vector index with all docs
batdevis Oct 10, 2024
a84581c
Merge branch 'main' into chatbot/docker-compose-complete
batdevis Oct 10, 2024
ea7d3db
chore(chatbot): terraform lint
batdevis Oct 10, 2024
28695e3
fix(chatbot): terraform syntax
batdevis Oct 10, 2024
238edfd
chore(chatbot): remove dynamodb options
batdevis Oct 10, 2024
5f63560
chore(chatbot): from global to local secondary index
batdevis Oct 10, 2024
859c298
Merge branch 'main' into chatbot/docker-compose-complete
batdevis Oct 11, 2024
f43a771
chore: merge main
batdevis Oct 11, 2024
d96a9f9
chore: remove old var
batdevis Oct 11, 2024
9526c17
chore: merge main
batdevis Oct 11, 2024
a5177df
Update apps/chatbot/docker/compose.yaml
batdevis Oct 11, 2024
4daccf8
chore: remove logs
batdevis Oct 11, 2024
f5bdfd3
Merge branch 'chatbot/docker-compose-complete' of github.com:pagopa/d…
batdevis Oct 11, 2024
c123b5c
fix(chatbot): compose vars
batdevis Oct 13, 2024
aa59ca5
Merge branch 'main' into chatbot/docker-compose-complete
batdevis Oct 13, 2024
5e07dbe
Update modules
mdciri Oct 16, 2024
da8a41c
Update config prompts
mdciri Oct 16, 2024
b57d55c
Update env example
mdciri Oct 16, 2024
62fffa1
Merge branch 'main' into chatbot/docker-compose-complete
batdevis Oct 16, 2024
f7b05e6
Merge branch 'languages/chatbot/cai-198' into chatbot/docker-compose-…
batdevis Oct 16, 2024
278d56d
redis admin port
batdevis Oct 16, 2024
4 changes: 3 additions & 1 deletion apps/chatbot/.env.example
@@ -13,6 +13,7 @@ CHB_WEBSITE_URL=...
CHB_REDIS_INDEX_NAME=...
CHB_LLAMAINDEX_INDEX_ID=...
CHB_DOCUMENTATION_DIR=...
CHB_USE_PRESIDIO=...
CHB_GOOGLE_API_KEY=...
CHB_PROVIDER=...
CHB_MODEL_ID=...
@@ -21,6 +22,7 @@ CHB_MODEL_MAXTOKENS=...
CHB_EMBED_MODEL_ID=...
CHB_ENGINE_SIMILARITY_TOPK=...
CHB_ENGINE_SIMILARITY_CUTOFF=...
CHB_ENGINE_USE_ASYNC=...
CHB_ENGINE_USE_ASYNC=True
CHB_ENGINE_USE_STREAMING=...
CHB_QUERY_TABLE_PREFIX=chatbot-local
CHB_DYNAMODB_URL=http://localhost:8000
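The template mixes `...` placeholders with a few concrete local values. As a hedged sketch (the loader below is hypothetical and is not how the app reads its environment), this shows which keys would actually be set and how a string flag such as `CHB_ENGINE_USE_ASYNC=True` becomes a boolean. Note that an exact `== "True"` comparison, as used for `CHB_USE_PRESIDIO` in `chatbot.py`, treats `true` or `1` as false.

```python
PLACEHOLDER = "..."


def load_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skip blanks, comments and '...' placeholders."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value == PLACEHOLDER:
            continue  # unfilled template value: treat the key as unset
        env[key.strip()] = value
    return env


def env_flag(env: dict[str, str], name: str, default: bool = False) -> bool:
    """Mirror the exact-match check used for CHB_USE_PRESIDIO: only "True" is true."""
    value = env.get(name)
    return default if value is None else value == "True"


example = """\
CHB_MODEL_ID=...
CHB_ENGINE_USE_ASYNC=True
CHB_QUERY_TABLE_PREFIX=chatbot-local
CHB_DYNAMODB_URL=http://localhost:8000
"""
env = load_env_file(example)
```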
10 changes: 8 additions & 2 deletions apps/chatbot/config/prompts.yaml
@@ -12,7 +12,6 @@ qa_prompt_str: |
- the answer must be clear, non-redundant, and must not repeat sentences.
- the answer must not include the query.
- If your answer is based on this retrieved context, include a "Rif" section at the end of the response, listing the titles and filenames from the source nodes used. If no context is used, do not include a reference.
- the answer must be in the same language as the query.
--------------------
Output Examples:
Query: Cos'è il nodo dei pagamenti?
@@ -38,7 +37,14 @@ qa_prompt_str: |
--------------------
Task:
Given the query: {query_str}
Answer the query according to the `Chatbot Policy` listed above.
Reply to the user following these two steps:
Step 1:
Pay close attention to the query's language and determine whether it is written in Italian, English, Spanish, French, German, Greek, Croatian, or Slovenian ('yes' or 'no').
Step 2:
If Step 1 returns 'yes': always reply in Italian, regardless of the input language, according to the `Chatbot Policy` listed above.
Otherwise: reply that you cannot speak that language and ask for a new query written in one of the accepted languages.
Answer:
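The new Task section asks the model itself to run a two-step language gate. Purely as an illustration, here is the same decision written as deterministic code, for example as a pre-check on an already detected language code. This is an alternative technique, not what the PR does; the ISO 639-1 mapping and the function names are assumptions.

```python
from typing import Optional

# ISO 639-1 codes for the languages listed in Step 1 of qa_prompt_str.
ACCEPTED_LANGUAGES = {
    "it": "Italian",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "el": "Greek",
    "hr": "Croatian",
    "sl": "Slovenian",
}


def step1(lang_code: str) -> str:
    """Step 1: 'yes' if the query's language is accepted, otherwise 'no'."""
    return "yes" if lang_code in ACCEPTED_LANGUAGES else "no"


def reply_language(lang_code: str) -> Optional[str]:
    """Step 2: accepted queries are always answered in Italian.

    None means the bot should instead ask for a query in an accepted language.
    """
    return "it" if step1(lang_code) == "yes" else None
```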
6 changes: 5 additions & 1 deletion apps/chatbot/docker/app.local.Dockerfile
@@ -1,6 +1,10 @@
FROM python:3.12.4-slim-bullseye
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
apt-get install -y \
curl

ENV PYTHONPATH=/app

RUN pip install --upgrade pip \
@@ -14,4 +18,4 @@ RUN poetry install

COPY . .

CMD ["fastapi", "dev", "src/app/main.py", "--port", "8080"]
CMD ["fastapi", "dev", "src/app/main.py", "--port", "8080", "--host", "0.0.0.0"]
57 changes: 57 additions & 0 deletions apps/chatbot/docker/compose.yaml
@@ -0,0 +1,57 @@
services:
api:
build:
context: ..
dockerfile: docker/app.local.Dockerfile
ports:
- "8080:8080"
volumes:
- ..:/app
- ./files/.aws:/root/.aws
- ../../nextjs-website/out:/app/build-devp/out
depends_on:
redis:
condition: service_started
dynamodb:
condition: service_started
networks:
- ntw

dynamodb:
image: amazon/dynamodb-local:2.5.2
environment:
- AWS_ACCESS_KEY_ID=dummy
- AWS_SECRET_ACCESS_KEY=dummy
- AWS_DEFAULT_REGION=local
ports:
- "8000:8000"
networks:
- ntw

redis:
image: redis/redis-stack:7.2.0-v13
ports:
- "6379:6379"
- "8001:8001"
networks:
- ntw

create_index:
build:
context: ..
dockerfile: docker/app.local.Dockerfile
volumes:
- ..:/app
- ../../nextjs-website/out:/app/build-devp/out
command: "python src/modules/create_vector_index.py --params config/params.yaml"
tty: true
depends_on:
redis:
condition: service_started
networks:
- ntw

networks:
ntw:
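`depends_on` with `condition: service_started` only waits for the Redis and DynamoDB Local containers to start, not for them to accept connections. Compose can express readiness declaratively with a `healthcheck` on each service plus `condition: service_healthy`. Alternatively, the API could poll before connecting. The helper below is a hypothetical sketch of that polling; inside the `ntw` network the targets would be the service names, e.g. `("redis", 6379)` and `("dynamodb", 8000)`.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```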
2 changes: 2 additions & 0 deletions apps/chatbot/docker/docker-compose-up-api.sh
@@ -0,0 +1,2 @@
#!/bin/bash
docker compose -f docker/compose.yaml -p chatbot up api
2 changes: 2 additions & 0 deletions apps/chatbot/docker/docker-run-create-index.sh
@@ -0,0 +1,2 @@
#!/bin/bash
docker compose -f docker/compose.yaml -p chatbot up create_index
2 changes: 2 additions & 0 deletions apps/chatbot/docker/docker-run-local-bash.sh
@@ -0,0 +1,2 @@
#!/bin/bash
docker run -it --env-file ./.env fastapi-local bash
2 changes: 2 additions & 0 deletions apps/chatbot/docker/files/.aws/config
@@ -0,0 +1,2 @@
[profile default]
region = eu-south-1
3 changes: 3 additions & 0 deletions apps/chatbot/docker/files/.aws/credentials
@@ -0,0 +1,3 @@
[default]
aws_access_key_id = 123
aws_secret_access_key = xyz
1,583 changes: 888 additions & 695 deletions apps/chatbot/poetry.lock

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion apps/chatbot/pyproject.toml
@@ -37,9 +37,10 @@ llama-index-llms-gemini = "^0.3.4"
google-generativeai = "^0.5.2"
llama-index-embeddings-gemini = "^0.2.0"
llama-index-llms-bedrock-converse = "^0.3.0"
chromedriver-py = "^129.0.6668.91"
llama-index-postprocessor-presidio = "^0.2.0"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
51 changes: 29 additions & 22 deletions apps/chatbot/src/app/main.py
@@ -18,30 +18,30 @@

params = yaml.safe_load(open("config/params.yaml", "r"))
prompts = yaml.safe_load(open("config/prompts.yaml", "r"))
chatbot = Chatbot(params, prompts)

AWS_DEFAULT_REGION = os.getenv('CHB_AWS_DEFAULT_REGION', os.getenv('AWS_DEFAULT_REGION', None))

chatbot = Chatbot(params, prompts)


class Query(BaseModel):
question: str
queriedAt: str | None = None

if (os.getenv('environment', 'dev') == 'local'):
profile_name='dummy'
endpoint_url='http://localhost:8000'
region_name = AWS_DEFAULT_REGION

boto3_session = boto3.session.Session(
profile_name = locals().get('profile_name', None),
region_name=locals().get('region_name', None)
region_name=AWS_DEFAULT_REGION
)

dynamodb = boto3_session.resource(
'dynamodb',
endpoint_url=locals().get('endpoint_url', None),
region_name=locals().get('region_name', None),
)
if (os.getenv('environment', 'dev') == 'local'):
dynamodb = boto3_session.resource(
'dynamodb',
endpoint_url=os.getenv('CHB_DYNAMODB_URL', 'http://localhost:8000'),
region_name=AWS_DEFAULT_REGION
)
else:
dynamodb = boto3_session.resource(
'dynamodb',
region_name=AWS_DEFAULT_REGION
)
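The two `boto3_session.resource` calls differ only in `endpoint_url`. A hedged sketch of collapsing them into one call: the helper name is hypothetical, and it keeps the same lookups as `main.py`. Inside the compose network, `localhost` is the `api` container itself, so running the API there would likely need `CHB_DYNAMODB_URL=http://dynamodb:8000` (the service name) rather than the localhost default.

```python
from collections.abc import Mapping
from typing import Optional


def dynamodb_resource_kwargs(env: Mapping[str, str]) -> dict[str, Optional[str]]:
    """Keyword arguments for boto3_session.resource("dynamodb", **kwargs).

    Mirrors main.py: the region falls back from CHB_AWS_DEFAULT_REGION to
    AWS_DEFAULT_REGION, and only environment == "local" sets endpoint_url.
    """
    region = env.get("CHB_AWS_DEFAULT_REGION", env.get("AWS_DEFAULT_REGION"))
    kwargs: dict[str, Optional[str]] = {"region_name": region}
    if env.get("environment", "dev") == "local":
        kwargs["endpoint_url"] = env.get("CHB_DYNAMODB_URL", "http://localhost:8000")
    return kwargs
```

Usage would then be `dynamodb = boto3_session.resource("dynamodb", **dynamodb_resource_kwargs(os.environ))`.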

table_queries = dynamodb.Table(
f"{os.getenv('CHB_QUERY_TABLE_PREFIX', 'chatbot')}-queries"
@@ -160,12 +160,13 @@ async def sessions_fetching(
raise HTTPException(status_code=422, detail=f"[sessions_fetching] userId: {userId}, error: {e}")

# TODO: pagination
items = db_response.get('Items', [])
result = {
"items": db_response['Items'],
"items": items,
"page": 1,
"pages": 1,
"size": len(db_response['Items']),
"total": len(db_response['Items']),
"size": len(items),
"total": len(items),
}
return result
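The response already has the shape of a paginated result, but `page` and `pages` are hard-coded (see the `TODO: pagination`). A minimal in-memory sketch follows, assuming the items are already fetched; the function name and the `page`/`size` parameters are hypothetical. For large tables, DynamoDB's own paging (`Limit` together with `LastEvaluatedKey`/`ExclusiveStartKey`) would avoid reading every item.

```python
import math
from typing import Any


def paginate(items: list[Any], page: int = 1, size: int = 10) -> dict:
    """Slice a list into one page, using the response shape of sessions_fetching."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    total = len(items)
    pages = max(1, math.ceil(total / size))  # an empty result still reports one page
    start = (page - 1) * size
    page_items = items[start:start + size]
    return {
        "items": page_items,
        "page": page,
        "pages": pages,
        "size": len(page_items),  # number of items actually returned
        "total": total,
    }
```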

@@ -214,20 +215,26 @@ async def queries_fetching(
sessionId = last_session_id(userId)

try:
# TODO: add userId filter
db_response = table_queries.query(
KeyConditionExpression=Key("sessionId").eq(sessionId)
KeyConditionExpression=Key("sessionId").eq(sessionId) &
Key("id").eq(userId)
)
except (BotoCoreError, ClientError) as e:
raise HTTPException(status_code=422, detail=f"[queries_fetching] sessionId: {sessionId}, error: {e}")

result = db_response['Items']
result = db_response.get('Items', [])
return result


def last_session_id(userId: str):
# TODO: retrieve last user session
return '1'
db_response = table_sessions.query(
IndexName='SessionsByCreatedAtIndex',
KeyConditionExpression=Key('userId').eq(userId),
ScanIndexForward=False,
Limit=1
)
items = db_response.get('Items', [])
return items[0] if items else None
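As written, `last_session_id` returns `items[0]`, that is, the whole session item (a dict) or `None`. `queries_fetching` then passes that value to `Key("sessionId").eq(sessionId)`, so the caller probably needs one attribute of the item and an explicit `None` check. The sketch below reproduces in memory what `ScanIndexForward=False, Limit=1` selects (the item with the highest sort key). The attribute names `createdAt` and `id` are assumptions, since the table schema is not shown in this diff.

```python
from typing import Any, Optional


def latest_session_id(
    items: list[dict[str, Any]],
    sort_key: str = "createdAt",  # hypothetical sort-key attribute of the index
    id_key: str = "id",  # hypothetical session-id attribute
) -> Optional[str]:
    """Return the id of the newest item, or None when the user has no sessions."""
    if not items:
        return None
    # ISO-8601 timestamps in one format and timezone sort chronologically as strings.
    newest = max(items, key=lambda item: item[sort_key])
    return newest[id_key]
```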

@app.patch("/queries/{id}")
async def query_feedback (badAnswer: bool):
17 changes: 11 additions & 6 deletions apps/chatbot/src/modules/chatbot.py
@@ -16,10 +16,7 @@
from src.modules.presidio import PresidioPII


AWS_S3_BUCKET = os.getenv("CHB_AWS_S3_BUCKET")
ITALIAN_THRESHOLD = 0.85
NUM_MIN_WORDS_QUERY = 3
NUM_MIN_REFERENCES = 1
USE_PRESIDIO = True if os.getenv("CHB_USE_PRESIDIO", "True") == "True" else False
RESPONSE_TYPE = Union[
Response, StreamingResponse, AsyncStreamingResponse, PydanticResponse
]
@@ -36,7 +33,9 @@ def __init__(

self.params = params
self.prompts = prompts
self.pii = PresidioPII(config=params["config_presidio"])
if USE_PRESIDIO:
self.pii = PresidioPII(config=params["config_presidio"])

self.model = get_llm()
self.embed_model = get_embed_model()
self.index = load_automerging_index_redis(
@@ -111,6 +110,9 @@ def _get_response_str(self, engine_response: RESPONSE_TYPE) -> str:
"""
else:
response_str = self._unmask_reference(response_str, nodes)

if "Step 2:" in response_str:
response_str = response_str.split("Step 2:")[1].strip()

return response_str
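Because the new prompt makes the model print its Step 1 and Step 2 reasoning, `_get_response_str` keeps only the text after `"Step 2:"`. A small standalone sketch of that post-processing step (the function name is hypothetical). It uses `str.partition`, which keeps the full remainder, whereas `split("Step 2:")[1]` would silently drop any text after a second occurrence of the marker.

```python
MARKER = "Step 2:"


def strip_reasoning_steps(response_str: str, marker: str = MARKER) -> str:
    """Drop the model's Step 1 reasoning, keeping everything after the first marker."""
    if marker not in response_str:
        return response_str
    # partition splits only at the first occurrence and keeps the rest intact.
    return response_str.partition(marker)[2].strip()
```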

@@ -142,7 +144,10 @@ def _unmask_reference(self, response_str: str, nodes) -> str:


def mask_pii(self, message: str) -> str:
return self.pii.mask_pii(message)
if USE_PRESIDIO:
return self.pii.mask_pii(message)
else:
return message
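The PR guards both `__init__` and `mask_pii` with `if USE_PRESIDIO`. With the flag off, `self.pii` is never assigned, so any other code path that touches it would raise `AttributeError`. One alternative, sketched here under the assumption that only `mask_pii` is needed, is the null-object pattern: always set `self.pii`, and use a passthrough masker when Presidio is disabled. The class and factory names are hypothetical.

```python
from typing import Callable, Protocol


class PIIMasker(Protocol):
    def mask_pii(self, message: str) -> str: ...


class PassthroughPII:
    """Stand-in used when Presidio is disabled: returns the text unchanged."""

    def mask_pii(self, message: str) -> str:
        return message


def make_pii(use_presidio: bool, factory: Callable[[], PIIMasker]) -> PIIMasker:
    """Build the real masker only when enabled, so callers never branch on the flag."""
    return factory() if use_presidio else PassthroughPII()
```

In `Chatbot.__init__` this might read `self.pii = make_pii(USE_PRESIDIO, lambda: PresidioPII(config=params["config_presidio"]))`, after which `mask_pii` can simply delegate.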


def generate(self, query_str: str) -> str:
28 changes: 12 additions & 16 deletions apps/chatbot/src/modules/presidio.py
@@ -12,7 +12,7 @@


# see supported entities by Presidio with their description at: https://microsoft.github.io/presidio/supported_entities/
ENTITIES = [
GLOBAL_ENTITIES = [
"CREDIT_CARD",
"CRYPTO",
"DATE_TIME",
@@ -23,21 +23,16 @@
"LOCATION",
"PERSON",
"PHONE_NUMBER",
"MEDICAL_LICENSE",
"MEDICAL_LICENSE"
]

IT_ENTITIES = [
"IT_FISCAL_CODE",
"IT_DRIVER_LICENSE",
"IT_VAT_CODE",
"IT_PASSPORT",
"IT_IDENTITY_CARD",
"IT_PHYSICAL_ADDRESS", # this is a custom entity added to the analyzer registry
# "ES_NIF",
# "ES_NIE",
# "US_BANK_NUMBER",
# "US_DRIVER_LICENSE",
# "US_ITIN",
# "US_PASSPORT",
# "US_SSN",
# "UK_NHS"
"IT_PHYSICAL_ADDRESS"
]

ALLOW_LIST = [
@@ -102,9 +97,10 @@ def __init__(
analyzer_threshold: float = 0.4
):
self.config = config
self.languages = [item["lang_code"] for item in config["models"]]
self.entity_mapping = entity_mapping
self.mapping = mapping
self.entities = entities if entities else ENTITIES
self.entities = entities if entities else GLOBAL_ENTITIES
self.analyzer_threshold = analyzer_threshold

if isinstance(self.config, (Path, str)):
@@ -117,7 +113,7 @@ def __init__(
self.nlp_engine = nlp_engine
self.analyzer = AnalyzerEngine(
nlp_engine = self.nlp_engine,
supported_languages = ["it", "en"], # "es", "fr", "de"
supported_languages = self.languages,
default_score_threshold = analyzer_threshold
)
self._add_italian_physical_address_entity()
@@ -136,7 +132,7 @@ def detect_language(self, text: str) -> str:
detected_languages = detect_langs(text)
lang_list = []
for detected_lang in detected_languages:
if detected_lang.lang in ["it", "en", "es", "fr", "de"]:
if detected_lang.lang in self.languages:
lang_list.append(detected_lang.lang)

if not lang_list:
@@ -145,7 +141,7 @@ def detect_language(self, text: str) -> str:
elif "it" in lang_list:
lang = "it"
else:
lang = "en" # lang_list[0].lang
lang = lang_list[0]
except:
logging.warning("No detected language.")
lang = "it"
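`detect_language` now filters the detected languages against `self.languages` (read from `config["models"]`), prefers Italian, and otherwise takes the first supported match. The following is a pure-function sketch of that selection. The `(lang_code, probability)` tuples stand in for the objects returned by `langdetect.detect_langs`, and the return value for an empty candidate list is an assumption, because that branch is not visible in the diff.

```python
from typing import Iterable, Sequence, Tuple


def choose_language(
    detected: Iterable[Tuple[str, float]],
    supported: Sequence[str],
    default: str = "it",
) -> str:
    """Pick the analyzer language following the visible logic of detect_language.

    detected: (lang_code, probability) pairs, most probable first.
    """
    candidates = [lang for lang, _prob in detected if lang in supported]
    if not candidates:
        return default  # assumption: the diff does not show this branch
    if "it" in candidates:
        return "it"  # Italian wins whenever it is among the candidates
    return candidates[0]
```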
@@ -160,7 +156,7 @@ def detect_pii(self, text: str) -> List[RecognizerResult]:
results = self.analyzer.analyze(
text=text,
language=lang,
entities=self.entities,
entities=self.entities + IT_ENTITIES if lang == "it" else self.entities,
allow_list=ALLOW_LIST
)
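The `entities=` argument relies on operator precedence: a conditional expression binds more loosely than `+`, so it evaluates as `(self.entities + IT_ENTITIES) if lang == "it" else self.entities`. A small sketch with abbreviated, illustrative copies of the lists makes this explicit. It also shows that `+` builds a new list, so `self.entities` is never mutated, and that a caller-supplied `entities` list still gets the Italian entities appended for Italian text.

```python
# Abbreviated copies of the lists in presidio.py, for illustration only.
GLOBAL_ENTITIES = ["CREDIT_CARD", "PERSON", "PHONE_NUMBER"]
IT_ENTITIES = ["IT_FISCAL_CODE", "IT_VAT_CODE", "IT_PHYSICAL_ADDRESS"]


def entities_for(lang: str, base: list[str]) -> list[str]:
    """Base entities, plus the Italian-specific ones when lang == "it"."""
    # Same result as the unparenthesised one-liner in detect_pii; the
    # parentheses only make the precedence visible.
    return (base + IT_ENTITIES) if lang == "it" else list(base)
```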

3 changes: 2 additions & 1 deletion apps/chatbot/src/modules/utils.py
@@ -19,6 +19,7 @@ def get_ssm_parameter(name: str, default: str | None = None) -> str | None:
:param default: The default value to return if the parameter is not found.
:return: The value of the requested parameter.
"""

ssm = boto3.client(
"ssm",
aws_access_key_id=AWS_ACCESS_KEY_ID,
@@ -37,4 +38,4 @@ def get_ssm_parameter(name: str, default: str | None = None) -> str | None:
return default

logging.debug(f"Parameter {name} retrieved from SSM")
return response["Parameter"]["Value"]