Commit

chore: merge main
batdevis committed Oct 11, 2024
2 parents 859c298 + cee4135 commit f43a771
Showing 19 changed files with 2,028 additions and 954 deletions.
5 changes: 5 additions & 0 deletions .changeset/long-camels-sell.md
@@ -0,0 +1,5 @@
---
"chatbot": minor
---

"Add Presidio to detect and mask PII entities"
6 changes: 5 additions & 1 deletion apps/chatbot/.env.example
@@ -4,12 +4,16 @@ PYTHONPATH=app-path
LOG_LEVEL=DEBUG
CHB_AWS_ACCESS_KEY_ID=...
CHB_AWS_SECRET_ACCESS_KEY=...
CHB_AWS_DEFAULT_REGION=eu-west-3
CHB_AWS_DEFAULT_REGION=eu-south-1
CHB_AWS_BEDROCK_REGION=eu-west-3
CHB_AWS_S3_BUCKET=...
CHB_AWS_GUARDRAIL_ID=...
CHB_AWS_GUARDRAIL_VERSION=...
CHB_REDIS_URL=...
CHB_WEBSITE_URL=...
CHB_REDIS_INDEX_NAME=...
CHB_LLAMAINDEX_INDEX_ID=...
CHB_DOCUMENTATION_DIR=...
CHB_GOOGLE_API_KEY=...
CHB_PROVIDER=...
CHB_MODEL_ID=...
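The README asks for a `.env` file holding every variable listed in `.env.example`. A hypothetical stdlib-only helper (not part of this commit; `parse_env`, `check_env` and `check_env_files` are illustrative names) can flag keys that are missing or still set to the `...` placeholder before the chatbot is started. It is a minimal sketch: it does not handle quoting, `export` prefixes or multi-line values the way a library such as python-dotenv does.

```python
from __future__ import annotations

from pathlib import Path

# "..." is the placeholder value used throughout .env.example
PLACEHOLDERS = {"", "..."}


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser (hypothetical): no quotes, `export` or multi-line values."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and lines without '='
        key, value = line.split("=", 1)  # keep any further '=' inside the value
        values[key.strip()] = value.strip()
    return values


def check_env(example_text: str, env_text: str) -> tuple[list[str], list[str]]:
    """Return (keys missing from .env, keys in .env still set to a placeholder)."""
    expected = parse_env(example_text)
    actual = parse_env(env_text)
    missing = sorted(key for key in expected if key not in actual)
    unfilled = sorted(key for key, value in actual.items() if value in PLACEHOLDERS)
    return missing, unfilled


def check_env_files(folder: str = ".") -> tuple[list[str], list[str]]:
    """Compare <folder>/.env.example with <folder>/.env on disk."""
    base = Path(folder)
    return check_env((base / ".env.example").read_text(), (base / ".env").read_text())
```

Run from `apps/chatbot`, `check_env_files()` returns two sorted lists; both should be empty once `.env` is complete.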
40 changes: 11 additions & 29 deletions apps/chatbot/README.md
@@ -1,10 +1,15 @@
# PagoPA Chatbot

This folder contains all the details to build a RAG using the documentation provided in [`PagoPA Developer Portal`](https://developer.pagopa.it/). The retriver chosen is the `Auto Merging Retriver` one and it was implemented using [`llama-index`](https://docs.llamaindex.ai/en/stable/). Check out `src/modules/retriever.py`.
This folder contains all the details to build a RAG using the documentation provided in [`PagoPA Developer Portal`](https://developer.pagopa.it/).

This chatbot uses [`AWS Bedrock`](https://aws.amazon.com/bedrock/) as provider, so be sure to have installed [`aws-cli`](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and stored your credential in `~/.aws/credentials`.
This chatbot uses [Google](https://ai.google.dev/) or [AWS Bedrock](https://aws.amazon.com/bedrock/) as provider.
Even though the provider is the Google one, we stored its API key in AWS. So, be sure to have installed [aws-cli](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and stored your credential in `~/.aws/credentials`.

All the parameters and prompts used to build the Retrieval-Augmented Generation (RAG) are available in `config`.
The Retrieval-Augmented Generation (RAG) was implemented using [llama-index](https://docs.llamaindex.ai/en/stable/). All the parameters and prompts used are stored in `config`.

## Environment Variables

Create a `.env` file inside this folder and store the environment variables listed in `.env.example`.

## Virtual environment

@@ -27,40 +32,17 @@ The working directory is `/developer-portal/apps/chatbot`. So, to set the `PYTHO

In this way, `PYTHONPATH` points to where the Python packages and modules are, not where your checkouts are.

## File for Environment Variables

Create a `.env` file inside the folder and write to the file the following environment variables:

CHB_AWS_ACCESS_KEY_ID=...
CHB_AWS_SECRET_ACCESS_KEY=...
CHB_AWS_DEFAULT_REGION=...
CHB_AWS_S3_BUCKET=...
CHB_AWS_GUARDRAIL_ID=...
CHB_AWS_GUARDRAIL_VERSION=...
CHB_REDIS_URL=...
CHB_REDIS_INDEX_NAME=...
CHB_WEBSITE_URL=...
CHB_GOOGLE_API_KEY=...
CHB_PROVIDER=...
CHB_MODEL_ID=...
CHB_MODEL_TEMPERATURE=...
CHB_MODEL_MAXTOKENS=...
CHB_EMBED_MODEL_ID=...
CHB_ENGINE_SIMILARITY_TOPK=...
CHB_ENGINE_SIMILARITY_CUTOFF=...
CHB_ENGINE_USE_ASYNC=...
CHB_ENGINE_USE_STREAMING=...

## Knowledge vector database
## Knowledge index vector database

To reach the remote redis instance, it is necessary to open a tunnel:

```
./scripts/redis-tunnel.sh
```

Verify that the HTML files that compose the Developer Portal documentation exist in a directory. Otherwise create the documentation. Once you have the documentation directory ready, put its path in `params` and, in the end, create the vector index doing:

```
python src/modules/create_vector_index.py --params config/params.yaml
```
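`create_vector_index.py` presumably reads the `vector_index` settings in `config/params.yaml` (`chunk_sizes: [2816, 704, 176]`, `chunk_overlap: 20`). Assuming these are passed to llama-index's `HierarchicalNodeParser.from_defaults(chunk_sizes=..., chunk_overlap=...)`, which the Auto Merging Retriever mentioned in the previous README relies on, every document is cut into large chunks, each large chunk into medium ones, and each medium one into small leaves. The stdlib sketch below only illustrates that nesting: whitespace-separated words stand in for tokens and windows are cut at fixed positions, whereas llama-index counts tokens and splits on sentence boundaries. `Node`, `split_with_overlap`, `build_hierarchy` and `leaves` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Values copied from vector_index in config/params.yaml
CHUNK_SIZES = [2816, 704, 176]
CHUNK_OVERLAP = 20


@dataclass
class Node:
    level: int  # 0 = largest chunks, len(CHUNK_SIZES) - 1 = leaves
    tokens: list[str]
    children: list[Node] = field(default_factory=list)


def split_with_overlap(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Fixed-size windows where consecutive windows share `overlap` tokens."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def build_hierarchy(tokens: list[str], sizes: list[int], overlap: int, level: int = 0) -> list[Node]:
    """Split with sizes[0], then re-split every chunk with the remaining sizes."""
    nodes = []
    for chunk in split_with_overlap(tokens, sizes[0], overlap):
        node = Node(level, chunk)
        if len(sizes) > 1:
            node.children = build_hierarchy(chunk, sizes[1:], overlap, level + 1)
        nodes.append(node)
    return nodes


def leaves(nodes: list[Node]) -> list[Node]:
    """Collect the smallest chunks, i.e. the ones that would be embedded and retrieved."""
    out: list[Node] = []
    for node in nodes:
        out.extend(leaves(node.children) if node.children else [node])
    return out
```

Each level is four times smaller than the one above (2816 = 4 * 704, 704 = 4 * 176), so in this sketch a full parent yields five children, the last of them short, because of the 20-token overlap.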

52 changes: 50 additions & 2 deletions apps/chatbot/config/params.yaml
@@ -5,9 +5,57 @@ vector_index:
  path: index
  chunk_sizes: [2816, 704, 176]
  chunk_overlap: 20
  use_redis: True
  use_s3: False

engine:
  response_mode: compact
  verbose: False

config_presidio:
  nlp_engine_name: spacy
  models:
    -
      lang_code: en
      model_name: en_core_web_md
    -
      lang_code: it
      model_name: it_core_news_md
    # -
    #   lang_code: de
    #   model_name: de_core_news_md
    # -
    #   lang_code: es
    #   model_name: es_core_news_md
    # -
    #   lang_code: fr
    #   model_name: fr_core_news_md
  ner_model_configuration:
    labels_to_ignore:
      - ORDINAL
      - QUANTITY
      - ORGANIZATION
      - ORG
      - LANGUAGE
      - PRODUCT
      - MONEY
      - PERCENT
      - O
      - CARDINAL
      - EVENT
      - WORK_OF_ART
      - LAW
      - MISC
    model_to_presidio_entity_mapping:
      PER: PERSON
      PERSON: PERSON
      LOC: LOCATION
      LOCATION: LOCATION
      GPE: LOCATION
      ORG: ORGANIZATION
      DATE: DATE_TIME
      TIME: DATE_TIME
      NORP: NRP
    low_confidence_score_multiplier: 0.4
    low_score_entity_names:
      - ORGANIZATION
      - ORG
    default_score: 0.8
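The new `config_presidio` block follows the layout of Presidio's NLP engine configuration (`nlp_engine_name`, `models`, `ner_model_configuration`). In the application, a block like this would typically be handed to Presidio's `NlpEngineProvider(nlp_configuration=...)` and the resulting engine to `AnalyzerEngine`; how this repository wires it up is not part of the diff. The stdlib-only sketch below is a simplified, hypothetical model of what the `ner_model_configuration` settings do to raw spaCy entities before masking. It is not Presidio's implementation: the spaCy output is simulated as hand-written `(label, start, end)` tuples, and `to_findings`, `mask` and `Finding` are illustrative names.

```python
from __future__ import annotations

from typing import NamedTuple

# ner_model_configuration values copied from config_presidio in config/params.yaml
NER_CONFIG = {
    "labels_to_ignore": {
        "ORDINAL", "QUANTITY", "ORGANIZATION", "ORG", "LANGUAGE", "PRODUCT", "MONEY",
        "PERCENT", "O", "CARDINAL", "EVENT", "WORK_OF_ART", "LAW", "MISC",
    },
    "model_to_presidio_entity_mapping": {
        "PER": "PERSON", "PERSON": "PERSON", "LOC": "LOCATION", "LOCATION": "LOCATION",
        "GPE": "LOCATION", "ORG": "ORGANIZATION", "DATE": "DATE_TIME",
        "TIME": "DATE_TIME", "NORP": "NRP",
    },
    "low_confidence_score_multiplier": 0.4,
    "low_score_entity_names": {"ORGANIZATION", "ORG"},
    "default_score": 0.8,
}


class Finding(NamedTuple):
    entity: str
    start: int
    end: int
    score: float


def to_findings(raw_entities, config=NER_CONFIG) -> list[Finding]:
    """Turn simulated spaCy (label, start, end) tuples into scored findings."""
    findings = []
    for label, start, end in raw_entities:
        if label in config["labels_to_ignore"]:
            continue  # this model label is never reported
        # Assumption: labels missing from the mapping pass through unchanged.
        entity = config["model_to_presidio_entity_mapping"].get(label, label)
        # spaCy NER carries no per-entity confidence, so start from default_score.
        score = config["default_score"]
        low = config["low_score_entity_names"]
        if label in low or entity in low:
            score *= config["low_confidence_score_multiplier"]
        findings.append(Finding(entity, start, end, score))
    return findings


def mask(text: str, findings: list[Finding]) -> str:
    """Replace each span with <ENTITY>; assumes spans do not overlap."""
    # Work right to left so earlier character offsets stay valid.
    for item in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[: item.start] + f"<{item.entity}>" + text[item.end :]
    return text


if __name__ == "__main__":
    sample = "Mario Rossi lives in Rome since 2020 and works at PagoPA."
    raw = [("PER", 0, 11), ("LOC", 21, 25), ("DATE", 32, 36), ("ORG", 50, 56)]
    print(mask(sample, to_findings(raw)))
    # -> <PERSON> lives in <LOCATION> since <DATE_TIME> and works at PagoPA.
```

The `<ENTITY_TYPE>` placeholders mimic the output of Presidio's default `replace` anonymizer. Note that `ORG` and `ORGANIZATION` appear both in `labels_to_ignore` and in `low_score_entity_names`; in this sketch the ignore rule wins, so the 0.4 multiplier only applies if `ORG` is taken off the ignore list.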
2 changes: 1 addition & 1 deletion apps/chatbot/config/prompts.yaml
@@ -1,6 +1,6 @@
qa_prompt_str: |
  You are an customer services chatbot.
  Your name is Discovery and your duty is to assist the user with the PagoPA DevPortal documentation!
  Your name is Discovery and your duty is to assist the user with the PagoPA DevPortal documentation, homepage: https://dev.developer.pagopa.it!
  --------------------
  Context information:
  {context_str}
2 changes: 2 additions & 0 deletions apps/chatbot/docker/app.Dockerfile
@@ -14,4 +14,6 @@ RUN poetry install

COPY . ${LAMBDA_TASK_ROOT}
RUN python ./scripts/nltk_download.py
RUN python ./scripts/spacy_download.py

CMD ["src.app.main.handler"]
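The Dockerfile now runs `scripts/spacy_download.py` at build time, but the script itself is not part of this diff. A hypothetical minimal version, assuming it only has to fetch the two spaCy models enabled in `config_presidio` (`en_core_web_md`, `it_core_news_md`), could shell out to spaCy's `python -m spacy download` command using the same interpreter that runs the build. `SPACY_MODELS`, `download_commands` and `download_all` are illustrative names.

```python
import subprocess
import sys

# Hypothetical: model names copied by hand from config_presidio.models in params.yaml
SPACY_MODELS = ("en_core_web_md", "it_core_news_md")


def download_commands(models=SPACY_MODELS) -> list:
    """Build one `python -m spacy download <model>` command per model."""
    # sys.executable makes the models land in the environment poetry installed into.
    return [[sys.executable, "-m", "spacy", "download", name] for name in models]


def download_all(models=SPACY_MODELS) -> None:
    """Run every download; check=True makes a failed download fail the docker build."""
    for cmd in download_commands(models):
        subprocess.run(cmd, check=True)


# A real script would end by calling download_all().
```

Downloading during `docker build` bakes the models into the image, so the Lambda handler presumably does not have to fetch them at cold start. Keeping the model list in sync with `params.yaml` by hand is the weak point of this sketch; reading it from the YAML file would avoid drift.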