This is a reimplementation of langgraph's customer support example in Rasa's CALM paradigm. There's a YouTube video that provides a walkthrough of langgraph's implementation.
The bot has the following skills:
- Showing the user's booked flights
- Changing flight bookings
- Booking a rental car
- Booking a hotel
- Booking an excursion (e.g. museum visit) on the trip
Follow these steps to set up and run the Rasa assistant in a GitHub Codespace.
- You'll need a Rasa Pro license and an OpenAI API key.
- You should be familiar with Python and Bash.
To run the CALM assistant, you can watch the video below and/or follow the instructions:
CALM-assistant-setup-codespace.mp4
1. Create a Codespace:
   - Navigate to the repository on GitHub.
   - Click on the green "Code" button, then scroll down to "Codespaces".
   - Click on "Create codespace on main branch".
   - This should take under two minutes to load.
2. Set Up Environment:
   - Once the Codespace loads, it will look like VSCode, but in your browser!
   - First, open two terminal windows.
   - In both terminals, run:
     ```
     source .venv/bin/activate
     ```
   - Open the `calm_llm/.env` file and add the required keys to that file:
     ```
     export RASA_PRO_LICENSE='your_rasa_pro_license_key_here'
     export OPENAI_API_KEY='your_openai_api_key_here'
     ```
   - In both terminals, set your environment variables by running:
     ```
     source calm_llm/.env
     ```
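   - (Optional) To confirm that both keys are set in a given terminal, a quick check like the sketch below works; it prints only the variable names, not their values:
     ```bash
     # List which of the two required variables are set in this shell
     env | grep -E '^(RASA_PRO_LICENSE|OPENAI_API_KEY)=' | cut -d= -f1
     ```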
3. Create the Database:
   - In one of the terminals, run the command to create the database:
     ```
     python scripts/create_db.py
     ```
4. Train the Model:
   - In the first terminal, `cd calm_llm` and then run:
     ```
     rasa train
     ```
5. Start the Action Server:
   - In the second terminal, `cd calm_llm` and then run:
     ```
     rasa run actions
     ```
6. Launch the Rasa Inspector:
   - In the first terminal, run:
     ```
     rasa inspect --debug
     ```
7. Access the Inspector:
   - When prompted to open in the browser, click the link.
8. Chat with your customer support assistant about flights, hotels, cars, and/or excursions!
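For reference, here is the complete command sequence from the steps above, split across the two terminals:

```bash
# Terminal 1
source .venv/bin/activate
source calm_llm/.env          # after adding your keys to calm_llm/.env
python scripts/create_db.py   # step 3: create the database
cd calm_llm
rasa train                    # step 4: train the model
rasa inspect --debug          # step 6: launch the inspector

# Terminal 2 (start the action server before launching the inspector)
source .venv/bin/activate
source calm_llm/.env
cd calm_llm
rasa run actions              # step 5: start the action server
```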
A few things to note:

- Keyboard bindings may not map correctly in the Codespace, so you may not be able to copy and paste as you normally would!
- The database creation is done separately to manage memory usage.
- The repository is compatible with Rasa Pro versions `>=3.10.0`.
- You'll also notice that there are several subdirectories: `calm_llm` is the CALM implementation, `calm_nlu` combines CALM with intent-based NLU, `langgraph_implementation` is the implementation inspired by langgraph's tutorial, `calm_self_hosted` is the CALM implementation but with a fine-tuned model such as Llama 3.1 8B working as the command generator, and `calm_nlu_self_hosted` is CALM working with intent-based NLU and a fine-tuned model as the command generator.
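Laid out as a tree, the top-level structure looks roughly like this (only the items mentioned in this README are shown):

```
.
├── calm_llm/                  # CALM implementation
├── calm_nlu/                  # CALM + intent-based NLU
├── calm_self_hosted/          # CALM with a fine-tuned command generator (e.g. Llama 3.1 8B)
├── calm_nlu_self_hosted/      # CALM + NLU with a fine-tuned command generator
├── langgraph_implementation/  # implementation inspired by langgraph's tutorial
├── scripts/                   # helper scripts such as create_db.py
└── metrics.ipynb              # notebook that reproduces the figures
```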
We provide scripts to evaluate the assistant on 3 measures:
- number of tokens used per user turn (proxy for measuring LLM cost per user turn)
- latency (time to get a response)
- accuracy
To do so, we construct a test set that evaluates the following capabilities:
- Happy paths - Conversations with minimal complexity, sticking to one skill.
- Slot corrections - Conversations where the user changes their mind mid-conversation and corrects an earlier answer.
- Context switches - Conversations that switch from one skill to another and then come back to the former skill (see the example after this list).
- Cancellations - Conversations where the user decides not to proceed with the skill and stops midway.
- Multi-skill - Conversations where the user tries to accomplish multiple skills one after the other.
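For instance, a context switch might look like the following exchange (a hypothetical illustration, not an actual test case from the repository):

```
user: I'd like to book a hotel in Basel.
bot:  What date will you be checking in?
user: Wait - can you show me my booked flights first?    <- switch to another skill
bot:  Here are your current flight bookings: ...
user: Thanks! Back to the hotel: check-in on May 2.      <- return to the former skill
```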
Ensure you have set up the environment in two active terminals by following the setup instructions above. Next, execute the following in the `calm_llm` directory:

- Run the action server - In one of the terminals, execute:
  ```
  rasa run actions
  ```
- While the action server is running, execute in the second terminal:
  ```
  MAX_NUMBER_OF_PREDICTIONS=50 python run_eval.py
  ```

This will print the results to your terminal. You can also pipe the results to a text file: `MAX_NUMBER_OF_PREDICTIONS=50 python run_eval.py > results.txt`.
Once the script finishes, you will see runtime stats on the input and output tokens consumed and the latency incurred. These stats are grouped by the folder that contained the tests:
```
Running tests from ./e2e_tests/happy_paths
=============================
COST PER USER MESSAGE (USD)
---------------------------------
Mean: 0.031122631578947374
Min: 0.026789999999999998
Max: 0.038040000000000004
Median: 0.03162
---------------------------------
COMPLETION TOKENS PER USER MESSAGE
---------------------------------
Mean: 10.368421052631579
Min: 6
Max: 26
Median: 9.0
---------------------------------
PROMPT TOKENS PER USER MESSAGE
---------------------------------
Mean: 1016.6842105263158
Min: 881
Max: 1248
Median: 1021.0
---------------------------------
LATENCY PER USER MESSAGE (sec)
---------------------------------
Mean: 2.567301022379022
Min: 1.5348889827728271
Max: 4.782747983932495
Median: 2.067293882369995
---------------------------------
==================== short test summary info ====================
==================== 0 failed, 5 passed =========================
```
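As a sanity check, the mean cost above is consistent with the mean token counts under GPT-4 pricing (assumed here to be $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens; the price points are our assumption, not something the script reports):

```bash
# Recompute the mean cost per user message from the token means above,
# assuming GPT-4 pricing of $0.03/1K prompt and $0.06/1K completion tokens
awk 'BEGIN { printf "~$%.5f per user message\n", 1016.684/1000*0.03 + 10.368/1000*0.06 }'
# prints ~$0.03112, matching the reported mean
```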
Navigate to the `langgraph_implementation` folder and then set up the environment:

```
# Step 1: Create a new virtual environment
python -m venv new_env

# Step 2: Activate the virtual environment
source new_env/bin/activate

# Step 3: Install the packages from requirements.txt
pip install -r requirements.txt
```
Next, set up the necessary keys by opening the `.env` file in that folder and filling in the values for the requested variables:

- `TAVILY_API_KEY` - Access key for Tavily, used for making search queries.
- `LANGCHAIN_API_KEY` - LangSmith access key, for monitoring and tracing LLM calls.
- `OPENAI_API_KEY` - API key for the OpenAI platform, for invoking the LLM.

Load the keys by running `source .env` in the terminal window.
Then execute:

```
python run_eval.py
```

This will print the results to your terminal. You can also pipe the results to a text file: `python run_eval.py > results.txt`.
To create the figures in our blog post:

1. Generate data for the CALM assistant:
   - Follow steps 2-5 from the "Steps to run CALM assistant" section.
   - In a separate terminal, navigate to the `calm_llm` directory and run `python run_tests_for_plots.py` to generate data for the figures (each of the three data-generation loops follows the same pattern; see the sketch after this list).
   - Restructure the data for plotting with `cd results` and then `python combine_data.py`.
2. Generate data for the CALM + NLU assistant:
   - Follow steps 2-5 from the "Steps to run CALM assistant" section, but in steps 4 and 5, `cd calm_nlu` instead of `cd calm_llm`.
   - In a separate terminal, navigate to the `calm_nlu` directory and run `python run_tests_for_plots.py` to generate data for the figures.
   - Restructure the data for plotting with `cd results` and then `python combine_data.py`.
3. Generate data for the LangGraph assistant:
   - Run steps 1-5 from the LangGraph assistant section above.
   - In the `langgraph_implementation` folder, run `python run_tests_for_plots.py` to generate data for the figures.
   - Restructure the data for plotting with `cd results` and then `python combine_data.py`.
4. Open `metrics.ipynb` (in the root directory).
5. In the top-right of your screen, you should see "Select Kernel"; click on it.
6. Once prompted, install the necessary extensions.
7. Once the extensions are installed, click "Select Kernel" again and select "Python Environments...".
8. Select the `.venv` environment for running the kernel.
9. Execute all cells!
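Each data-generation loop above boils down to the same three commands, run from a different directory each time. A sketch for `calm_llm` (substitute `calm_nlu` or `langgraph_implementation` as appropriate):

```bash
# Data generation for one assistant (calm_llm shown here). Assumes the
# corresponding setup steps above are complete and, for the CALM variants,
# that the action server is running in another terminal.
cd calm_llm
python run_tests_for_plots.py   # run the tests and record metrics
cd results
python combine_data.py          # restructure the raw data for plotting
```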