Hi! Thanks for your interest in contributing to Data Recipes AI, we're really excited to see you! In this document we'll try to summarize everything that you need to know to do a good job.
To get an overview of the project, please read the README and our Code of Conduct to keep our community approachable and respectable.
If you spot a problem, first search whether an issue already exists. If a related issue doesn't exist, you can open a new one, selecting an appropriate issue type.
As a general rule, we don’t assign issues to anyone. If you find an issue to work on, you are welcome to open a PR with a fix.
To set up your local development environment for contributing, follow the steps in the paragraphs below.
The easiest way to develop is to run in the Docker environment, see README for more details.
If running locally, you can reset your environment - removing all data from your databases, which means you will have to re-register - by running `./cleanup.sh`.
The repo has been set up with black and flake8 pre-commit hooks. These are configured in the `.pre-commit-config.yaml` file and initialized with `pre-commit autoupdate`.

On a newly cloned repo, you must run `pre-commit install` to add the pre-commit hooks.

To run code quality tests, you can run `pre-commit run --all-files`.

GitHub has an action to run the pre-commit tests to ensure code adheres to standards. See the folder `.github/workflows` for more details.
You should write tests for every feature you add or bug you solve in the code. Having automated tests for every line of our code lets us make big changes without worries: there will always be tests to verify if the changes introduced bugs or lack of features. If we don't have tests we will be blind and every change will come with some fear of possibly breaking something.
For a better design of your code, we recommend using a technique called test-driven development, where you write your tests before writing the actual code that implements the desired feature.
You can use `pytest` to run your tests, no matter which type of test it is.
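For illustration, a minimal pytest-style test could look like the sketch below (the function under test is hypothetical, not part of the repo):

```python
# tests/test_example.py - a hypothetical example of a pytest test


def add_numbers(a: int, b: int) -> int:
    """Toy function standing in for real project code."""
    return a + b


def test_add_numbers():
    # pytest discovers functions prefixed with `test_` and runs plain asserts
    assert add_numbers(2, 3) == 5


def test_add_numbers_handles_negatives():
    assert add_numbers(-1, 1) == 0
```

Running `pytest` from the repo root will discover and run it along with the rest of the test suite.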
End-to-end tests have been configured in GitHub Actions which use Promptflow to call a wrapper around the Chainlit UI, in order to test when memories/recipes are used as well as when the assistant does some on-the-fly analysis. To do this, the Chainlit class is patched heavily, and there are limitations in how cleanly this could be done, so it isn't an exact replica of the true application, but it does capture changes in the flow as well as testing the assistant directly. The main body of integration tests tests the recipes server and the assistant independently.
Additionally, there were some limitations when implementing this in GitHub Actions, where workarounds were put in place until a later date, namely: Promptflow is run on the GitHub Actions host rather than in Docker, and the Promptflow wrapper that calls Chainlit has to run as a script and be killed based on a STDOUT string. These should be fixed in future.
Code for the e2e tests can be found in `flows/chainlit-ui-evaluation`, as run by `.github/workflows/e2e_tests.yml`.

The tests work using Promptflow evaluation and a call to an LLM to gauge groundedness, because LLM assistants can produce slightly different results when not providing answers from memory/recipes. The Promptflow evaluation test data can be found in `flows/chainlit-ui-evaluation/data.jsonl`.
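As a rough illustration of the groundedness idea only - the prompt, model name, and helper function below are assumptions, not the repo's actual Promptflow node - an LLM can be asked to judge whether an answer is grounded in the expected context:

```python
# Hypothetical sketch of an LLM-based groundedness check, assuming the
# `openai` Python package and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()


def groundedness_score(answer: str, context: str, model: str = "gpt-4o") -> str:
    """Ask an LLM to rate (1-5) how well `answer` is grounded in `context`."""
    prompt = (
        "Rate from 1 to 5 how well the ANSWER is grounded in the CONTEXT. "
        "Reply with a single number only.\n\n"
        f"CONTEXT:\n{context}\n\nANSWER:\n{answer}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```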
See "Evaluating with Promptflow" below to see how to run e2e tests locally.
First, you will need to build the environment to include Prompt Flow ...
docker compose -f docker-compose.yml -f docker-compose-dev.yml up -d --build
Then ...

- Install the DevContainers VSCode extension
- Build data recipes using the `docker compose` command mentioned above
- Open the command palette in VSCode (CMD + Shift + P on Mac; CTRL + Shift + P on Windows) and select `Dev Containers: Attach to remote container`. Select the promptflow container. This opens a new VSCode window - use it for the next steps.
- It should happen automatically, but if not, install the Promptflow add-in
- Open folder `/app`
- Click on `flow.dag.yaml`
- Top left of the main pane, click on 'Visual editor'
- If you are taken to the Promptflow 'Install dependencies' screen, change the Python runtime to be `/azureml-envs/prompt-flow/runtime/bin/python`, then close and re-open `flow.dag.yaml`
- On the Groundedness node, select your new connection
- You can now run by clicking the play icon. See the Promptflow documentation for more details
As noted in the README, the repo supports assistants on OpenAI or Azure OpenAI. The README has instructions on how to change this in the `.env` file - remember to change `ASSISTANT_ID` as well as the API settings - but you will also have to change the connection in the Promptflow Groundedness node accordingly.
Like many other open source projects, we use the well-known gitflow model to manage our branches.
Summary of our git branching model:
- Get all the latest work from the upstream repository (`git checkout main`)
- Create a new branch off it with a descriptive name (for example: `feature/new-test-macro`, `bugfix/bug-when-uploading-results`). You can do it with `git checkout -b <branch name>`
- Make your changes and commit them locally (`git add <changed files>`, `git commit -m "Add some change" <changed files>`). Whenever you commit, the self-tests and code quality checks will kick in; fix anything that gets broken
- Push to your branch on GitHub (with the same name as your local branch: `git push origin <branch name>`). This will output a URL for creating a Pull Request (PR)
- Create a pull request by opening the URL in a browser. You can also create PRs in the GitHub interface, choosing your branch to merge into main
- Wait for comments and respond as needed
- Once PR review is complete, your code will be merged. Thanks!!
- Write helpful commit messages
- Anything in your branch must have no failing tests. You can check by looking at your PR online in GitHub
- Never use `git add .`: it can add unwanted files
- Avoid using `git commit -a` unless you know what you're doing
- Check every change with `git diff` before adding it to the index (stage area) and with `git diff --cached` before committing
- If you have push access to the main repository, please do not commit directly to `dev`: your access should be used only to accept pull requests; if you want to make a new feature, you should use the same process as other developers so your code will be reviewed.
- Use PEP8;
- Write tests for your new features (please see the "Tests" topic below);
- Always remember that commented code is dead code;
- Name identifiers (variables, classes, functions, module names) with readable names (`x` is always wrong);
- When manipulating strings, we prefer either f-string formatting (`f'{a} = {b}'`) or new-style formatting (`'{} = {}'.format(a, b)`), instead of the old-style formatting (`'%s = %s' % (a, b)`) - see the example after this list;
- You will know if any test breaks when you commit, and the tests will be run again in the continuous integration pipeline (see below);
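For instance, a quick illustration of the preferred string formatting styles (the variable names are arbitrary):

```python
a, b = "answer", 42

# Preferred: f-string formatting
print(f"{a} = {b}")

# Also fine: new-style formatting
print("{} = {}".format(a, b))

# Avoid: old-style formatting
print("%s = %s" % (a, b))
```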
The quick start instructions and self-tests require demo data in the data db. This can be downloaded from Google drive.
To upload new demo data ...
- Run the ingestion (see main README)
- In the data directory, run `tar -cvf datadb-<DATE>.tar ./datadb` then `gzip datadb-<DATE>.tar`
- Upload the file to this folder
- Edit `data/download_demo_data.py` to use the file URL
To download demo data ...
docker compose stop datadb
cd data && python3 download_demo_data.py && cd ..
docker compose start datadb
As mentioned in the main README, the assistant can be used with OpenAPI-standard APIs, such as the included HDX API. To add another, extend the configuration in `ingestion/ingest.config`. The ingestion script will process this data and import it into the Data Recipes AI database. This works for simple APIs with relatively low data volumes, and may need some adjustment depending on the complexity of the API.

Some APIs are too extensive to ingest. These can be defined as tools (functions) for the assistant, which can query the API on request to get data. See `assistants/recipes_agents/create_update_assistant.py`, which already has a couple of functions that integrate with APIs and which you could extend for new data sources.
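As a rough sketch of what such a tool function might look like - the function name, endpoint, and parameters below are hypothetical, not an existing integration in the repo:

```python
# Hypothetical assistant tool (function) that queries an external API on
# demand instead of ingesting it. The endpoint and parameters are placeholders.
import requests


def get_indicator_data(country_code: str, indicator: str) -> dict:
    """Query a (hypothetical) REST API for one indicator in one country."""
    url = "https://example.org/api/v1/indicators"  # placeholder endpoint
    response = requests.get(
        url,
        params={"country": country_code, "indicator": indicator},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```

The assistant also needs a matching function definition (name, description, and parameter schema) registered when the assistant is created or updated, so it knows when and how to call the tool.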
As mentioned in the main README, the assistant can be provided with your data in the form of data files (e.g. CSV, Excel) and documents (e.g. PDF and Word). These are available to the assistant for all interactions. Additionally, users can upload files during a conversation. In both cases analysis is done by the LLM assistant and should be treated with caution.
We will add more details here soon; for now, here are some notes on Azure ...
A deployment script `./deployment/deploy_azure.py` is provided to deploy to an Azure Multicontainer web app you have set up with these instructions. The script is run from the top directory. Note: this is for demo purposes only, as Multicontainer web apps are still in Public Preview.
To run the deployment ...
python3 ./deployment/deploy_azure.py
One thing to mention about an Azure deploy: sometimes the release doesn't get pushed to the web app until a user tries to access the web app's published URL. No idea why, but if your release is 'stuck', try this.
Note:

- `./deployment/docker-compose-azure.yml` is the configuration used in the deployment center screen on the web app
- `./deployment/docker-compose-deploy.yml` is the configuration used when building the deployment
- `docker-compose.yml` is used for building locally
You will need to set key environment variables; see your local `.env` for examples.