LangProp is a framework for generating code using ChatGPT and evaluating the code's performance against a dataset of expected outputs in a supervised learning setting. Typically, ChatGPT generates code that is sensible but fails on some edge cases, and you then have to go back and prompt ChatGPT again with the error. This framework saves you that hassle by automatically feeding the exceptions back into ChatGPT in a training loop, so that ChatGPT can iteratively improve the code it generates.
The framework works similarly to PyTorch Lightning. You prepare the following:
- a PyTorch-like Dataset that contains the input to the code and the expected output (ground-truth labels),
- a "model" definition of
setup.txt
andupdate.txt
where the setup and update prompts are defined, - a Trainer which defines the scoring metric, data preprocessing step, etc.
- The scoring metric can be as simple as the accuracy of the prediction, i.e.
float(result == labels)
. - The preprocessing step converts the items retrieved from the dataset into input-output pairs.
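For a concrete feel of these pieces, here is a minimal sketch of a dataset, a preprocessing step and a scoring metric for a factorial task (the class and function names are illustrative, not the exact LangProp API):

```python
import math
from torch.utils.data import Dataset

class FactorialDataset(Dataset):
    """PyTorch-like dataset of (input, expected output) pairs."""
    def __init__(self, size: int = 100):
        self.data = [(n, math.factorial(n)) for n in range(size)]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

def preprocess(item):
    # Convert an item retrieved from the dataset into an input-output pair:
    # the arguments passed to the generated code, and the ground-truth label.
    number, label = item
    return (number,), label

def score(result, label):
    # Scoring metric: accuracy of the prediction.
    return float(result == label)
```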
Set up a conda environment and install the requirements:
```bash
conda create -n langprop python=3.7
conda activate langprop
pip install -r ./src/langprop/requirements.txt
```
Set `.env` at the repository root to include the following.

If you are using the standard OpenAI API:

```bash
export OPENAI_API_TYPE=open_ai
export OPENAI_API_BASE=https://api.openai.com/v1
export OPENAI_API_KEY=<YOUR API KEY>
export OPENAI_MODEL=gpt-3.5-turbo
```

If you are using an Azure OpenAI endpoint:

```bash
export OPENAI_API_TYPE=azure
export OPENAI_API_BASE=https://eastus.api.cognitive.microsoft.com
export OPENAI_API_KEY=<YOUR API KEY>
export OPENAI_API_VERSION=2023-03-15-preview
export OPENAI_API_ENGINE=gpt_test
```
To use a different LLM backend, override the `call_llm` method in the `LangAPI` class in `./src/langprop/lm_api.py`.
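As a rough sketch of what that might look like (the actual signature of `call_llm` and the constructor of `LangAPI` live in `./src/langprop/lm_api.py` and may differ from what is assumed here):

```python
from langprop.lm_api import LangAPI  # assumes ./src/ is on the PYTHONPATH

class MyCustomAPI(LangAPI):
    def call_llm(self, prompt):
        # ASSUMPTION: call_llm receives the prompt and returns the raw text
        # completion; check lm_api.py for the exact arguments and return type.
        return query_my_own_model(prompt)

def query_my_own_model(prompt) -> str:
    # Placeholder for a request to your own LLM endpoint or a local model.
    raise NotImplementedError
```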
We have prepared a very simple example of generating code that computes the factorial for a given integer input.
The model definition is in `examples/factorials/model`, the factorial dataset in `examples/factorials/dataset.py`, and the training code in `examples/factorials/run.py`.
At the start, make sure you add `src` to the `PYTHONPATH` by running

```bash
export PYTHONPATH=./src/:${PYTHONPATH}
```

Then run

```bash
python ./src/langprop/examples/factorials/test_setup.py
```
You will find that a new `checkpoint.txt` has been generated in the directory `./src/langprop/examples/factorials/`.
Next, run

```bash
python ./src/langprop/examples/factorials/test_policy.py
python ./src/langprop/examples/factorials/test_run.py
```
The resulting code (which we call checkpoints) and the log of ChatGPT prompts and queries can be found in `lm_logs` in the root directory.
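For reference, a factorial checkpoint generated by the LLM might look something like the following (illustrative only; the actual function name, signature and implementation will vary from run to run):

```python
def solve(n):
    # Computes the factorial of a non-negative integer iteratively.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
```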
This example solves Sudoku. Instead of solving the standard 3x3 puzzle, we solve a generalized sudoku that consists of W x H subblocks, each with H x W elements. Due to the complexity of the specification, an LLM often fails on the first attempt, but using LangProp allows us to filter out incorrect results and arrive at a fully working solution.
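To make the generalized task concrete, a validity check for such a puzzle could be sketched as follows (our own illustration of the constraints; the actual input/output format expected by the example is defined in `./src/langprop/examples/sudoku/`):

```python
def is_valid_solution(grid, W, H):
    """Check an N x N grid (N = W * H) where every row, column and
    H x W subblock must contain each of 1..N exactly once."""
    n = W * H
    target = set(range(1, n + 1))
    rows_ok = all(set(row) == target for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == target for c in range(n))
    blocks_ok = all(
        {grid[r][c]
         for r in range(br, br + H)
         for c in range(bc, bc + W)} == target
        for br in range(0, n, H)
        for bc in range(0, n, W)
    )
    return rows_ok and cols_ok and blocks_ok
```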
At the start, make sure you add `src` to the `PYTHONPATH` by running

```bash
export PYTHONPATH=./src/:${PYTHONPATH}
```
This only has to be done once to generate the training data:

```bash
python ./src/langprop/examples/sudoku/generate.py
```
Then run

```bash
python ./src/langprop/examples/sudoku/test_run.py
```
The resulting code (which we call checkpoints) and the log of ChatGPT prompts and queries can be found in `lm_logs` in the root directory.
- Example prompts
- Example checkpoint
- Incorrect solution generated zero-shot
- Correct solution after LangProp training
This example solves `CartPole-v1` in OpenAI Gym (now part of Gymnasium). Initially, the LLM generates solutions that are simplistic and do not balance the CartPole.
With a simple Monte Carlo method of optimizing the policy for the total rewards, we can obtain improved policies using LangProp.
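The core idea of scoring a candidate policy by its Monte Carlo return can be sketched as follows (a simplified illustration assuming the Gymnasium API; the actual logic lives in `test_run.py` and `gym_utils.py`):

```python
import gymnasium as gym

def score_policy(policy_fn, episodes: int = 10) -> float:
    """Average total reward of a candidate policy over several rollouts."""
    env = gym.make("CartPole-v1")
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = policy_fn(obs)  # the LangProp-generated code acts as the policy
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
    env.close()
    return total / episodes
```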
At the start, make sure you add `src` to the `PYTHONPATH` by running

```bash
export PYTHONPATH=./src/:${PYTHONPATH}
```
```bash
python ./src/langprop/examples/cartpole/test_run.py
```
The training code uses utility functions defined in `gym_utils.py`.
Here is a sample video of the training result:
The resulting code (which we call checkpoints) and the log of ChatGPT prompts and queries can be found in `lm_logs` in the root directory.
To automatically generate and parse prompts with new inputs, outputs and exceptions on the fly, we have developed a simple but flexible custom template engine, which can be found in `prompt.py`.
Here are the key features of the template engine (an example template is shown after this list).
- Every line that begins with `#` is treated as a comment.
- Every line that begins with `$ ` or lies between `$begin` and `$end` is treated as executable Python code, as is everything inside `{{ }}`.
- You can define a system prompt, a user prompt and an assistant prompt by inserting a header of `%% SYSTEM`, `%% USER`, or `%% ASSISTANT`.
- You can import another prompt file with `$import path_to_file.txt`, using either a relative or an absolute path.
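Putting these features together, a prompt template might look like the following (a made-up example purely to illustrate the syntax; the file name and variables such as `exception` are hypothetical, and the real prompts can be found under the example directories):

```
# Comments like this line are stripped from the prompt.
$import ./common_instructions.txt

%% SYSTEM
You are an expert Python programmer.

%% USER
$ task_name = "factorial"
$begin
if exception is not None:
    feedback = "The previous code raised: " + str(exception)
else:
    feedback = ""
$end
Please write a function that computes the {{ task_name }} of an integer.
{{ feedback }}
```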