Official repository for the paper "Improving Planning With Large Language Models: A Modular Agentic Architecture."

Dependencies:
- python 3.9.17
- numpy==1.24.3
- tqdm==4.65.0
- json==2.0.9
- openai==1.24.1
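Note that `json==2.0.9` refers to the `json` module bundled with Python's standard library, not a pip package; only the other pins are installed with pip. You can confirm the bundled version directly:

```python
# json ships with CPython itself; "2.0.9" is the version string
# the bundled module reports, so no pip install is needed for it
import json
print(json.__version__)  # 2.0.9
```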
For the ToH (Tower of Hanoi) task, the codebase (inside the `toh` directory) contains files to run MAP and the GPT-4 baselines: zero-shot, in-context learning (ICL), chain-of-thought (CoT) with ICL, multi-agent debate (MAD), and tree of thought (ToT). It also contains files to evaluate the outputs generated by the model runs.
To run these models you must specify two required arguments: 1) your OpenAI API key, and 2) the name of the directory where the output log files will be stored.
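Both arguments follow the same command-line interface across the run and evaluation scripts. A minimal sketch of that interface, assuming the scripts use `argparse` (the argument names come from the commands in this README, but the parsing code here is an assumption, not the repo's):

```python
import argparse

def build_parser():
    # Sketch of the shared CLI; --openai_api_key and --output_dir are the
    # two required arguments named in this README
    parser = argparse.ArgumentParser(description="Run a MAP/baseline experiment")
    parser.add_argument("--openai_api_key", required=True, help="OpenAI API key")
    parser.add_argument("--output_dir", required=True,
                        help="directory where output log files will be stored")
    return parser

args = build_parser().parse_args(["--openai_api_key", "sk-test", "--output_dir", "logs"])
print(args.output_dir)  # logs
```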
For example, to run and evaluate MAP, first execute `python gpt4_map_toh.py --openai_api_key '<YOUR OPENAI KEY>' --output_dir '<OUTPUT DIRECTORY NAME>'`.
Then execute `python gpt4_map_toh_eval.py --output_dir '<OUTPUT DIRECTORY NAME>'`.
To run and evaluate one of the baseline models, for example GPT-4 ICL, first execute `python gpt4_icl_toh.py --openai_api_key '<YOUR OPENAI KEY>' --output_dir '<OUTPUT DIRECTORY NAME>'`.
Then execute `python gpt4_icl_toh_eval.py --output_dir '<OUTPUT DIRECTORY NAME>'`.
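The evaluation scripts score the move sequences the models produce. As a rough illustration of what checking a Tower of Hanoi solution involves (a self-contained sketch only, not the repo's evaluation code):

```python
def solves_hanoi(moves, n_disks=3):
    """Check that a sequence of (src, dst) peg moves legally transfers
    n_disks from peg 0 to peg 2. Illustrative sketch, not the repo's evaluator."""
    pegs = [list(range(n_disks, 0, -1)), [], []]  # peg 0 holds disks n..1, largest at bottom
    for src, dst in moves:
        if not pegs[src]:
            return False  # no disk to move from the source peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False  # cannot place a larger disk on a smaller one
        pegs[dst].append(pegs[src].pop())
    # Solved when all disks sit on peg 2 in the original order
    return pegs[2] == list(range(n_disks, 0, -1)) and not pegs[0] and not pegs[1]

# The optimal 7-move solution for 3 disks:
solution = [(0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)]
print(solves_hanoi(solution))  # True
```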
For the CogEval tasks (Valuepath, Steppath, Reward Revaluation, and Detour), the codebase (inside the `cogeval` directory) contains files to run MAP and the GPT-4 baselines: zero-shot, in-context learning (ICL), chain-of-thought (CoT) with ICL, multi-agent debate (MAD), and tree of thought (ToT). It also contains files to evaluate the outputs generated by the model runs.
As with ToH, you must specify two required arguments: 1) your OpenAI API key, and 2) the name of the directory where the output log files will be stored.
For example, to run and evaluate MAP on the Valuepath task, first execute `python gpt4_map_valuepath.py --openai_api_key '<YOUR OPENAI KEY>' --output_dir '<OUTPUT DIRECTORY NAME>'`.
Then execute `python gpt4_map_valuepath_eval.py --output_dir '<OUTPUT DIRECTORY NAME>'`.
To run and evaluate one of the baseline models, for example GPT-4 ICL, on the Valuepath task, first execute `python gpt4_standard_icl_valuepath.py --openai_api_key '<YOUR OPENAI KEY>' --output_dir '<OUTPUT DIRECTORY NAME>'`.
Then execute `python gpt4_valuepath_baselines_eval.py --output_dir '<OUTPUT DIRECTORY NAME>'`.
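The CogEval tasks probe planning over small graph-structured environments. Purely as an illustration of the kind of check an evaluator performs (the graph and helper below are hypothetical, not taken from the repo):

```python
# Hypothetical illustration: validate that a proposed path is walkable in a
# small directed graph and ends at the goal state (not the repo's evaluator).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def path_reaches_goal(path, goal):
    # Every consecutive pair must be an edge, and the path must end at the goal
    edges_ok = all(b in graph[a] for a, b in zip(path, path[1:]))
    return edges_ok and path[-1] == goal

print(path_reaches_goal(["A", "B", "D"], "D"))  # True
print(path_reaches_goal(["A", "D"], "D"))       # False: no A->D edge
```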
For the Mystery Blocksworld task, the codebase (inside the `planbench/mystery_blocksworld` directory) contains files to run MAP and the GPT-4 baselines: zero-shot, in-context learning (ICL), chain-of-thought (CoT) with ICL, and multi-agent debate (MAD). It also contains files to generate the plan response JSON for further evaluation.
First, clone the LLMs-Planning repo (https://github.com/karthikv792/LLMs-Planning) inside the `planbench/mystery_blocksworld` directory, then follow the setup instructions at https://github.com/karthikv792/LLMs-Planning/tree/main/plan-bench.
To run the above models, insert the following block of code at the start of the script, after the import statements, and fill in `api_key`, `azure_endpoint`, and `deployment_name`. You must also specify, as a required argument, the name of the directory where the output log files will be stored.
```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="",
    api_version="2024-02-01",
    azure_endpoint=""
)
deployment_name = ''
```
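Hardcoding credentials in the script works, but is easy to commit by accident. A safer variant reads them at runtime from environment variables (the variable names and helper below are illustrative, not conventions used by the repo):

```python
import os

def load_azure_settings(env=os.environ):
    # Illustrative helper: variable names are assumptions, not repo conventions
    return {
        "api_key": env.get("AZURE_OPENAI_API_KEY", ""),
        "azure_endpoint": env.get("AZURE_OPENAI_ENDPOINT", ""),
        "deployment_name": env.get("AZURE_OPENAI_DEPLOYMENT", ""),
    }

settings = load_azure_settings({"AZURE_OPENAI_API_KEY": "test-key"})
print(settings["api_key"])  # test-key
```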
For example, to run MAP, first execute `python gpt4_map_mystery_blocksworld_plan_generation.py --output_dir '<OUTPUT DIRECTORY NAME>'`.
Then, to generate the plan response JSON, execute `python gpt4_map_genplan_response_json.py --output_dir '<OUTPUT DIRECTORY NAME>'`.
Finally, to evaluate the plan response JSON, execute `python LLMs-Planning/plan-bench/response_evaluation.py --task 't1' --config 'mystery_blocksworld' --engine 'map' --ignore_existing --verbose 'True'`.
To run one of the baseline models, for example GPT-4 ICL, first execute `python gpt4_mystery_blockworld_plan_generation_icl.py --output_dir '<OUTPUT DIRECTORY NAME>'`.
Then, to generate the plan response JSON, execute `python gpt4_baselines_genplan_response_json.py --output_dir '<OUTPUT DIRECTORY NAME>' --model 'gpt4_icl'`.
Finally, to evaluate the plan response JSON, execute `python LLMs-Planning/plan-bench/response_evaluation.py --task 't1' --config 'mystery_blocksworld' --engine 'gpt4_icl' --ignore_existing --verbose 'True'`.
For the StrategyQA task, the codebase (inside the `strategyQA` directory) contains files to run MAP and the GPT-4 baselines: chain-of-thought (CoT) and tree of thought (ToT).
To run the above models, insert the following block of code at the start of the script, after the import statements, and fill in `api_key`, `azure_endpoint`, and `deployment_name`.
```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="",
    api_version="2024-02-01",
    azure_endpoint=""
)
deployment_name = ''
```
For example, to run MAP, execute `python map_strategyqa.py`.