ScenarioQA: Evaluating Test Scenario Reasoning Capabilities of Large Language Models

Overview

ScenarioQA is an in-depth evaluation of the capabilities of different LLMs, using high-quality QA pairs about driving scenarios and conditions. The pairs are generated by GPT-4, which we chose for its logical reasoning relative to other LLMs. Question generation is guided by the ontology, along with guidelines for well-formed multiple-choice questions (MCQs) and Bloom's taxonomy. We then paste the generated questions for each reasoning category into the LiteLLM prompt box to compare the results of several LLMs simultaneously.

Python Guidance Library

We use the Python Guidance library because it can call many LLMs, including GPT models, more efficiently than conventional prompting and chaining methods.

All of the scripts for question generation are in the QA Template folder. To run them, follow the steps below:

  1. Make sure the proper environment variables are set on your local machine. We use GPT-4 for the Guidance prompts, so the OpenAI API key must be set as an environment variable for the prompts to run properly (see the sketch below these steps).
  2. Install the dependencies. You need Python 3.10 or above to run Guidance; for more information, check out the Guidance Library. The sys, re, and json modules used in the notebooks ship with the Python standard library, so only Guidance itself needs to be installed:
    pip install guidance
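A minimal sketch of that setup, assuming the key is passed via the OPENAI_API_KEY environment variable and the older handlebars-style Guidance API (the exact interface differs between Guidance releases):

    import os
    import guidance

    # The Guidance prompts call GPT-4, so the OpenAI key must be visible to the
    # process -- export it in your shell or set it here before running a notebook.
    os.environ.setdefault("OPENAI_API_KEY", "sk-...")  # replace with your own key

    # Make GPT-4 the default model for all guidance() programs below.
    guidance.llm = guidance.llms.OpenAI("gpt-4")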

Question Generation

After installing the dependencies, we can easily run Guidance. All the files are Jupyter notebooks, which can be run with Anaconda or VSCode. The prompt to generate questions has the following sections:

  • The Ontology
  • Directions on Proper MCQ Formation
  • Details about Bloom's Taxonomy
  • Directions for Question Generation for GPT
  • A Detailed Driving Scenario (if necessary)
  • Examples of Specific Types of Reasoning Tasks

If you would like to edit the questions that are output or ask for different results, you can modify the [Directions for Question Generation] section and prompt for better questions, more clarity, etc. By changing the directions, the ontology, or the scenario, you can produce different results or outputs.
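As a rough sketch of how these sections are stitched into one prompt (handlebars-style Guidance API assumed; the section texts and variable names below are placeholders, not the notebooks' actual contents):

    import guidance

    guidance.llm = guidance.llms.OpenAI("gpt-4")

    # Placeholder section texts -- the real ones live in the QA Template notebooks.
    sections = {
        "ontology": "...",    # the ontology
        "mcq_rules": "...",   # directions on proper MCQ formation
        "blooms": "...",      # details about Bloom's taxonomy
        "directions": "...",  # directions for question generation
        "scenario": "...",    # a detailed driving scenario (if necessary)
        "examples": "...",    # examples of the targeted reasoning task
    }

    # Chat models in Guidance use role blocks; GPT-4 completes the assistant turn.
    generate = guidance("""
    {{#user~}}
    {{ontology}}
    {{mcq_rules}}
    {{blooms}}
    {{directions}}
    {{scenario}}
    {{examples}}
    {{~/user}}
    {{#assistant~}}
    {{gen 'questions' max_tokens=1500}}
    {{~/assistant}}
    """)

    print(generate(**sections)["questions"])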

Ontology Creation

The ontology is formed by calling GPT multiple times, starting from a seed ontology. More information can be found on the OpenXOntology site. Change the seed and the number of iterations to change the resulting ontology.
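A sketch of that loop under the same assumptions (handlebars-style Guidance API; the seed text, prompt wording, and iteration count are illustrative, not the repository's exact values):

    import guidance

    guidance.llm = guidance.llms.OpenAI("gpt-4")

    # Ask GPT-4 to extend whatever ontology text it is given.
    expand = guidance("""
    {{#user~}}
    Here is a driving-domain ontology:
    {{current}}
    Extend it with additional relevant concepts and relations.
    {{~/user}}
    {{#assistant~}}
    {{gen 'expanded' max_tokens=1000}}
    {{~/assistant}}
    """)

    # Illustrative seed; change it (and the iteration count) to change the result.
    ontology = "Road: lane, intersection. Actor: vehicle, cyclist, pedestrian."
    for _ in range(3):
        ontology = expand(current=ontology)["expanded"]

    print(ontology)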

Scenario Creation

A very simple call to Guidance using or modifying the scenario prompt below will generate high-quality, detailed scenarios that can be used within the scope of this project.

PROMPT: Generate a scenario where a car crash occurred at an intersection between a car and a bike. The car was taking a right at an intersection when the traffic light was red, however, the car did not check its blindspot and crashed into the cyclist. Provide details about the car, the cyclist, the type of road, and the weather.
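For example, a minimal Guidance call around that prompt could look like this (handlebars-style API assumed; the token limit is arbitrary):

    import guidance

    guidance.llm = guidance.llms.OpenAI("gpt-4")

    scenario_prompt = (
        "Generate a scenario where a car crash occurred at an intersection "
        "between a car and a bike. The car was taking a right at an intersection "
        "when the traffic light was red, however, the car did not check its "
        "blindspot and crashed into the cyclist. Provide details about the car, "
        "the cyclist, the type of road, and the weather."
    )

    # Wrap the prompt in a user turn and let GPT-4 write the detailed scenario.
    make_scenario = guidance("""
    {{#user~}}
    {{prompt}}
    {{~/user}}
    {{#assistant~}}
    {{gen 'scenario' max_tokens=800}}
    {{~/assistant}}
    """)

    print(make_scenario(prompt=scenario_prompt)["scenario"])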

LiteLLM

We use LiteLLM to send the generated questions to several LLMs at once and compare their answers. Additional directions can be found on the official LiteLLM website.

  1. Install Dependencies:
    pip install Flask
    pip install litellm
    pip install waitress
    
  2. Run main.py
  3. Run app.py
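main.py and app.py drive the Flask front end; underneath, comparing several LLMs on the same generated question comes down to LiteLLM's unified completion call, sketched here with example model names (the repository's actual model list and routing may differ):

    from litellm import completion

    # One generated MCQ (illustrative) -- the real questions come from the QA Template notebooks.
    question = (
        "An ego vehicle is turning right on red while a cyclist approaches in the "
        "bike lane. Which action best avoids a collision?"
    )

    # Send the same question to several providers through one interface.
    for model in ["gpt-4", "gpt-3.5-turbo", "claude-2"]:
        response = completion(model=model, messages=[{"role": "user", "content": question}])
        print(model, "->", response.choices[0].message.content)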
