Question-answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document.
Some question-answering models can even generate answers without any context (Hugging Face). Question answering comes in three main variants:
- Extractive Question Answering
- Open Generative Question Answering
- Closed Generative Question Answering
Our system is an Extractive Question Answering system: given a context and a question, the model assumes that the answer is contained within the provided context.
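For illustration, here is a minimal sketch of extractive QA using the Hugging Face Transformers `pipeline` API (the question and context strings are made up for the example):

```python
from transformers import pipeline

# Build an extractive QA pipeline with the model used in this project.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(
    question="What is the capital of France?",
    context="The capital of France is Paris, a city on the Seine.",
)
print(result["answer"])  # a span extracted from the context, e.g. "Paris"
```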
The question-answering system in this project is evaluated using the Stanford Question Answering Dataset (SQuAD). SQuAD is a widely used benchmark dataset for evaluating machine reading comprehension and question-answering systems. It contains a diverse set of passages from a variety of topics and genres, and each entry consists of:
- Context Paragraph: A passage that contains the information from which the answer can be extracted.
- Question: A question related to the context, formulated to prompt the model to extract the relevant answer.
- Answer Span: The exact span of text within the context paragraph that serves as the answer to the question.
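SQuAD is available through the Hugging Face `datasets` library; a quick way to inspect one record (the split is chosen here just for illustration):

```python
from datasets import load_dataset

# Load one split of SQuAD and look at a single entry.
squad = load_dataset("squad", split="validation")
example = squad[0]

print(example["context"])   # the context paragraph
print(example["question"])  # the question about that paragraph
print(example["answers"])   # {'text': [...], 'answer_start': [...]}: the answer span(s)
```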
- Python version >= 3.6 is recommended
- Operating System: Windows
- datasets==2.14.4
- numpy==1.24.4
- pandas==2.0.3
- tokenizers==0.13.3
- torch==2.0.1
- transformers
- pytest
1. Clone the source:
   `git clone https://github.com/geehaad/Question-Answering.git`
2. Go to the directory you cloned the repo into and open cmd:
   `cd Question-Answering`
3. Create a virtual environment (replace `venv` with your virtual environment name). Using conda, in CMD write:
   `conda create -p venv python==3.8`
4. Activate the virtual environment:
   `conda activate venv\`
5. Install the packages listed in requirements.txt:
   `pip install -r requirements.txt`
6. Run the main script:
   `python src/components/main.py`
   The output is a CSV file called `output` in your directory (a quick way to inspect it is shown after these steps).
7. To run the testing file:
   `pytest src/tests/test_answer_questions.py`
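Once the main script (step 6) has finished, the results file can be inspected with pandas. Note that the `output.csv` filename and the column contents here are assumptions based on the descriptions in this README:

```python
import pandas as pd

# Load the generated results; each row pairs a question with the dataset's
# original answer and the answer detected by the model.
df = pd.read_csv("output.csv")
print(df.head())
```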
- Model used: `distilbert-base-cased-distilled-squad`, a variant of the DistilBERT model that has been fine-tuned specifically on SQuAD. This model is designed to accurately extract answers from a given context.
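The checkpoint and its tokenizer can be loaded from the Hugging Face Hub like this:

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Download (or load from cache) the fine-tuned checkpoint and its tokenizer.
model_name = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
```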
How the System Works:

Our Question Answering system takes a context paragraph and a question as inputs and aims to extract the relevant answer from the context. It works in multiple steps:

1. Tokenization: the context paragraph and the question are tokenized into subword tokens using the tokenizer provided by the Hugging Face Transformers library alongside `AutoModelForQuestionAnswering`.
2. Process input through the model: the tokenized inputs are passed through the `distilbert-base-cased-distilled-squad` model, which has been fine-tuned on the SQuAD dataset.
3. Extract the answer span: the model's output consists of logits (unnormalized scores) for each token in the context paragraph; the tokens with the highest start and end logits correspond to the beginning and end of the answer span within the context.
4. Generate the answer: decoding the answer-span tokens produces the final answer string, which is returned as the output of the system.
5. Evaluate the model: the first 100 rows of the SQuAD dataset are used to evaluate the performance of the QA system.
6. Testing: pytest is run with multiple test cases.
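Steps 1-4 map directly onto a few lines of Transformers code. Here is a self-contained sketch; the question and context are illustrative, and the real implementation lives in `src/components/helper.py`:

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "Where is the Eiffel Tower located?"
context = "The Eiffel Tower is a wrought-iron lattice tower in Paris, France."

# Step 1 -- Tokenization: encode question and context into subword tokens.
inputs = tokenizer(question, context, return_tensors="pt")

# Step 2 -- Forward pass: the model returns one start and one end logit per token.
with torch.no_grad():
    outputs = model(**inputs)

# Step 3 -- Answer span: the highest start/end logits mark the span boundaries.
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits))

# Step 4 -- Decoding: turn the span's token ids back into a string.
answer_ids = inputs["input_ids"][0][start : end + 1]
print(tokenizer.decode(answer_ids, skip_special_tokens=True))
```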
Below is an overview of the key folders and files within the project:

```
Question-Answering/
|-- notebooks/
|   |-- trails.ipynb
|-- src/
|   |-- __init__.py
|   |-- components/
|   |   |-- __init__.py
|   |   |-- helper.py
|   |   |-- main.py
|   |-- tests/
|   |   |-- __init__.py
|   |   |-- test_answer_questions.py
|-- requirements.txt
|-- README.md
```
- `src/`: This folder contains the main source code of the project:
  - `components/`: The heart of the project, where the primary functionality resides. It contains:
    - `helper.py`: Contains the core functions that enable the question-answering system:
      - The `answer_questions` function takes a context and a question as input, tokenizes them, and extracts the answer using the chosen model.
      - The `apply_answer_questions` function applies `answer_questions` to a dataset, generating dictionaries containing the question, the original answer, and the detected answer.
    - `main.py`: The entry point of the project, where the main function applies `apply_answer_questions` to a subset of the dataset (100 rows) and saves the results in a CSV file (an illustrative usage sketch appears at the end of this README).
  - `tests/`:
    - `test_answer_questions.py`: Contains pytest test cases that validate the accuracy of the question-answering system. The tests use parameterized testing to check the behavior of the `answer_questions` function in different situations, as sketched below.
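A hedged sketch of what such parameterized tests can look like (the example cases and the exact assertion are assumptions, not copied from the actual test file):

```python
import pytest

from src.components.helper import answer_questions  # assumed import path

# Each tuple is one test case: a context, a question, and the expected answer.
@pytest.mark.parametrize(
    "context, question, expected",
    [
        ("The capital of France is Paris.", "What is the capital of France?", "Paris"),
        ("Python was created by Guido van Rossum.", "Who created Python?", "Guido van Rossum"),
    ],
)
def test_answer_questions(context, question, expected):
    # The model should find the expected span inside its returned answer.
    assert expected in answer_questions(context, question)
```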
- `notebooks/`:
  - `trails.ipynb`: A Jupyter notebook that serves as a sandbox for experimentation. It is used to explore the dataset and try different models before integrating them into the main system.
- `requirements.txt`: Lists the Python packages required for the project to run successfully.
- `README.md`: The central documentation file containing essential information about the project, its usage, and directory structure.
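Finally, to tie the pieces together, here is an illustrative end-to-end use of the helpers described above. The argument order, return types, and import paths are assumptions based on the descriptions in this README, not copied from the source:

```python
import pandas as pd
from datasets import load_dataset

from src.components.helper import answer_questions, apply_answer_questions  # assumed imports

# Evaluate on the first 100 rows of SQuAD, as main.py is described to do
# (the split is an assumption).
subset = load_dataset("squad", split="train").select(range(100))

# Single example: extract one answer from one context/question pair.
row = subset[0]
print(answer_questions(row["context"], row["question"]))

# Whole subset: one dict per row (question, original answer, detected answer),
# then save to CSV like the main script.
results = apply_answer_questions(subset)
pd.DataFrame(results).to_csv("output.csv", index=False)
```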