This repository provides scripts for training and performing inference using the PaliGemma model. The model is designed for visual question answering (VQA) tasks. The scripts were made by our team "Attack On Python".
Given images of Amazon product listings that display measurements of physical quantities (e.g., height, width, weight), the task is to extract the numerical value corresponding to the physical quantity given as input.
Our solution achieved a maximum F1-score of 0.661 and finished in 26th place among more than 2000 participating teams.
Ensure the dependencies listed in requirements.txt are installed by running:
pip install -r requirements.txt
- Download the images into a directory by passing the list of image links from train.csv to util.download_images(<list of image links>)
- Set data_dir to the path of the images directory and csv_filename to the path of train.csv
- Run the PaliGemma_Training_AttackOnPython.py file after setting the NUM_EPOCHS and BATCH_SIZE hyperparameters
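
The first step above needs a list of image links taken from train.csv. Below is a minimal sketch of collecting that list with Python's standard csv module. It assumes the links live in a column named image_link, which is an assumption: check the header of your own train.csv. The helper read_image_links is hypothetical and not part of this repository; its result is what would be passed to util.download_images.

```python
import csv
from pathlib import Path


def read_image_links(csv_path, link_column="image_link"):
    """Return the unique, non-empty image links from a CSV file, in file order.

    link_column is an assumed column name; adjust it to match your CSV header.
    """
    links = []
    seen = set()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Missing or empty cells are treated as "no link" and skipped.
            link = (row.get(link_column) or "").strip()
            if link and link not in seen:
                seen.add(link)
                links.append(link)
    return links


if __name__ == "__main__":
    csv_filename = Path("train.csv")  # hypothetical location of train.csv
    if csv_filename.exists():
        links = read_image_links(csv_filename)
        print(f"{len(links)} unique image links found")
        # The list can then be handed to the repository helper:
        # import util
        # util.download_images(links)
```

Removing duplicate links before downloading avoids fetching the same image more than once when several rows refer to the same product image.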
- Download the images into a directory by passing the list of image links from test.csv to util.download_images(<list of image links>)
- Set data_dir to the path of the images directory and pass the path to test.csv in the pd.read_csv(<test.csv path>) call that creates metadata_df
- Run the PaliGemma_Training_AttackOnPython.py file after setting the batch_size and test_id hyperparameters.
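
As a rough illustration of what batch_size controls, the sketch below groups a list of test rows into consecutive batches. It is a generic pattern under stated assumptions, not the code used in this repository's script; iter_batches and the sample rows are hypothetical.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of items, each holding at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    # Hypothetical test rows; real rows would come from test.csv.
    rows = ["row0", "row1", "row2", "row3", "row4"]
    for batch in iter_batches(rows, batch_size=2):
        # Each batch would be passed to the model in a single forward pass.
        print(batch)
```

A larger batch_size generally means fewer forward passes but more memory per pass, so the value is usually chosen to fit the available GPU memory.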
A shout-out to my amazing friends Anurakt, Vibhu and Shivanshu for the great work!