The purpose of this project is to build a Face Recognition System using FaceNet/VGG16 on football players. The players are Steven Gerrard, Mo Salah, Cristiano Ronaldo, Wayne Rooney and Messi.
To gather training images, I scraped Google Images and downloaded pictures of each player's face.
The repository contains:
- Datasets Directory - This directory holds our dataset - images split into train and test directories.
- [requirements.txt](/requirements.txt) - Lists the libraries and dependencies needed to run the notebook.
- Files
  - web-scraping-main.py - A Python script that scrapes data off Google Images based on a search query.
  - image_downloader.py - A Python script that downloads images and saves them into our dataset directory.
  - app.py - A Python script that loads images from the dataset, extracts faces from them and saves them into a .npz file.
  - extractor.py - A Python script that saves the embeddings of the extracted faces.
  - faces_dataset.npz - A .npz file that contains the extracted faces.
  - vgg16_face_embeddings.npz - A .npz file that contains the embeddings of the extracted faces.
  - haarcascade_frontalface_default.xml - Viola-Jones (Haar cascade) model used to detect faces in an image/video.
  - index.py - Python script to spin up a Streamlit web app.
- Gitignore File - Keeps files and directories from being pushed to the remote repository.
- README.md File - Instructions for describing and running the project.
- excel_links - Directory where we save the image URLs scraped off the internet for our football celebrities, in Excel format.
This section guides you through setting up your environment and running the repository locally.
Create a virtual environment to install and manage the libraries, isolating them from your global environment.
To create a virtual environment, run the command below on your terminal:
python -m venv 'myenv_name'
Disclaimer: This approach is recommended for Linux and Mac environments. There might be a different approach for setting up in Windows environments.
To activate your environment on a Linux or Mac operating system, run the command below in your terminal:
source /path/to/myenv_name/bin/activate
To activate your environment on a Windows environment, run:
\path\to\myenv_name\Scripts\activate
Once you're done working on the repository, REMEMBER to deactivate your virtual environment to keep your project dependencies isolated.
To deactivate your environment on a Linux or Mac operating system, run the command below in your terminal:
deactivate
The important libraries used in this environment are:
- Pandas - Used for manipulating, exploring, cleaning and analyzing the dataset.
- Numpy - Used for mathematical and statistical operations, often to prepare the dataset for machine learning.
- OpenCV Python - Used to manipulate images in Python.
- Selenium - Used to automate web browsers and scrape information.
- Keras - High-level deep learning API used to load the VGG16 model.
- Keras-facenet - Provides a pre-trained FaceNet model for face embeddings.
- Tensorflow - Deep learning framework that Keras runs on.
The libraries listed above are the core ones used in the repository. However, during installation you'll notice other dependencies being installed that enable the core libraries to work as expected. They are listed in the requirements.txt file.
Ensure you have Python 3 installed and running in your local environment so that you can install the libraries and run the notebook. Afterwards, ensure the virtual environment you created above is active before running the installation commands below in your terminal.
To install the libraries run:
pip install -r requirements.txt
Alternatively, you can install the core libraries individually by running:
pip install pandas numpy opencv-python selenium keras keras-facenet tensorflow
The steps below will help you run the project, from scraping the web for the football celebrity images to running the Streamlit app.
This phase scrapes Google Images using Selenium with a Chrome web driver and saves the image URLs of the football celebrities in Excel files in the excel_links directory.
- Run the web-scraping-main.py script:
python image_extractor_scripts/web-scraping-main.py
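For reference, here is a minimal sketch of what this scraping step might look like. The search queries, scrolling logic and output file names are illustrative assumptions, not the exact contents of web-scraping-main.py:

```python
# Sketch only: collect Google Images thumbnail URLs per player with Selenium
# and save them as one Excel file per player in excel_links/.
import time
from urllib.parse import quote_plus

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

PLAYERS = ["Steven Gerrard", "Mo Salah", "Cristiano Ronaldo", "Wayne Rooney", "Messi"]

driver = webdriver.Chrome()  # requires Chrome and a matching ChromeDriver

for player in PLAYERS:
    driver.get(f"https://www.google.com/search?q={quote_plus(player)}&tbm=isch")
    time.sleep(2)  # give the results page time to load

    # Scroll a few times so more thumbnails are rendered in the DOM
    for _ in range(5):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(1)

    # Collect the image URLs from the rendered <img> tags
    urls = [img.get_attribute("src")
            for img in driver.find_elements(By.TAG_NAME, "img")
            if img.get_attribute("src")]

    # Save the URLs for this player in the excel_links directory
    pd.DataFrame({"url": urls}).to_excel(f"excel_links/{player}.xlsx", index=False)

driver.quit()
```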
Thereafter, we read the Excel files using Pandas and loop through each of them to download the images. These images are saved in the dataset/images directory.
- Run the image_downloader.py script:
python image_extractor_scripts/image_downloader.py
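A minimal sketch of the download step, assuming each Excel file holds a single column of URLs and that images are written to dataset/images. The column name, file-naming scheme and use of the requests library are assumptions, not necessarily what image_downloader.py does:

```python
# Sketch only: read every Excel file in excel_links/ and download its URLs.
import glob
import os

import pandas as pd
import requests

os.makedirs("dataset/images", exist_ok=True)

for excel_file in glob.glob("excel_links/*.xlsx"):
    player = os.path.splitext(os.path.basename(excel_file))[0]
    urls = pd.read_excel(excel_file)["url"]

    for i, url in enumerate(urls):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
        except requests.RequestException:
            continue  # skip URLs that fail to download
        # Save each image with the player name in the filename
        with open(f"dataset/images/{player}_{i}.jpg", "wb") as f:
            f.write(response.content)
```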
Here, we use the OpenCV library and the Viola-Jones model (haarcascade_frontalface_default.xml) to detect faces in the downloaded images; the extracted faces are saved in the faces_dataset.npz compressed file.
- Run the app.py script
python image_face_detection/app.py
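For reference, a minimal sketch of what the face-extraction step does with the Haar cascade. The directory layout, crop size, label-from-filename convention and array names in the .npz archive are assumptions:

```python
# Sketch only: detect faces with the Viola-Jones cascade and save crops + labels.
import glob
import os

import cv2
import numpy as np

detector = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
faces, labels = [], []

for path in glob.glob("dataset/images/*.jpg"):
    image = cv2.imread(path)
    if image is None:
        continue
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Detect faces and keep a fixed-size crop of each detection
    for (x, y, w, h) in detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        face = cv2.resize(image[y:y + h, x:x + w], (224, 224))
        faces.append(face)
        labels.append(os.path.basename(path).split("_")[0])  # player name from filename

# Save the extracted faces and their labels in a compressed .npz archive
np.savez_compressed("faces_dataset.npz",
                    faces=np.asarray(faces),
                    labels=np.asarray(labels))
```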
Thereafter, we use the VGG16 model to extract the embeddings for each of the extracted faces and save them in the vgg16_face_embeddings.npz compressed file.
- Run the extractor.py script.
python image_face_detection/extractor.py
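A minimal sketch of the embedding step, assuming the Keras VGG16 backbone is used as a feature extractor with average pooling; the exact preprocessing and array names are assumptions:

```python
# Sketch only: turn each extracted face into a VGG16 feature vector.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

data = np.load("faces_dataset.npz")
faces, labels = data["faces"], data["labels"]

# VGG16 without the classification head; pooling="avg" yields one 512-d vector per face
model = VGG16(include_top=False, pooling="avg", input_shape=(224, 224, 3))

embeddings = model.predict(preprocess_input(faces.astype("float32")))
np.savez_compressed("vgg16_face_embeddings.npz", embeddings=embeddings, labels=labels)
```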
The third step entails training and evaluating a classifier model on the image embeddings. In this case we're using an SVM and a RandomForestClassifier to train the model, and we afterwards save the model in joblib format. This is done in the training notebook.
Open the notebook using jupyter lab by running the command below on your terminal:
jupyter lab
Once the notebook is open, train the model. It will save an encoder.joblib, needed to convert the labels into a machine-readable format, and a model.joblib containing the trained model that our web app uses to identify the celebrity in a user's uploaded image.
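A minimal sketch of the training step done in the notebook, shown here with the SVM only; the train/test split, kernel and scoring choices are assumptions, but the encoder.joblib and model.joblib outputs match what the web app expects:

```python
# Sketch only: train a classifier on the VGG16 embeddings and persist it.
import joblib
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

data = np.load("vgg16_face_embeddings.npz")
X, y = data["embeddings"], data["labels"]

# Encode the player names into integers the classifier can work with
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.2, stratify=y_encoded, random_state=42
)

model = SVC(kernel="linear", probability=True)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

# Persist both artefacts for the Streamlit app
joblib.dump(encoder, "encoder.joblib")
joblib.dump(model, "model.joblib")
```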
To let users try the model, we spin up a web application where they can upload a face image and the app returns a prediction from our classifier model with the name of the celebrity.
In your terminal, run:
streamlit run index.py
This spins up the app locally in your browser.
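For reference, a minimal sketch of what index.py might look like; the widget labels and preprocessing details are assumptions and the actual script may differ:

```python
# Sketch only: Streamlit app that classifies an uploaded face image.
import cv2
import joblib
import numpy as np
import streamlit as st
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

st.title("Football Celebrity Face Recognition")

# Load the artefacts produced by the earlier steps
encoder = joblib.load("encoder.joblib")
model = joblib.load("model.joblib")
detector = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
vgg16 = VGG16(include_top=False, pooling="avg", input_shape=(224, 224, 3))

uploaded = st.file_uploader("Upload a face image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    # Decode the uploaded bytes into an OpenCV image
    image = cv2.imdecode(np.frombuffer(uploaded.read(), np.uint8), cv2.IMREAD_COLOR)
    st.image(cv2.cvtColor(image, cv2.COLOR_BGR2RGB), caption="Uploaded image")

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    detections = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(detections) == 0:
        st.warning("No face detected in the uploaded image.")
    else:
        # Embed the first detected face and classify it
        x, y, w, h = detections[0]
        face = cv2.resize(image[y:y + h, x:x + w], (224, 224)).astype("float32")
        embedding = vgg16.predict(preprocess_input(face[np.newaxis]))
        prediction = encoder.inverse_transform(model.predict(embedding))[0]
        st.success(f"Predicted celebrity: {prediction}")
```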
To run any of the Python scripts in your local environment, run the command below in your terminal:
python path/to/script.py
The celebrity detection app has been deployed using Streamlit. You can access it using the URL below: