PlaneSpotter

This repository contains:

  • A web-crawler (based on the Scrapy Python library) to download pictures of aircraft from various websites.
  • An OpenIMAJ project to train an image classifier with traditional Machine Learning techniques (along with utility classes to process the output of the crawler).
  • A (more advanced) Theano script to train a Convolutional Neural Network (CNN) image classifier.
  • A minimalist Python web server to host the CNN classifier.

Instructions

Prerequisites

You will need to download and install Python 2.7, or preferably a scientific Python distribution, such as Anaconda.

Also required is the Java 1.8 Development Kit.

Image crawler

Install Scrapy:

pip install scrapy

Run the crawler (cd into planespotter/scrapy):

scrapy crawl airliners -o planes.json > log.txt

Note: This will download potentially millions of (small) pictures to your hard drive, which takes a long time. Running the crawl on an SSD will greatly speed up the process.
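Once the crawl finishes, a quick sanity check on the output can save time later. The sketch below assumes only that planes.json is a single JSON array of item objects, which is what Scrapy's JSON feed export writes. The field names depend on the spider and are not assumed here, and the helper name summarize_feed is hypothetical.

```python
import json
from collections import Counter


def summarize_feed(path):
    """Return (item_count, Counter of field names) for a Scrapy JSON feed."""
    with open(path) as f:
        items = json.load(f)  # the feed is one JSON array of objects
    fields = Counter()
    for item in items:
        # Count in how many items each field appears.
        fields.update(item.keys())
    return len(items), fields
```

Calling summarize_feed("planes.json") from planespotter/scrapy gives the number of crawled items and shows which fields are missing from some of them.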

OpenIMAJ

In Eclipse, import the openimaj_classifier folder using "Existing Projects into Workspace".

Note: OpenIMAJ is built with Maven. The project declares a large number of (probably unused) dependencies, so Maven will download many libraries from the Internet during the first build. Later builds reuse the local copies.

There are three main classes in the project that you can run:

  • tk.thebrightstuff.JsonProcessor: This utility class takes as input one (or more) JSON files created by the crawler, and reformats them into a single text file (required by both OpenIMAJ and Theano).
  • tk.thebrightstuff.Sorter: This utility class processes an image folder created by the crawler (or a tar version of it) to create a more file-system-efficient folder structure (required by both OpenIMAJ and Theano).
  • tk.thebrightstuff.AircraftApp: This class trains the image annotator. Various inputs are required, such as the path where the image folder is stored on disk, and how many pictures should be used for training. After training, all the data is saved in a data.txt file (which can be reloaded later), and the classifier is tested against a set of pictures.
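To illustrate why a different folder structure helps with millions of small files, here is a minimal Python sketch of one common approach: spreading files over nested bucket folders derived from a hash of the file name. This is not the layout that Sorter produces (that is defined by the Java class itself). The function names and the two-level MD5 bucketing are assumptions made for illustration only.

```python
import hashlib
import os
import shutil


def shard_path(filename, depth=2):
    """Map a file name to nested bucket folders derived from its MD5 hash."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Each level uses two hex characters, i.e. at most 256 folders per level.
    buckets = [digest[2 * i:2 * i + 2] for i in range(depth)]
    return os.path.join(*(buckets + [filename]))


def shard_folder(src_dir, dst_dir, depth=2):
    """Copy every regular file in src_dir into its bucket under dst_dir."""
    copied = 0
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue  # skip sub-folders
        dst = os.path.join(dst_dir, shard_path(name, depth))
        dst_parent = os.path.dirname(dst)
        if not os.path.isdir(dst_parent):
            os.makedirs(dst_parent)
        shutil.copy2(src, dst)
        copied += 1
    return copied
```

With this kind of layout no single directory has to hold the whole image set, which keeps directory listings and lookups fast on most file systems.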

Theano

Install Theano:

pip install theano

To train the CNN on your GPU (much more efficient), you also need a good NVIDIA graphics card, and you must install CUDA and g++. On Windows you will probably need to install Visual Studio (see this post for an example of setup).

Depending on your settings, you will need to customize the .theanorc file (in your home folder). An example is provided below (for Windows):

[global]
device = gpu
floatX = float32
exception_verbosity = high
compute_test_value = raise

[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5

[nvcc]
flags = --use-local-env  --cl-version=2013 -LC:\Users\niluje\Anaconda\Lib;
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64
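Because .theanorc uses the INI format, it can be inspected with the standard library before launching a long training run. The following is a minimal sketch, assuming the file sits in your home folder as described above. The helper name read_theanorc is hypothetical and is not part of Theano.

```python
import os

try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2.7


def read_theanorc(path=None):
    """Return {section: {option: value}} for a .theanorc file."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".theanorc")
    parser = configparser.RawConfigParser()  # no % interpolation of values
    parser.optionxform = str  # keep option case, e.g. floatX
    parser.read(path)  # a missing file simply yields no sections
    return dict((s, dict(parser.items(s))) for s in parser.sections())
```

For the example file above, read_theanorc()["global"]["device"] would be "gpu". An empty result means the file was not found at the expected location.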

Run the script (cd into theano_conv_net):

python Theano_aircraft.py

Note: The CNN will take a very long time to train, depending on your hardware, the size of the dataset and other settings that you can tune in the script.

Web server

The Theano script should save a model-values.save file inside webapp/results. You are ready to run the server!
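If the server fails at startup, the first thing to verify is that the model file is actually where it is expected. A small sketch, assuming it is run from the webapp folder. The function name model_file_status is hypothetical.

```python
import os


def model_file_status(webapp_dir="."):
    """Return (path, size_in_bytes), with size None if the file is missing."""
    path = os.path.join(webapp_dir, "results", "model-values.save")
    if not os.path.isfile(path):
        return path, None
    return path, os.path.getsize(path)
```

A size of None means the Theano script has not saved a model yet (or saved it somewhere else).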

Run the server (cd into webapp):

python server.py

Note: Depending on your setup you may need to run the server as an administrator.

Visit the web application at localhost.

Note: If you want to run the server on a separate computer, you will also need to install Python and Theano on that computer.