Big Data Computing project

Overview

These notebooks are part of the project for the Big Data Computing course (AY 2020/21) taught by Prof. Gabriele Tolomei.

Resources:

the notebook with the training parts (also published on Databricks)
a demo notebook without the training parts, covering the evaluation, search engine and adversarial attack phases (also published on Databricks)
a brief presentation of this project (also on Google Docs)
the preprocessed dataset's folder on Google Drive

The project was carried out by:

Romeo Lanzino, matricola 1753403, email [email protected]
Federico Fontana, matricola 1744946, email [email protected]

Task

We have decided to tackle semi-supervised multi-class image classification, in which the dataset contains both labeled samples and a large number of unlabeled samples [Van Engelen, Hoos, 2019, A survey on semi-supervised learning].

Dataset

We've chosen STL-10 [Coates, Lee, Ng, 2011, An Analysis of Single Layer Networks in Unsupervised Feature Learning], which is an image recognition dataset with a corpus of 100K unlabeled images, 5K labeled training images and 8K labeled test images, covering 10 different classes.

Workflow

download, analyze and preprocess the dataset
mitigate the curse of dimensionality using dimensionality reduction techniques, such as extracting features with a CNN
train a model (such as an MLP) on the labeled training images
pseudo-label the unlabeled images using the model trained in the previous step
train a second model using also the pseudo-labeled images
evaluate both models to check whether the pseudo-labeled images brought any improvement
implement an image search engine that, given a query image as input, returns a list of relevant images from a gallery (disjoint from the set of query images)
perform an example adversarial attack
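
The train, pseudo-label and retrain steps of the workflow can be sketched as follows. This is a minimal illustration and not the notebooks' code: a nearest-centroid classifier written in NumPy stands in for the MLP, the inputs are assumed to be feature vectors that have already been extracted, and the `threshold` confidence filter is an assumption (the workflow does not say whether low-confidence pseudo-labels are discarded). All function names are hypothetical.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    # "Training": one mean feature vector per class (stand-in for the MLP).
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    # Softmax over negative squared distances to each class centroid.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def pseudo_label(X_lab, y_lab, X_unlab, n_classes, threshold=0.9):
    # 1) train the base model on the labeled samples only
    base = fit_centroids(X_lab, y_lab, n_classes)
    # 2) predict the unlabeled samples and keep the confident ones
    #    (the threshold is an assumption of this sketch)
    proba = predict_proba(base, X_unlab)
    keep = proba.max(axis=1) >= threshold
    pseudo = proba.argmax(axis=1)
    # 3) retrain on labeled + pseudo-labeled samples
    X_all = np.concatenate([X_lab, X_unlab[keep]])
    y_all = np.concatenate([y_lab, pseudo[keep]])
    final = fit_centroids(X_all, y_all, n_classes)
    return base, final, keep
```

Both returned models can then be scored on the same held-out test set, which is the comparison reported in the Results section.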

How to run the notebooks

Check instructions.md

Results

Classification

Base model

{'accuracy': 0.669375,
 'f1': 0.6717551753303268,
 'mcc': 0.6335526776740902,
 'weightedFalsePositiveRate': 0.036736111111111115,
 'weightedPrecision': 0.680093714461409,
 'weightedRecall': 0.669375,
 'weightedTruePositiveRate': 0.669375}
 
              precision    recall  f1-score   support

    airplane       0.74      0.81      0.77       800
        bird       0.68      0.69      0.68       800
         car       0.85      0.82      0.84       800
         cat       0.54      0.47      0.50       800
        deer       0.64      0.62      0.63       800
         dog       0.42      0.59      0.49       800
       horse       0.66      0.62      0.64       800
      monkey       0.66      0.53      0.59       800
        ship       0.86      0.76      0.81       800
       truck       0.76      0.78      0.77       800

    accuracy                           0.67      8000
   macro avg       0.68      0.67      0.67      8000
weighted avg       0.68      0.67      0.67      8000

Final model

{'accuracy': 0.660375,
 'f1': 0.6616909977888196,
 'mcc': 0.6229632711502258,
 'weightedFalsePositiveRate': 0.03773611111111111,
 'weightedPrecision': 0.6654451885476704,
 'weightedRecall': 0.660375,
 'weightedTruePositiveRate': 0.660375}
 
              precision    recall  f1-score   support

    airplane       0.73      0.74      0.74       800
        bird       0.67      0.67      0.67       800
         car       0.86      0.80      0.83       800
         cat       0.53      0.49      0.51       800
        deer       0.61      0.64      0.63       800
         dog       0.44      0.54      0.48       800
       horse       0.64      0.64      0.64       800
      monkey       0.64      0.53      0.58       800
        ship       0.79      0.80      0.79       800
       truck       0.75      0.76      0.75       800

    accuracy                           0.66      8000
   macro avg       0.67      0.66      0.66      8000
weighted avg       0.67      0.66      0.66      8000
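
As a hedged sketch of how summary figures like the ones above can be derived from a confusion matrix, the NumPy function below computes accuracy, support-weighted precision, recall, F1 and false positive rate, and the multiclass Matthews correlation coefficient. The definitions are the standard ones and are an assumption about what the notebooks' evaluator computes; the function name `summarize` is hypothetical.

```python
import numpy as np

def summarize(cm):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    support = cm.sum(axis=1)     # true samples per class
    predicted = cm.sum(axis=0)   # predictions per class
    fp = predicted - tp

    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    pr_sum = precision + recall
    f1 = np.divide(2 * precision * recall, pr_sum,
                   out=np.zeros_like(tp), where=pr_sum > 0)
    fpr = fp / (total - support)  # false positives over true negatives + false positives

    w = support / total           # class weights used for the "weighted" figures

    # Multiclass MCC computed from the confusion matrix marginals.
    num = tp.sum() * total - (support * predicted).sum()
    den = np.sqrt((total ** 2 - (predicted ** 2).sum())
                  * (total ** 2 - (support ** 2).sum()))
    mcc = num / den if den > 0 else 0.0

    return {
        "accuracy": tp.sum() / total,
        "f1": (w * f1).sum(),
        "mcc": mcc,
        "weightedFalsePositiveRate": (w * fpr).sum(),
        "weightedPrecision": (w * precision).sum(),
        "weightedRecall": (w * recall).sum(),
    }
```

With K equally sized classes, these definitions give a weighted false positive rate of (1 - accuracy) / (K - 1). For the base model, (1 - 0.669375) / 9 ≈ 0.036736, and for the final model, (1 - 0.660375) / 9 ≈ 0.037736, both of which agree with the reported values.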
