
jamesj223/MLCBCC


Machine Learning Comic Book Cover Classifier

Overview

A series of scripts that use machine learning to find and extract covers from comic books.

Most comic book files have the cover as the first page, but many have multiple covers. Sometimes these are included at the start, sometimes at the end, and sometimes they are spread throughout the book. The goal of this project is to run it against a directory full of comic book files and have it extract all of the covers, so that they can be used to generate a cool collage (see the Examples section below).

Examples

Collages were built using John's Background Switcher.

Some manually selected non-cover pages were added as well to give the collages a bit more variety.

(Images: Example1, Example2, Example3)

Step 1 - Feature Engineering

MLE_1_Feature_Engineering.py is the first main file. Given a folder, it recursively searches through it for comic files (cbr/cbz) and builds a feature set for each page/image in each file.
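The recursive search step can be sketched with the standard library's `pathlib`. This is a minimal illustration, not the project's code; the function name `find_comic_files` is hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions treated as comic archives (the cbr/cbz formats named above).
COMIC_EXTENSIONS = {".cbr", ".cbz"}


def find_comic_files(root):
    """Recursively collect every .cbr/.cbz file under root, sorted for stable output.

    Hypothetical helper; the suffix check is case-insensitive (an assumption).
    """
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in COMIC_EXTENSIONS
    )
```

Calling `find_comic_files("/path/to/comics")` returns a list of `Path` objects that the feature-building step can then open one by one.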

The features we are using are as follows:

  • File Name
  • Whether the file name contains "Variant"
  • Image Height
  • Image Width
  • Number of continuous horizontal black lines in the image
  • Number of continuous horizontal white lines in the image
  • Number of white pixels in the image
  • Number of black pixels in the image
  • OCR word count for the image
  • Whether the OCR found the word "Variant"
  • Whether the OCR found the word "Marvel"
  • Whether OpenCV thinks it saw the Marvel logo
  • OpenCV's confidence score for seeing the Marvel logo
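Several of the features above can be computed in plain Python. The sketch below is illustrative rather than the project's implementation and rests on stated assumptions: the image arrives as rows of 0-255 grayscale values (with Pillow, for instance, `Image.open(path).convert("L")` gives an 8-bit grayscale image that could be turned into such rows); the black/white thresholds are hypothetical; a "continuous horizontal line" is taken to mean a row in which every pixel is black (or white); and the "Variant"/"Marvel" checks are case-insensitive. The OCR text is passed in as a string, and the OpenCV logo features are not covered.

```python
# Hypothetical thresholds: a grayscale value at or below BLACK_MAX counts as black,
# at or above WHITE_MIN counts as white. The project's real cut-offs may differ.
BLACK_MAX = 20
WHITE_MIN = 235


def pixel_features(gray_rows):
    """Pixel-based features from a grayscale image given as rows of 0-255 ints.

    Assumption: a "continuous horizontal line" is a row where every pixel is black
    (or every pixel is white).
    """
    height = len(gray_rows)
    width = len(gray_rows[0]) if height else 0
    black_pixels = white_pixels = black_lines = white_lines = 0
    for row in gray_rows:
        row_black = sum(1 for v in row if v <= BLACK_MAX)
        row_white = sum(1 for v in row if v >= WHITE_MIN)
        black_pixels += row_black
        white_pixels += row_white
        if row and row_black == len(row):
            black_lines += 1
        if row and row_white == len(row):
            white_lines += 1
    return {
        "height": height,
        "width": width,
        "black_lines": black_lines,
        "white_lines": white_lines,
        "black_pixels": black_pixels,
        "white_pixels": white_pixels,
    }


def text_features(file_name, ocr_text):
    """File-name and OCR-derived features; ocr_text is whatever an OCR engine returned."""
    lowered = ocr_text.lower()
    return {
        "file_name": file_name,
        "name_has_variant": "variant" in file_name.lower(),
        "ocr_word_count": len(ocr_text.split()),
        "ocr_has_variant": "variant" in lowered,
        "ocr_has_marvel": "marvel" in lowered,
    }
```

Counting whole black or white rows is cheap and tends to pick up the borders and blank bands that distinguish some page layouts, which is one plausible reason features like these are useful to a classifier.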

The output CSV looks like this:

(Image: TrainingSet2)
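A minimal sketch of writing one CSV row per page with the standard `csv` module. The column names and their order are hypothetical, chosen to mirror the feature list above; the real MLE_1 output may name or order them differently.

```python
import csv

# Hypothetical column order mirroring the feature list; not taken from the project.
FIELDNAMES = [
    "file_name", "name_has_variant", "height", "width",
    "black_lines", "white_lines", "white_pixels", "black_pixels",
    "ocr_word_count", "ocr_has_variant", "ocr_has_marvel",
    "logo_detected", "logo_confidence",
]


def write_feature_csv(rows, stream):
    """Write a header plus one row per page; rows are dicts keyed by FIELDNAMES.

    Missing keys are written as empty cells (DictWriter's default restval).
    """
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
```

When writing to disk, open the file with `newline=""` (for example `open("features.csv", "w", newline="")`), as the `csv` module documentation recommends.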

Step 2 - Classifier Testing and Comparison

MLE_2_Classifier_Testing_And_Comparison.py is the second main file. Given a training dataset, it splits it 80:20 into training and test sets, then runs a range of classifiers on those two sets and measures their performance.
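The 80:20 split can be sketched without dependencies as below. scikit-learn's `train_test_split(X, y, test_size=0.2)` does the same job and returns values in the same order; whether the script uses it is not stated here, and the function name `train_test_split_80_20` and the fixed seed are hypothetical.

```python
import random


def train_test_split_80_20(rows, labels, seed=42):
    """Shuffle indices with a fixed seed, then split 80% train / 20% test.

    Returns (X_train, X_test, y_train, y_test), keeping rows and labels aligned.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must be the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so repeated runs compare like with like
    cut = int(len(indices) * 0.8)
    train_idx, test_idx = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )
```

Using one seed for every classifier means they are all scored on the same held-out pages, which keeps the comparison fair.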

The key metrics we measure are Accuracy, Precision, Recall, F1 and Logistic Loss.
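For reference, here is a dependency-free sketch of those five metrics for a binary cover/non-cover task, assuming label 1 means "cover" and that each classifier supplies a predicted cover probability for the log loss. In practice `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, `f1_score` and `log_loss`; the helper below is hypothetical and only shows what they compute.

```python
import math


def binary_metrics(y_true, y_pred, y_prob, eps=1e-15):
    """Accuracy, precision, recall, F1 and log loss (assumption: label 1 = cover).

    y_true/y_pred hold 0/1 labels; y_prob holds the predicted probability of "cover".
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) can never occur
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return {
        "accuracy": correct / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": -total / len(y_true),
    }
```

Precision and recall matter more than raw accuracy here: most pages are not covers, so a model that never predicts "cover" can still score a high accuracy.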

The classifiers tested are:

The results of the tests looked like this:

(Image: Comparison3)

Overall, GradientBoostingClassifier was found to be the best option for this use case.

Step 3 - Usage

MLE_3_Extract_Classify_Move.py is the third main file. It works as follows:

  • Given an input folder, recursively search through it for comic files, extract each one to a separate directory, and flatten those directories (renaming files to avoid conflicts)
  • Build a feature set for each image
  • Load the trained classifier, load the feature set into pandas, iterate over the rows and apply the classifier
  • Move the covers to the output folder and clean up the temp directories.
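The extract, flatten and move steps can be sketched with the standard library. This is an illustration under assumptions, not the script itself: all function names are hypothetical, only .cbz (a plain zip archive) is handled because .cbr (RAR) needs a separate RAR-capable tool or library, and the `(image_path, is_cover)` pairs stand in for whatever the trained classifier predicts for each row of the feature set.

```python
import shutil
import zipfile
from pathlib import Path


def _unique_target(directory, name):
    """Return directory/name, or directory/<stem>_<n><suffix> if that name is taken."""
    stem, suffix = Path(name).stem, Path(name).suffix
    target, counter = directory / name, 1
    while target.exists():
        target = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return target


def extract_cbz(archive, dest):
    """Extract a .cbz archive (a plain zip file) into dest and return dest as a Path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest


def flatten(directory):
    """Move every nested file up into directory, renaming on clashes, then drop subfolders."""
    directory = Path(directory)
    for path in sorted(directory.rglob("*")):  # list is built before any file moves
        if path.is_file() and path.parent != directory:
            shutil.move(str(path), str(_unique_target(directory, path.name)))
    for sub in [p for p in directory.iterdir() if p.is_dir()]:
        shutil.rmtree(sub)


def move_covers(predictions, output_dir):
    """Move each image whose predicted flag is True into output_dir, never overwriting."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for image_path, is_cover in predictions:
        if is_cover:
            target = _unique_target(output_dir, Path(image_path).name)
            shutil.move(str(image_path), str(target))
```

Many archives reuse page names such as `001.jpg` across subfolders, so flattening without renaming would silently overwrite pages; the same helper also protects the shared output folder when covers from different books share a name.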

Early Experiments

Additionally, there are standalone scripts for some of the individual features from MLE 1, left over from early testing and troubleshooting, as well as some additional benchmarking scripts. They might be of some use to someone.

Misc Other Benchmarking

Comparison of different classifiers against earlier training sets

(Images: Comparison1, Comparison2)

Examining computation/time cost of different feature types

(Image: FeatureCost1)
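A simple way to compare the time cost of feature types is to time each feature function over the same sample of pages. The sketch below is hypothetical (the function name, the best-of-N approach and the per-sample normalisation are choices made here, not taken from the benchmarking scripts) and uses `time.perf_counter` from the standard library.

```python
import time


def time_features(feature_funcs, samples, repeats=3):
    """Best-of-`repeats` wall-clock seconds per sample for each named feature function."""
    costs = {}
    for name, func in feature_funcs.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for sample in samples:
                func(sample)
            best = min(best, time.perf_counter() - start)
        costs[name] = best / len(samples)
    return costs
```

Taking the best of several runs reduces noise from other processes; if OCR or template matching dominates the totals, those features are the obvious candidates to drop when speed matters more than accuracy.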
