C++ transcripts of the Caffe2 Python tutorials and other C++ example code.
Caffe2 has a strong C++ core but most tutorials only cover the outer Python layer of the framework. This project aims to provide example code written in C++, complementary to the Python documentation and tutorials. It covers verbatim transcriptions of most of the Python tutorials and other example applications.
Some higher-level tools, like brewing models and adding gradient operations, are currently not available in Caffe2's C++ API. This repo therefore provides some model helpers and other utilities as replacements, which are probably just as helpful as the actual tutorials. You can find them in include/caffe2/util and src/caffe2/util.
Check out the original Caffe2 Python tutorials at https://caffe2.ai/docs/tutorials.html.
- Intro Tutorial
- Toy Regression
- Pre-trained models
- MNIST from scratch
- RNNs and LSTM
- ImageNet Classifiers
- Fast Retrain
- Training from scratch
- Deep Dream
1. Install dependencies
Install the dependencies CMake, leveldb and OpenCV. If you're on macOS, use Homebrew:
brew install cmake glog protobuf leveldb opencv eigen
On Ubuntu:
apt-get install cmake libgoogle-glog-dev libprotobuf-dev libleveldb-dev libopencv-dev libeigen3-dev curl
If you're using CUDA and run into CMake issues with NCCL, try adding this to your .bashrc (assuming Caffe2 is at $HOME/caffe2):
export CMAKE_LIBRARY_PATH=$CMAKE_LIBRARY_PATH:$HOME/caffe2/third_party/nccl/build/lib
2. Install Caffe2
Follow the Caffe2 installation instructions: https://caffe2.ai/docs/getting-started.html
3. Build using CMake
This project uses CMake. However, the easiest way to build everything is:
make
Internally it creates a build folder and runs CMake from there. This also downloads the resources that are required for running some of the tutorials. Check out the Build alternatives section below if you wish to be more involved in the build process.
Note: sources are developed and tested on macOS and Ubuntu.
The Intro Tutorial covers the basic building blocks of Caffe2. This tutorial is transcribed in intro.cc.
Make sure to first run make. Then run the intro tutorial:
./bin/intro
This should output some numbers, including a loss of about 2.2.
One of the most basic machine learning tasks is linear regression (LR). The Toy Regression tutorial shows how to get accurate results with a two-parameter model. This tutorial is transcribed in toy.cc.
Run the toy regression model:
./bin/toy
This performs 100 steps of training, which should result in W after approximating W ground truth.
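To illustrate the underlying idea without Caffe2, the following is a minimal, framework-free sketch of fitting a small linear model by gradient descent. It is not the code in toy.cc: the names `Sample`, `Linear`, `make_data` and `fit_linear`, the synthetic data generator, the ground-truth values and the learning rate are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One training sample: two input features and the target value.
struct Sample {
  double x0, x1, y;
};

// Parameters of the model y = w0 * x0 + w1 * x1 + b.
struct Linear {
  double w0, w1, b;
};

// Deterministic synthetic data (an assumption for this sketch): inputs in
// [-1, 1) from a small linear congruential generator, noise-free targets.
inline std::vector<Sample> make_data(const Linear& truth, std::size_t n) {
  std::vector<Sample> data;
  std::uint32_t state = 12345u;
  auto next = [&state]() {
    state = state * 1664525u + 1013904223u;
    return (state >> 8) / 8388608.0 - 1.0;  // top 24 bits mapped to [-1, 1)
  };
  for (std::size_t i = 0; i < n; ++i) {
    const double x0 = next();
    const double x1 = next();
    data.push_back(Sample{x0, x1, truth.w0 * x0 + truth.w1 * x1 + truth.b});
  }
  return data;
}

// Full-batch gradient descent on the loss (1 / 2N) * sum(err^2).
inline Linear fit_linear(const std::vector<Sample>& data, int steps, double lr) {
  assert(!data.empty() && steps >= 0 && lr > 0.0);
  Linear m{0.0, 0.0, 0.0};
  const double n = static_cast<double>(data.size());
  for (int step = 0; step < steps; ++step) {
    double g0 = 0.0, g1 = 0.0, gb = 0.0;
    for (const Sample& s : data) {
      const double err = m.w0 * s.x0 + m.w1 * s.x1 + m.b - s.y;
      g0 += err * s.x0;  // d(loss)/d(w0), before dividing by N
      g1 += err * s.x1;  // d(loss)/d(w1)
      gb += err;         // d(loss)/d(b)
    }
    m.w0 -= lr * g0 / n;
    m.w1 -= lr * g1 / n;
    m.b -= lr * gb / n;
  }
  return m;
}
```

Calling `fit_linear(make_data(Linear{2.0, 1.5, 0.5}, 256), 500, 0.5)` should return parameters close to the ground truth. The Caffe2 version expresses the same loop as operators in a net rather than hand-written gradients.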
Often training can be skipped by using a pre-trained model. The Model Zoo contains a few of the popular models, although many are only available for Caffe. Use caffe_translator.py to convert models to Caffe2. See Caffe2 Models for more info.
The Loading Pre-Trained Models tutorial shows how to use these models to classify images. This tutorial and more are covered in pretrained.cc. The code takes an input image and classifies its content. By default it uses the image in res/image_file.jpg. Make sure the pre-trained SqueezeNet model is present in res/squeezenet_*_net.pb
. Note that
To run:
./bin/pretrained
This should output something along the lines of 96% 'daisy'.
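Turning a classifier's output into a line like the one above comes down to ranking the class probabilities and pairing the best ones with their labels. The sketch below shows that step in plain C++; `top_k` and `describe` are hypothetical helpers written for this example and are not taken from pretrained.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Returns the indices of the k largest probabilities, highest first.
inline std::vector<std::size_t> top_k(const std::vector<float>& probs,
                                      std::size_t k) {
  std::vector<std::size_t> idx(probs.size());
  std::iota(idx.begin(), idx.end(), 0);
  k = std::min(k, idx.size());
  std::partial_sort(
      idx.begin(), idx.begin() + k, idx.end(),
      [&probs](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });
  idx.resize(k);
  return idx;
}

// Formats one prediction as a rounded percentage followed by the quoted label.
inline std::string describe(float prob, const std::string& label) {
  assert(prob >= 0.0f && prob <= 1.0f);
  const int percent = static_cast<int>(prob * 100.0f + 0.5f);
  return std::to_string(percent) + "% '" + label + "'";
}
```

With probabilities `{0.01f, 0.96f, 0.03f}` and labels `{"rose", "daisy", "tulip"}` (made-up values), `top_k(probs, 1)` selects index 1 and `describe(probs[1], labels[1])` yields the daisy line.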
To classify giraffe.jpg:
./bin/pretrained --file giraffe.jpg
This tutorial is also a good test to see if OpenCV is working properly.
To export a model from Python:
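from caffe2.python import model_helper

model = model_helper.ModelHelper(..)
# Write the parameter initialization net and the prediction net as serialized protobufs.
with open("init_net.pb", 'wb') as f:
    f.write(model.param_init_net._net.SerializeToString())
with open("predict_net.pb", 'wb') as f:
    f.write(model.net._net.SerializeToString())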
A classical machine learning dataset is the MNIST database of handwritten digits by Yann LeCun. The Caffe2 tutorial MNIST - Create a CNN from Scratch shows how to build a basic convolutional neural network (CNN) to recognize these handwritten digits. This tutorial is transcribed in mnist.cc. Note that this and following tutorials rely on utility functions defined in caffe2/util.
Make sure the database folders res/mnist-*-nchw-leveldb are present. These should be generated by the download_resource.sh script. Then run:
./bin/mnist
This performs 100 training runs, which should provide about 90% accuracy.
To see the training in action, run with --display:
./bin/mnist --display
After testing, the trained model is stored in tmp/mnist_init_net.pb and tmp/mnist_predict_net.pb. For an example implementation of how to use this trained model, take a look at predict_example() in mnist.cc. This implementation is "pure" Caffe2 and does not rely on any helper methods. For an implementation that does use helper methods, take a look at imagenet.
In The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy describes how to train a recurrent neural network (RNN) on a large volume of text and how to generate new text using such a network. The Caffe2 tutorial RNNs and LSTM Networks covers this technique using the char_rnn.py script.
This tutorial is transcribed in rnn.cc. It takes the same parameters as used in the tutorial. First make sure the file res/shakespeare.txt is present. Then run:
./bin/rnn
In contrast to the tutorial, this script terminates after 10K iterations. To get more, use --iters:
./bin/rnn --iters 100000
To get better results (loss < 1), expand the hidden layer:
./bin/rnn --iters 100000 --batch 32 --hidden_size 512 --seq_length 32
The file res/dickens.txt contains a larger volume of text. Because the writing is a bit more recent, it's more challenging to generate convincing results. Also, single newlines are stripped to allow for more creativity.
./bin/rnn --iters 100000 --batch 32 --hidden_size 768 --seq_length 32 --train_data res/dickens.txt
After 200K runs, the loss has not dropped below 36, in contrast to the Shakespeare text. Perhaps this requires an additional hidden layer in the LSTM model.
Much of the progress in image recognition is published after the yearly ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This competition is based on the ImageNet dataset, which is a large volume of labeled images of nearly everything. The models for this challenge form the basis of much image recognition and processing research. One of the most basic challenges is classifying an image, which is covered in this example.
To classify the content of an image, run:
./bin/imagenet --model resnet101 --file res/image_file.jpg
Where the model name is one of the following:
- alexnet: AlexNet
- googlenet: GoogleNet
- squeezenet: SqueezeNet
- vgg16 and vgg19: VGG Team
- resnet50, resnet101, resnet152: MSRA
- mobilenet, mobilenet50, mobilenet25: MobileNet
The pre-trained weights for these models are automatically downloaded and stored in the res/ folder. If you wish to download all models in one go, run:
./script/download_extra.sh
Additional models can be made available on request!
To classify an image using a model that you trained yourself, specify the location of the init and predict .pb files as a single path, with a % character in place of init or predict. For example:
./bin/imagenet --model res/mobilenet_%_net.pb --file res/image_file.jpg
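One way to read the % convention is as a placeholder that gets replaced by init or predict to find the two files, matching names such as tmp/mnist_init_net.pb and tmp/mnist_predict_net.pb. The sketch below shows that substitution; `expand_model_path` and `model_paths` are hypothetical names for this example, and the repo's own implementation may differ.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Replaces the first '%' in `pattern` with `kind` ("init" or "predict").
inline std::string expand_model_path(const std::string& pattern,
                                     const std::string& kind) {
  const std::string::size_type pos = pattern.find('%');
  assert(pos != std::string::npos && "pattern must contain a '%' placeholder");
  std::string path = pattern;
  path.replace(pos, 1, kind);
  return path;
}

// Returns the (init, predict) file pair described by one pattern.
inline std::pair<std::string, std::string> model_paths(
    const std::string& pattern) {
  return std::make_pair(expand_model_path(pattern, "init"),
                        expand_model_path(pattern, "predict"));
}
```

For the command above, `model_paths("res/mobilenet_%_net.pb")` would give res/mobilenet_init_net.pb and res/mobilenet_predict_net.pb.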
The article DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition describes how to get good results on new datasets with minimal training efforts by reusing trained parameters of an existing model. For example, the above models are all trained on ImageNet data, which means they will only be able to classify ImageNet labels. However, by retraining just the top half of the model we can get high accuracy in a fraction of the time. If the image data has similar characteristics, it's possible to get good results by only retraining the top 'FC' layer.
First divide all images into subfolders, using the label as the folder name. Then, to retrain the final layer of GoogleNet:
./bin/train --model googlenet --folder res/images --layer pool5/7x7_s1
The script starts out by collecting all images and running them through the pre-trained part of the model. This allows for very fast training on the pre-processed image data.
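With the label-per-folder layout, each image's class can be read from the name of its parent folder. The sketch below derives labels and integer class ids from a list of paths using only string operations. The helper names `label_from_path`, `class_names` and `class_id` are assumptions for this example and do not come from the repo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the name of the folder that directly contains the file,
// or an empty string if the path has no parent folder.
inline std::string label_from_path(const std::string& path) {
  const std::string::size_type file_sep = path.find_last_of('/');
  if (file_sep == std::string::npos || file_sep == 0) return "";
  const std::string::size_type dir_sep = path.find_last_of('/', file_sep - 1);
  const std::string::size_type start =
      (dir_sep == std::string::npos) ? 0 : dir_sep + 1;
  return path.substr(start, file_sep - start);
}

// Collects the sorted, de-duplicated labels found in a list of image paths.
inline std::vector<std::string> class_names(
    const std::vector<std::string>& paths) {
  std::vector<std::string> names;
  for (const std::string& p : paths) {
    const std::string label = label_from_path(p);
    if (!label.empty()) names.push_back(label);
  }
  std::sort(names.begin(), names.end());
  names.erase(std::unique(names.begin(), names.end()), names.end());
  return names;
}

// Maps a label to its index in the sorted class list.
inline std::size_t class_id(const std::vector<std::string>& names,
                            const std::string& label) {
  const auto it = std::lower_bound(names.begin(), names.end(), label);
  assert(it != names.end() && *it == label);
  return static_cast<std::size_t>(it - names.begin());
}
```

For a path such as res/images/dog/Tjoise.jpg, `label_from_path` returns dog. The resulting ids are what a retrained final layer would be trained to predict.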
If you have more (GPU) power at your disposal, retrain VGG16's final 2 layers:
./bin/train --model vgg16 --folder res/images --layer fc6
Add --display for training visualization:
./bin/train --model googlenet --folder res/images --layer pool5/7x7_s1 --display
Some models, like SqueezeNet, require reshaping of their output to an N x D tensor:
./bin/train --model squeezenet --folder res/images --layer fire9/concat --reshape
You can also provide your own pre-trained model. Specify the location of the init and predict .pb file including a % character:
./bin/train --model res/googlenet_%_net.pb --folder res/images --layer pool5/7x7_s1
After the test runs, the model is saved in the --folder under the name _<layer>_<model>_<init/predict>_net.pb. Please note that you'll need to specify the generated class .txt file by using the --classes flag. You can now use this model like any other, for example in the imagenet example:
./bin/imagenet --model res/images/_pool5_7x7_s1_googlenet_%_net.pb --file res/images/dog/Tjoise.jpg --classes res/images/_pool5_7x7_s1_classes.txt
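The file names in the command above suggest how the output names are assembled: the layer name has its slashes replaced by underscores and is combined with the model name. The sketch below reproduces that pattern. The slash replacement is inferred from the example names rather than confirmed from the source, and `retrained_path` and `classes_path` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Replaces '/' in a layer name so it can be used inside a file name.
inline std::string flatten_layer(std::string layer) {
  std::replace(layer.begin(), layer.end(), '/', '_');
  return layer;
}

// Builds <folder>/_<layer>_<model>_<kind>_net.pb, where kind is "init",
// "predict", or the "%" placeholder accepted by the --model flag.
inline std::string retrained_path(const std::string& folder,
                                  const std::string& layer,
                                  const std::string& model,
                                  const std::string& kind) {
  assert(!folder.empty() && !layer.empty() && !model.empty());
  return folder + "/_" + flatten_layer(layer) + "_" + model + "_" + kind +
         "_net.pb";
}

// Builds the path of the generated class list, <folder>/_<layer>_classes.txt.
inline std::string classes_path(const std::string& folder,
                                const std::string& layer) {
  assert(!folder.empty() && !layer.empty());
  return folder + "/_" + flatten_layer(layer) + "_classes.txt";
}
```

With folder res/images, layer pool5/7x7_s1 and model googlenet, these helpers produce the --model and --classes arguments used above.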
Another example implementation of a classifier can be found in mnist.cc, see predict_example().
See also:
- How to Retrain Inception's Final Layer for New Categories
- Train your own image classifier with Inception in TensorFlow
To fully train an existing image classification model from scratch, run without the --layer option:
./bin/train --model resnet50 --folder res/images
The models currently available for training are the ones listed in the ImageNet section. This will take a lot of time, even when running on the GPU.
Some models, like SqueezeNet, require reshaping of their output to an N x D tensor:
./bin/train --model squeezenet --folder res/images --reshape
The article Inceptionism: Going Deeper into Neural Networks describes how hidden layers of a CNN can be visualized by training just the input tensor based on the mean value of a particular channel. This technique is known for producing remarkable imagery and can be combined with existing images. This is referred to as Deep Dream.
NB: this code is still a bit buggy and generated images are not representative of the original Deep Dream implementation.
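The core idea, keeping the weights fixed and updating only the input so that the mean activation of one channel grows, can be shown on a toy scale. The sketch below uses a single 1-D convolution filter as a stand-in for a channel of a real network. It is an illustration under that assumption and is not the code behind ./bin/dream. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean activation of one 1-D convolution channel (valid padding, no bias):
// a[i] = sum_k w[k] * x[i + k], averaged over all output positions i.
inline double channel_mean(const std::vector<double>& x,
                           const std::vector<double>& w) {
  assert(!w.empty() && x.size() >= w.size());
  const std::size_t n_out = x.size() - w.size() + 1;
  double sum = 0.0;
  for (std::size_t i = 0; i < n_out; ++i)
    for (std::size_t k = 0; k < w.size(); ++k) sum += w[k] * x[i + k];
  return sum / static_cast<double>(n_out);
}

// Gradient of channel_mean with respect to every input element x[j].
// For this linear toy channel it does not depend on x; a real CNN is
// nonlinear, so the gradient is recomputed by backpropagation each step.
inline std::vector<double> channel_mean_grad(std::size_t n_in,
                                             const std::vector<double>& w) {
  assert(!w.empty() && n_in >= w.size());
  const std::size_t n_out = n_in - w.size() + 1;
  std::vector<double> grad(n_in, 0.0);
  for (std::size_t i = 0; i < n_out; ++i)
    for (std::size_t k = 0; k < w.size(); ++k)
      grad[i + k] += w[k] / static_cast<double>(n_out);
  return grad;
}

// Gradient ascent on the input: the filter w stays fixed, x is updated.
inline std::vector<double> dream(std::vector<double> x,
                                 const std::vector<double>& w, int steps,
                                 double lr) {
  const std::vector<double> grad = channel_mean_grad(x.size(), w);
  for (int s = 0; s < steps; ++s)
    for (std::size_t j = 0; j < x.size(); ++j) x[j] += lr * grad[j];
  return x;
}
```

Starting from an all-zero input, `dream(x, {1.0, -2.0, 1.0}, 10, 0.1)` returns an input whose `channel_mean` is larger than before, which is the toy analogue of an image that excites the chosen channel.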
To render channel 139 of the inception_4d/3x3_reduce layer in GoogleNet:
./bin/dream --model googlenet --layer inception_4d/3x3_reduce --channel 139
The resulting image will be written to tmp/. To visualize the process, add --display:
./bin/dream --model googlenet --layer inception_4d/3x3_reduce --channel 139 --display
NB: these images differ from the ones presented in Google's article.
Multiple channels can be rendered in parallel by increasing the batch size:
./bin/dream --model googlenet --layer inception_4d/3x3_reduce --channel 133 --display --batch 11
If you have more (GPU) power at your disposal, try the first channel in the conv3_1 layer of VGG16:
./bin/dream --model vgg16 --layer conv3_1 --channel 0
You can also provide your own pre-trained model. Specify the location of the init and predict .pb file including a % character:
./bin/dream --model res/squeezenet_%_net.pb --layer fire9/concat --channel 100 --display
We can also do some dreaming on MNIST. First train the MNIST model. Then run:
./bin/dream --model tmp/mnist_%_net.pb --layer conv2 --size 28 --channel 0 --batch 50 --display
Some of the examples have a --display option, which will show an OpenCV window with images and plots covering the training progress. These graphs are drawn using the cvplot framework.
See http://rpg.ifi.uzh.ch/docs/glog.html for more info on logging. Try running the tools and examples with --logtostderr=1, --caffe2_log_level=1, and --v=1.
The easiest way to build all sources is to run:
make
To run these steps manually:
mkdir -p build
cd build
cmake ..
make
cd ..
./script/download_resource.sh
Compiling the tutorials and examples individually can be a little more involved. One way to get more understanding of what CMake does internally is by running:
cd build
make VERBOSE=1
cd ..
The first three tutorials (intro, toy, and pretrained) can be compiled without CMake quite easily. For example, pretrained on macOS:
c++ src/caffe2/binaries/pretrained.cc -o bin/pretrained -std=gnu++11 -Iinclude -I/usr/local/include/eigen3 -I/usr/local/include/opencv -lgflags -lglog -lprotobuf -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_imgcodecs -lCaffe2_CPU
And pretrained on Ubuntu:
c++ src/caffe2/binaries/pretrained.cc -o bin/pretrained -std=gnu++11 -Iinclude -I/usr/include/eigen3 -I/usr/include/opencv -lgflags -lglog -lprotobuf -lopencv_core -lopencv_imgproc -lopencv_highgui -lCaffe2_CPU
Other examples require the compilation of additional .cc files. Take a look at the verbose output of cd build && make VERBOSE=1.