Skip to content

Localization of rings in synthecic images using a convolutional neural network (CNN). With localization we mean to predict the center of the circle in normalized coordinates relative to the center of the image.

Notifications You must be signed in to change notification settings

gerritschoe/2DSingleObjectLocalization

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

80 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

2DSingleObjectLocalization

Objective: Localization of rings in synthetic images using a convolutional neural network (CNN). With localization we mean to predict the center of the circle in normalized coordinates relative to the center of the image.

The images shows the synthetically generated input into the CNN, the prediction (purple rectangle) and the ground truth (green rectangle) for the center of the red ellipse.

alt text alt text alt text alt text

We are using PIL (Python Imaging Library) to generate and modify the images and TensorFlow to construct and train the neural network.

Structure of the neural net that performs regression:

  • input_layer = tf.reshape(features["x"], [-1, 200, 300, 3])
  • conv1 = tf.layers.conv2d(inputs=input_layer, filters=4, kernel_size=[5, 5], padding="same", activation=tf.nn.relu)
  • pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
  • conv2 = tf.layers.conv2d(inputs=pool1, filters=8, kernel_size=[5, 5], padding="same", activation=tf.nn.relu)
  • pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
  • pool2_flat = tf.reshape(pool2, [-1, 75 * 50 * 8])
  • dropout = tf.layers.dropout(inputs=pool2_flat, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)
  • dense1 = tf.layers.dense(inputs=dropout, units=200, activation=tf.nn.relu)
  • dense2 = tf.layers.dense(inputs=dense1, units=20, activation=tf.nn.relu)
  • dense3 = tf.layers.dense(inputs=dense2, units=2, activation=None)
  • predictions = {"predict_results": tf.identity(dense3, name="final_layer")

Computation time: I ran this code on a laptop with a 2014 quadcore CPU, a mid-range mobile GPU (2GB VRAM) and 16 GB RAM. Time for 100 training steps: 51.750 seconds. Visually acceptable predictions are achieved after 2000 steps, good predictions are achieved after 6000 steps. A mean_squared_error of 0.01 is reachable.

Results: The current model shows good convergence and good prediction accuracy. Still, there is a lot of room for improvement, for example by switching to advanced localization algorithms like YOLOv3. A convolutional neural network is not able to achieve outstanding accuracy in regression tasks.

Usage:

There are 2 independent functionalities:

  1. Generating labeled data in the form of images which contain a single ellipse on a noisy background. For this, run the file generateData.py

  2. Train a CNN that predicts the center of the ellipse. For this, run trainCNN_tf.py

After a defined number of training steps, an evaluation with unseen test data follows. The predictions for the test data are visualized and saved to the test_output/ folder. The weights of the neural network are also saved to a defined folder and used as initialization for the next training, if available.

Author: Gerrit Schoettler Contact: gerrit.schoettler[at]tuhh.de

About

Localization of rings in synthecic images using a convolutional neural network (CNN). With localization we mean to predict the center of the circle in normalized coordinates relative to the center of the image.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages