OASIS brain data set using VQVAE - Daniel Miller 45810536 #458

Open · wants to merge 30 commits into base: topic-recognition

Commits (30)
c2702e2
Trying something meaningful
Oct 11, 2022
15ca543
Created required files
dapmiller Oct 11, 2022
032ac04
Provided brief description for each file of what is to be included
dapmiller Oct 14, 2022
446c8ba
Added 2 new functions into datasey.py which one downloads the oasis d…
dapmiller Oct 14, 2022
13144c6
Add comments to function for explanation and understanding in dataset…
dapmiller Oct 15, 2022
6e334b3
Added functions to dataset.py to load labels and also process them wi…
dapmiller Oct 15, 2022
4c758ce
Added calls to load validation and test labels in train.py from datas…
dapmiller Oct 15, 2022
38430c7
Added brief summary to Read.me file about model
dapmiller Oct 15, 2022
d4f1ec4
Safety commit of working loading and processing of data before implem…
dapmiller Oct 15, 2022
b1bb74b
Created encoder, decoder and overall vq-vae functions in modules.py h…
dapmiller Oct 15, 2022
fa73b09
Finished building model in modules.py. It compiles and prints out sum…
dapmiller Oct 18, 2022
b2c7b49
Finished building model in modules.py and it successfully compiles in…
dapmiller Oct 18, 2022
df70801
Testing commit
dapmiller Oct 18, 2022
dee848c
Commited the module.py file and removed unnecessary code
dapmiller Oct 18, 2022
9ee75d3
created a function to batch data before training in dataset.py
dapmiller Oct 19, 2022
605cb17
Successfully trained data. Safety Check Commit. Removed Batching func…
dapmiller Oct 20, 2022
1e99576
Created functions in train.py that compute structural similiarity. It…
dapmiller Oct 20, 2022
d361777
Edited the ssim in train.py. Also added references to dataset.py anhd…
dapmiller Oct 21, 2022
235585b
Fixed bug that was stopping loss reduce in model.fit. Currently imple…
dapmiller Oct 21, 2022
b0d498b
Create Images
dapmiller Oct 21, 2022
9b90b9f
Delete Images
dapmiller Oct 21, 2022
f396991
Edited train.py bug in training. It now runs and the losses appear to…
dapmiller Oct 21, 2022
3aeadda
Fixed Bug in model.fit that caused loss. It was due to variance being…
dapmiller Oct 21, 2022
a8f82d0
Fixed Bug in model.fit that caused loss. It was due to variance being…
dapmiller Oct 21, 2022
92073ce
Merge branch 'topic-recognition' of https://github.com/dapmiller/Patt…
dapmiller Oct 21, 2022
ab702fe
Fixed Bug in model.fit that caused loss. It was due to variance being…
dapmiller Oct 21, 2022
6899684
Trying to add images
dapmiller Oct 21, 2022
63a9a18
Added Graph
dapmiller Oct 21, 2022
f5db768
Cleaned up Files and finsihed off REadme file. Commented out the pixe…
dapmiller Oct 21, 2022
09450f8
Fixed ssim to read 0.74 not 74
dapmiller Oct 21, 2022
53 changes: 53 additions & 0 deletions recognition/Miller/README.MD
@@ -0,0 +1,53 @@
# Vector Quantized Variational Auto-Encoder (VQ-VAE Model)

In this report, a generative model, the Vector Quantized Variational AutoEncoder (VQ-VAE), is used to generate reconstructed images of the OASIS brain data set that are "reasonably clear" and achieve a Structural Similarity (SSIM) of over 0.6. The VQ-VAE was implemented using TensorFlow Keras.

#### Description of the VQ-VAE Algorithm
![](https://miro.medium.com/max/1400/1*yRdNe3xi4f3KV6ULW7yArA.png)
>Figure 1: Graphical representation of a VQ-VAE network.

A standard VAE (encoder -> decoder) uses a continuous latent space sampled from a Gaussian distribution, which makes the distribution hard to learn with gradient descent. In contrast, the VQ-VAE uses a discrete latent space and consists of three parts, as seen above:

1. Encoder:
* Convolutional network that downsamples the features of an image
2. Latent Space:
* A codebook consisting of n latent embedding vectors, each of dimension D
* For each encoded output vector, the quantizer computes the Euclidean distance to every embedding in the codebook
* The closest codebook vector replaces the encoder output and is fed as input to the decoder (a minimal sketch of this lookup follows the list)
3. Decoder:
* Convolutional network that upsamples and generates the reconstructed samples.
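
To make the lookup in step 2 concrete, here is a minimal NumPy sketch (illustrative only; the array names and sizes are assumptions, not the project's code, which implements the same idea in TensorFlow in `modules.py`):

```python
import numpy as np

# Hypothetical codebook of n = 128 embeddings, each of dimension D = 16
codebook = np.random.randn(128, 16)
# Hypothetical batch of flattened encoder outputs, shape (n*h*w, D)
z_e = np.random.randn(1024, 16)

# Squared Euclidean distance from every encoder output to every embedding
distances = (
    (z_e ** 2).sum(axis=1, keepdims=True)
    - 2 * z_e @ codebook.T
    + (codebook ** 2).sum(axis=1)
)
nearest = distances.argmin(axis=1)   # index of the closest embedding per vector
z_q = codebook[nearest]              # quantized vectors fed to the decoder
```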

#### OASIS Brain Data Set
![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRl7czOsj3uzWRQ6NT2ofed7QBsKiqrUq6Bsw&usqp=CAU)
>Figure 2: Comparison of an image stored in the train vs test data sets

The OASIS MRI dataset contains 9,664 training images, 544 test images and 1,120 validation images. An example of the train and test data is shown above. The images are preloaded into a file location, from which they are extracted and processed for use.

##### Data Pre-Processing

Before use, the data was normalised through residual extraction (subtracting the mean and dividing by the standard deviation) and min-max rescaling. This makes it easier to compare distributions with different means and scales while maintaining the shape of each distribution.
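
This corresponds to the `process_training` function in `dataset.py`; a minimal sketch of the same two steps:

```python
import numpy as np

def normalise(data_set: np.ndarray) -> np.ndarray:
    # Residual extraction: subtract the mean and divide by the standard deviation
    data_set = (data_set - np.mean(data_set)) / np.std(data_set)
    # Min-max rescaling: squeeze and shift all values into [0, 1]
    return (data_set - np.amin(data_set)) / (np.amax(data_set) - np.amin(data_set))
```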

## Training

The three data groups (train, test, and validation) are split approximately 0.85/0.05/0.10. The training set contains the most images so that the model has enough information to learn from to produce accurate reconstructions later. The test set is used to evaluate these reconstructions. The validation set is not required, as the model is judged by the quality of the reconstructions on the test set. The model is trained with ... epochs on a batch size of 128, as sketched below.
*insert image
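
Training is driven by the `VQVAETRAINER` class in `modules.py`. A minimal sketch of how it might be invoked (the optimiser, epoch count and data path here are assumptions, not the project's exact settings):

```python
import numpy as np
import tensorflow as tf
from dataset import load_training, process_training
from modules import VQVAETRAINER

train_set = process_training(load_training("path/to/train"))  # hypothetical path

trainer = VQVAETRAINER(np.var(train_set), latent_dimension=32, embeddings_num=128)
trainer.compile(optimizer=tf.keras.optimizers.Adam())  # assumed optimiser
trainer.fit(train_set, epochs=10, batch_size=128)      # epoch count illustrative
```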

## Results

The reconstructed images achieved a mean Structural Similarity of ...
*Insert image

## Dependencies
* Python 3.7
* TensorFlow 2.6.0
* NumPy 1.19.5
* Matplotlib 3.2.2
* Pillow 7.1.2
* os (Python standard library)
* Pre-processed OASIS MRI dataset (accessible at https://cloudstor.aarnet.edu.au/plus/s/n5aZ4XX1WBKp6HZ/download).

## References
[1] A. van den Oord, O. Vinyals, and K. Kavukcuoglu, 2018. Neural Discrete Representation Learning. [Online]. Available at: https://arxiv.org/pdf/1711.00937.pdf.

[2] Paul, S., 2021. Keras documentation: Vector-Quantized Variational Autoencoders. [Online]. Keras.io. Available at: https://keras.io/examples/generative/vq_vae/.

[3] shakes76, PatternFlow. [Online]. Available at: https://github.com/shakes76/PatternFlow/tree/master/recognition/MySolution.
113 changes: 113 additions & 0 deletions recognition/Miller/dataset.py
@@ -0,0 +1,113 @@
"""
dataset.py" containing the data loader for loading and preprocessing your data

This was file utilises and modifies the fucntions found in https://github.com/shakes76/PatternFlow/tree/master/recognition/MySolution
"""

import tensorflow as tf
import glob
import numpy as np
from matplotlib import image
import os
from PIL import Image


# Download the OASIS data as an archive. It may need to be extracted manually afterwards
def download_oasis ():

    dataset_url = "https://cloudstor.aarnet.edu.au/plus/s/n5aZ4XX1WBKp6HZ/download"

    # Download file from URL: origin=source URL, fname=local file name, untar=True extracts the archive
    tf.keras.utils.get_file(origin=dataset_url,fname='oa-sis' ,untar=True)
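
# Example usage (illustrative only; by default tf.keras.utils.get_file caches the
# download under ~/.keras/datasets, from where it can be extracted):
#   download_oasis()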

# Loads the training images (non-segmented) from the given path and returns a numpy array of image arrays
def load_training (path):

    image_list = []
    # Iterate over every '.png' file in the given path
    for filename in glob.glob(path + '/*.png'):
        # Read the image at the given filename into an array
        im = image.imread (filename)
        # Append array to list
        image_list.append(im)

    print('train_X shape:', np.array(image_list).shape)

    # Create a numpy array holding all the image arrays
    train_set = np.array(image_list, dtype=np.float32)

    return train_set

# Normalises training images and adds a 4th dimension
def process_training (data_set):

    """ Residual Extraction -> useful for comparing distributions with different means but similar shapes"""
    # Calculate the residuals of the data - each value becomes its distance from the distribution mean, which is now zero
    data_set = (data_set - np.mean(data_set)) / np.std(data_set)
    """ Min-Max Rescaling -> useful for comparing distributions with different scales or different shapes"""
    # Rescale data - ratio of each value's distance from the minimum to the range of values -> each value now lies in (0,1)
    # Forces datasets onto the same scale while preserving the shape of the distribution -> "squeezed and shifted to fit between 0 and 1"
    data_set= (data_set - np.amin(data_set)) / np.amax(data_set - np.amin(data_set))
    # Add 4th (channel) dimension
    data_set = data_set [:,:,:,np.newaxis]

    return data_set

# Loads label images from the given path, maps pixel values to class indices and converts the image data type to uint8
def load_labels (path):
    image_list =[]

    # Iterate over every '.png' file in the given path
    for filename in glob.glob(path+'/*.png'):
        # Read the image at the given filename into an array
        im=image.imread (filename)
        # Create an 'im.shape[0] x im.shape[1]' array of zeros
        one_hot = np.zeros((im.shape[0], im.shape[1]))
        # Iterate over the sorted unique pixel values of the image
        for i, unique_value in enumerate(np.unique(im)):
            # Map each unique pixel value to its class index -> transforms categorical pixel values into numerical class labels
            one_hot[:, :][im == unique_value] = i
        # Append array to list
        image_list.append(one_hot)

    print('train_y shape:',np.array(image_list).shape)

    # Create a numpy array holding all the label arrays
    labels = np.array(image_list, dtype=np.uint8)

    #pyplot.imshow(labels[2])
    #pyplot.show()

    return labels

# One-hot encodes label data and converts it to a numpy array
def process_labels (seg_data):
    onehot_Y = []

    # Iterate over all label images (first dimension of the array)
    for n in range(seg_data.shape[0]):

        # Get the label image at position n
        im = seg_data[n]

        # There are 4 classes
        n_classes = 4

        # Create an 'im.shape[0] x im.shape[1] x n_classes' array of zeros with type uint8
        one_hot = np.zeros((im.shape[0], im.shape[1], n_classes),dtype=np.uint8)

        # Iterate over the sorted unique class indices of the image
        for i, unique_value in enumerate(np.unique(im)):
            # Set channel i to 1 wherever the image equals that class index -> one-hot encoding per pixel
            one_hot[:, :, i][im == unique_value] = 1
        # Append array to list
        onehot_Y.append(one_hot)

    # Create a numpy array holding all the one-hot encoded label images
    onehot_Y =np.array(onehot_Y)
    #print (onehot_Y.dtype)
    #print (np.unique(onehot_validate_Y))
    #print (onehot_Y.shape)

    return onehot_Y
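
# Example pipeline (illustrative only; the paths are hypothetical):
#   train_X = process_training(load_training("keras_png_slices_data/train"))
#   train_Y = process_labels(load_labels("keras_png_slices_data/seg_train"))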
197 changes: 197 additions & 0 deletions recognition/Miller/modules.py
@@ -0,0 +1,197 @@
"""
“modules.py" containing the source code of the components of your model. Each component must be
implementated as a class or a function

Based on Neural Discrete Representation Learning by van der Oord et al https://arxiv.org/pdf/1711.00937.pdf
and the given example on https://keras.io/examples/generative/vq_vae/
"""
import tensorflow as tf

"""CREATE STRUCTURE OF VQ-VAR MODEL"""

"""
Class Representation of the Vector Quantization laye

Structure is:
1. Reshape into (n,h,w,d)
2. Calculate L2-normalized distance between the inputs and the embeddings. -> (n*h*w, d)
3. Argmin -> find minimum distance between indices for each n*w*h vector
4. Index from dictionary: index the closest vector from the dictionary for each of n*h*w vectors
5. Reshape into original shape (n, h, w, d)
6. Copy gradients from q -> x
"""
class VectorQ_layer(tf.keras.layers.Layer):
def __init__(self, embedding_num, latent_dimension, beta=0.25, **kwargs):
super().__init__(**kwargs)
self.embedding_num = embedding_num
self.latent_dimension = latent_dimension
self.beta = beta

# Initialize the embeddings which we will quantize.
w_init = tf.random_uniform_initializer()
self.embeddings = tf.Variable(initial_value=w_init(shape=(self.latent_dimension, self.embedding_num), dtype="float32"),trainable=True,name="embeddings_vqvae",)

# Forward Pass behaviour. Takes Tensor as input
def call(self, x):
# Calculate the input shape and store for later -> Shape of (n,h,w,d)
input_shape = tf.shape(x)

# Flatten the inputs to keep the embedding dimension intact.
# Combine all dimensions into last one 'd' -> (n*h*w, d)
flatten = tf.reshape(x, [-1, self.latent_dimension])

        # Get code indices
        # Calculate the squared L2 distance between the inputs and the embeddings.
        # For each of the n*h*w vectors, we calculate the distance to each of the k vectors of the embedding dictionary to obtain a matrix of shape (n*h*w, k)
        similarity = tf.matmul(flatten, self.embeddings)
        distances = (tf.reduce_sum(flatten ** 2, axis=1, keepdims=True) + tf.reduce_sum(self.embeddings ** 2, axis=0) - 2 * similarity)

        # For each of the n*h*w vectors, find the index of the closest of the k dictionary vectors (minimum distance)
        encoded_indices = tf.argmin(distances, axis=1)

        # Turn the indices into one-hot encoded vectors, then index the closest vector from the dictionary for each of the n*h*w vectors
        encodings = tf.one_hot(encoded_indices, self.embedding_num)
        quantized = tf.matmul(encodings, self.embeddings, transpose_b=True)

# Reshape the quantized values back to its original input shape -> (n,h,w,d)
quantized = tf.reshape(quantized, input_shape)

""" LOSS CALCULATIONS """
"""
COMMITMENT LOSS
Since volume of embedding spcae is dimensionless, it may grow arbitarily if embedding ei does not
train as fast as encoder parameters. Thus add a commitment loss to make sure encoder commits to an embedding
CODE BOOK LOSS
Gradients bypass embedding, so we use a dictionary learningn algorithm which uses l2 error to
move embedding vectors ei towards encoder output

tf.stop_gradient -> no gradient flows through
"""
commitment_loss = tf.reduce_mean((tf.stop_gradient(quantized) - x) ** 2)
codebook_loss = tf.reduce_mean((quantized - tf.stop_gradient(x)) ** 2)
self.add_loss(self.beta * commitment_loss + codebook_loss)
        # Straight-through estimator.
        # Unable to backpropagate as the gradient won't flow through argmin. Hence copy the gradient from quantized to x.
        # During backpropagation, (quantized - x) won't be included in the computation and the gradient obtained for quantized is copied to the inputs
        quantized = x + tf.stop_gradient(quantized - x)

return quantized

# Represents the VAE Structure
class VAE:
def __init__(self, embedding_num, latent_dimension, beta=0.25):
self.embedding_num = embedding_num
self.latent_dimension = latent_dimension
self.beta=beta
"""
Returns layered model for encoder architecture built from convolutional layers.

activations: ReLU advised as other activations are not optimal for encoder/decoder quantization architecture.
e.g. Leaky ReLU activated models are difficult to train -> cause sporadic loss spikes that model struggles to recover from
"""
    # Encoder Component
    def encoder_component(self):
        # 2D convolutional layers:
        #   filters -> dimension of the output space
        #   kernel_size -> convolution window size
        #   activation -> activation function used
        #   strides -> spaces the convolution window moves vertically and horizontally
        #   padding -> "same" pads with zeros so the output size matches the input size
inputs = tf.keras.Input(shape=(256, 256, 1))

layer = tf.keras.layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(inputs)
layer = tf.keras.layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(layer)

outputs = tf.keras.layers.Conv2D(self.latent_dimension, 1, padding="same")(layer)
return tf.keras.Model(inputs, outputs, name="encoder")

    # Returns the VQ layer
def vq_layer(self):
return VectorQ_layer(self.embedding_num, self.latent_dimension, self.beta, name="vector_quantizer")

"""
Returns the model for decoder architecture built from tranposed convolutional layers.

activations: ReLU advised as other activations are not optimal for encoder/decoder quantization architecture.
e.g. Leaky ReLU activated models are difficult to train -> cause sporadic loss spikes that model struggles to recover from
"""
    # Decoder Component
    def decoder_component(self):
        inputs = tf.keras.Input(shape=self.encoder_component().output.shape[1:])
        # 2D transposed convolutional layers:
        #   filters -> dimension of the output space
        #   kernel_size -> convolution window size
        #   activation -> activation function used
        #   strides -> spaces the convolution window moves vertically and horizontally
        #   padding -> "same" pads with zeros so the output size matches the input size
layer = tf.keras.layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(inputs)
layer = tf.keras.layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(layer)
outputs = tf.keras.layers.Conv2DTranspose(1, 3, padding="same")(layer)
return tf.keras.Model(inputs, outputs, name="decoder")

# Build Model
def build_model(self):
vq_layer = self.vq_layer()
encoder = self.encoder_component()
decoder = self.decoder_component()

inputs = tf.keras.Input(shape=(256, 256, 1))
encoder_outputs = encoder(inputs)
quantized_latents = vq_layer(encoder_outputs)
reconstructions = decoder(quantized_latents)
model = tf.keras.Model(inputs, reconstructions, name="vq_vae")
model.summary()
return model
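
# Example usage (illustrative only):
#   vae = VAE(embedding_num=128, latent_dimension=32)
#   vqvae_model = vae.build_model()  # prints a summary of encoder -> quantizer -> decoder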

# Creates a model instance and sets training parameters
class VQVAETRAINER(tf.keras.models.Model):
def __init__(self, variance, latent_dimension=32, embeddings_num=128, **kwargs):

super(VQVAETRAINER, self).__init__(**kwargs)
self.latent_dimension = latent_dimension
self.embeddings_num = embeddings_num
self.variance = variance

VAE_model = VAE(self.embeddings_num, self.latent_dimension)
self.vqvae_model = VAE_model.build_model()


self.total_loss_tracker = tf.keras.metrics.Mean(name="total_loss")
self.reconstruction_loss_tracker = tf.keras.metrics.Mean(name="reconstruction_loss")
self.vq_loss_tracker = tf.keras.metrics.Mean(name="vq_loss")

@property
def metrics(self):
# Model metrics -> returns losses (total loss, reconstruction loss and the vq_loss)
return [self.total_loss_tracker, self.reconstruction_loss_tracker, self.vq_loss_tracker]

def train_step(self, x):
with tf.GradientTape() as tape:
# Outputs from the VQ-VAE.
reconstructions = self.vqvae_model(x)

# Calculate the losses.
reconstruction_loss = (tf.reduce_mean((x - reconstructions) ** 2) / self.variance)
total_loss = reconstruction_loss + sum(self.vqvae_model.losses)

# Backpropagation.
grads = tape.gradient(total_loss, self.vqvae_model.trainable_variables)
self.optimizer.apply_gradients(zip(grads, self.vqvae_model.trainable_variables))

# Loss tracking.
"""CODEBOOK LOSS + COMMITMENT LOSS -> euclidean loss + encoder loss"""
self.total_loss_tracker.update_state(total_loss)
"""RECONSTRUCTION ERROR (MSE) -> between input and reconstruction"""
self.reconstruction_loss_tracker.update_state(reconstruction_loss)
self.vq_loss_tracker.update_state(sum(self.vqvae_model.losses))

# Log results.
return {
"loss": self.total_loss_tracker.result(),
"reconstruction_loss": self.reconstruction_loss_tracker.result(),
"vqvae_loss": self.vq_loss_tracker.result(),
}

