
# Implementation Notes

## Mathematics

Two considerations were taken into account when the mathematical operations were implemented:

- At the time of implementation, the version of TensorFlow used (r0.11) had very limited support for slicing and for assigning values to slices.
- A vectorized implementation is usually preferable to a loop-based one, mainly because of the possible parallelism.

Most of the operations described in the paper can be implemented straightforwardly in TensorFlow, except possibly for two: the allocation weighting calculation and the link matrix update. Both are described in a slicing-and-looping manner, which can make their current implementation look a little convoluted. The following attempts to clarify how these operations were implemented.

### Allocation Weighting

In the paper, the allocation weightings are calculated using the formula:

$$a_t[\phi_t[j]] = \left(1 - u_t[\phi_t[j]]\right)\prod_{i=1}^{j-1} u_t[\phi_t[i]]$$

where $u_t$ is the usage vector and $\phi_t$ is the free list: the indices of the memory locations sorted in ascending order of usage. This can be implemented naively with a loop that recomputes the product term for each position, giving a runtime complexity of $\mathcal{O}(N^2)$ for $N$ memory locations. There is no way to escape the slice-assignment operations, because reordering the sorted free-list back into its original places is crucial. However, there is a possibility to make things a little faster.
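As a point of reference, a minimal NumPy sketch of that naive per-index loop for a single batch element might look like the following (an illustrative, hypothetical helper, not code from this repository):

```python
import numpy as np

def naive_allocation_weighting(usage):
    """Compute the allocation weighting for one batch element, straight from the formula."""
    n = usage.shape[0]
    free_list = np.argsort(usage)            # phi_t: indices sorted by ascending usage
    allocation = np.zeros(n)
    for j, idx in enumerate(free_list):
        # recompute the product of the usages of all locations freer than this one
        product = np.prod(usage[free_list[:j]])
        allocation[idx] = (1.0 - usage[idx]) * product
    return allocation
```

The faster variant actually used in this implementation is the following: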

```python
# sorted_usage and free_list come from sorting the usage vector in ascending order
shifted_cumprod = tf.cumprod(sorted_usage, axis=1, exclusive=True)
unordered_allocation_weighting = (1 - sorted_usage) * shifted_cumprod

allocation_weighting_batches = []
for b in range(self.batch_size):
    allocation_weighting = tf.zeros([self.words_num])
    unpacked_free_list = tf.unpack(free_list[b])
    for pos, original_indx in enumerate(unpacked_free_list):
        # self.I is an identity matrix, so this slice is a one-hot mask
        # with 1 at original_indx and zeros elsewhere
        mask = tf.squeeze(tf.slice(self.I, [original_indx, 0], [1, -1]))
        allocation_weighting += mask * unordered_allocation_weighting[b, pos]
    allocation_weighting_batches.append(allocation_weighting)

return tf.pack(allocation_weighting_batches)
```

In this implementation, we first calculate all the required products on the sorted usage to get an unordered version of the allocation weighting, then use a loop to put the weightings back into their correct places. Because there is no differentiable way in TensorFlow to directly assign to slices of a tensor, a mask with 1 in the desired position and zeros elsewhere is multiplied by the value to be assigned and added to the target tensor.
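As a standalone illustration of this masking trick (a hypothetical NumPy snippet, not the repository's code), assigning a value `v` to position `i` of a vector without slice assignment looks like this:

```python
import numpy as np

n, i, v = 5, 2, 0.7
target = np.zeros(n)

# one-hot mask: 1 at position i, zeros elsewhere (a row of the identity matrix)
mask = np.eye(n)[i]

# "assignment" as a differentiable operation: scale the mask and add it
target = target + mask * v   # target is now [0., 0., 0.7, 0., 0.]
```

In the TensorFlow graph, the same pattern keeps every operation differentiable, at the cost of an extra multiply and add per assignment.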

In this implementation, the loop is $\mathcal{O}(N)$, but it allows the exploitation of the possible parallelism of the cumprod operation, which could bring the runtime complexity down to $\mathcal{O}(N/p)$, where $p$ is the number of parallel processors.

I'm not sure if TensorFlow implements a parallel version of cumprod, but the implementation can exploit it if it's there.
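For reference, the `exclusive=True` flag makes `cumprod` return, at each position, the running product of all preceding elements (and 1 at the first position), which is exactly the $\prod_{i=1}^{j-1}$ term in the allocation formula. A small NumPy check of that behaviour (illustrative only):

```python
import numpy as np

u = np.array([0.9, 0.5, 0.2])   # sorted usage of one batch element

# exclusive cumulative product: [1, u1, u1*u2]
exclusive_cumprod = np.concatenate(([1.0], np.cumprod(u)[:-1]))
print(exclusive_cumprod)        # [1.   0.9  0.45]
```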

### Link Matrix

The paper's original formulation of the link matrix update is an index-by-index operation:

$$L_t[i,j] = \left(1 - w_t^w[i] - w_t^w[j]\right) L_{t-1}[i,j] + w_t^w[i]\, p_{t-1}[j]$$

with $L_t[i,i] = 0$, where $w_t^w$ is the write weighting and $p_{t-1}$ is the precedence vector of the previous step.

A vectorized implementation of this operation can be written as:

$$L_t = \left[\left(1 - \left(w_t^w \oplus w_t^w\right)\right) \circ L_{t-1} + w_t^w\, p_{t-1}^{\top}\right] \circ \left(1 - I\right)$$

Where $\circ$ is elementwise multiplication, and $\oplus$ is a pairwise addition operator defined as:

$$u \oplus v = \begin{pmatrix} u_1 + v_1 & \cdots & u_1 + v_n \\ \vdots & \ddots & \vdots \\ u_n + v_1 & \cdots & u_n + v_n \end{pmatrix}$$

Where $u, v \in \mathbb{R}^n$. This allows TensorFlow to parallelize the operation, but of course with a cost incurred on the space complexity.

The elementwise multiplication by $(1 - I)$, where $I$ is the identity matrix, ensures that all diagonal elements are zero, thus eliminating self-links.
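A minimal NumPy sketch of this vectorized update (illustrative only; the variable names are assumptions, and this is not the repository's TensorFlow code):

```python
import numpy as np

def update_link_matrix(link, write_weighting, precedence):
    """Vectorized link matrix update for a single batch element.

    link:            (N, N) previous link matrix L_{t-1}
    write_weighting: (N,)   current write weighting w^w_t
    precedence:      (N,)   previous precedence vector p_{t-1}
    """
    n = link.shape[0]
    w = write_weighting.reshape(-1, 1)       # column vector

    # pairwise addition w ⊕ w via broadcasting: entry (i, j) = w[i] + w[j]
    pairwise_sum = w + w.T

    # outer product w^w_t p_{t-1}^T
    outer = w @ precedence.reshape(1, -1)

    new_link = (1 - pairwise_sum) * link + outer

    # zero the diagonal to eliminate self-links
    return new_link * (1 - np.eye(n))
```

The broadcasted `w + w.T` is where the extra $N \times N$ intermediate comes from, which is the space cost mentioned above.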

## Weight Initializations

- Controller weights are initialized to samples one standard deviation away from a zero-mean normal distribution whose variance depends on the size of the input vector coming into the weight matrix.

- Memory's usage and precedence vectors and link matrix are initialized to zero, as specified by the paper.

- Memory's matrix, read and write weightings, and read vectors are initialized to a very small value (10⁻⁶). Attempting to initialize them to 0 resulted in NaN after the first few iterations.

These initialization schemes were chosen after many experiments on the copy task, as they showed the highest degree of stability in training (the highest ratio of converging runs and the smallest ratio of NaN-outs). However, they might need reconsideration with other tasks.
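A short NumPy sketch of these choices (illustrative only; the exact variance scaling of the controller weights and the tensor shapes are assumptions, marked as such in the comments):

```python
import numpy as np

def init_controller_weights(input_size, output_size):
    # assumption: fan-in-scaled standard deviation; the repository may use a
    # different scaling rule, the notes only say it depends on the input size
    stddev = 1.0 / np.sqrt(input_size)
    return np.random.normal(0.0, stddev, size=(input_size, output_size))

def init_memory_state(words_num, word_size, read_heads):
    epsilon = 1e-6  # small non-zero value; exact zeros led to NaNs in training
    return {
        'memory_matrix':     np.full((words_num, word_size), epsilon),
        'usage_vector':      np.zeros(words_num),                        # per the paper
        'precedence_vector': np.zeros(words_num),                        # per the paper
        'link_matrix':       np.zeros((words_num, words_num)),           # per the paper
        'write_weighting':   np.full(words_num, epsilon),
        'read_weightings':   np.full((words_num, read_heads), epsilon),  # assumed shape
        'read_vectors':      np.full((word_size, read_heads), epsilon),  # assumed shape
    }
```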