
About Restricted Boltzmann Machines


Our implementation of the RBM is taken from https://github.com/basavin/rbm-smple/blob/master/rbm.py with the following changes:

  • It was turned into a class, located at Boltzmann.py.
  • It was parameterized so we can choose among several training techniques (a sketch of what such a parameterization can look like follows this list). Specifically:
      • Sigmoid or Tanh as the activation function. At the moment, Sigmoid seems to give better results.
      • The number of training samples used at each step of the training. This allows us to do batch, mini-batch and online training.
      • Weight decay, which can be enabled or disabled. It is an improvement to the Contrastive Divergence formula; an explanation can be found in Hinton's practical guide: https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf.
      • Momentum, which can also be enabled or disabled. It is another improvement to the Contrastive Divergence formula, explained in the same guide: https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf.
      • Of course, the number of hidden neurons and the learning rate are also parameters.
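
For reference, here is a minimal sketch of such a parameterization: a CD-1 training step with an activation choice, mini-batches, and optional momentum and weight decay applied as in Hinton's guide. The class and parameter names are hypothetical and do not necessarily match the actual API in Boltzmann.py.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBMSketch:
    """Hypothetical sketch of a parameterized RBM; see Boltzmann.py for the real class."""

    def __init__(self, n_visible, n_hidden, learning_rate=0.1,
                 activation=sigmoid, weight_decay=0.0, momentum=0.0):
        self.W = 0.01 * np.random.randn(n_visible, n_hidden)
        self.v_offset = np.zeros(n_visible)
        self.h_offset = np.zeros(n_hidden)
        self.lr = learning_rate
        self.act = activation
        self.weight_decay = weight_decay
        self.momentum = momentum
        self.W_inc = np.zeros_like(self.W)  # velocity term for momentum

    def cd1_step(self, v0):
        """One Contrastive Divergence (CD-1) update on a mini-batch v0
        of shape (batch_size, n_visible); batch_size 1 gives online training."""
        # Positive phase: hidden activations given the data.
        h0 = self.act(v0 @ self.W + self.h_offset)
        # Negative phase: one Gibbs step (sampling assumes Sigmoid probabilities).
        h0_sample = (h0 > np.random.rand(*h0.shape)).astype(float)
        v1 = self.act(h0_sample @ self.W.T + self.v_offset)
        h1 = self.act(v1 @ self.W + self.h_offset)
        # CD gradient with momentum and weight decay, following Hinton's guide:
        # dW = momentum * dW_prev + lr * (<v h>_data - <v h>_recon - decay * W)
        grad = (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.W_inc = (self.momentum * self.W_inc
                      + self.lr * (grad - self.weight_decay * self.W))
        self.W += self.W_inc
        self.v_offset += self.lr * np.mean(v0 - v1, axis=0)
        self.h_offset += self.lr * np.mean(h0 - h1, axis=0)
```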

Remember that RBMs are, in principle, associative memories for binary units. This does not mean that the {0,1} -> {-1,1} transformation used for Hopfield networks is valid here. In fact, it could be used if the formulas were changed accordingly, but this is unnecessary. At the moment, RBMs work with {0,1} binary inputs. This is not problematic, as the generators have also been parameterized so we can choose the positive and negative values of the binary samples they create.
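
As an illustration of that last point, here is a hypothetical helper (not part of the actual code) that maps samples created with arbitrary positive/negative values back to the {0,1} domain the RBM expects:

```python
import numpy as np

def to_rbm_domain(samples, positive=1, negative=-1):
    """Map samples encoded with configurable positive/negative values to {0,1}."""
    return (np.asarray(samples) == positive).astype(float)

# e.g. generator output using {-1, 1} becomes [[1, 0, 1], [0, 0, 1]]
patterns = to_rbm_domain([[1, -1, 1], [-1, -1, 1]])
```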

A note about performance of RBM for the rest of the group

For the simple datasets used so far, I was able to find a configuration of the RBM that reconstructs the samples almost perfectly. This does not mean it will always be that easy, or even possible. RBMs are very black-boxy for me right now. If you run into trouble when reproducing results, try the following:

  1. If you are using Tanh, move to Sigmoid. I have not found anything so far that guarantees Tanh is appropriate for RBMs.
  2. Change things. There is a lot to tune in an RBM, so start with the typical parameters: number of hidden units and learning rate.
  3. If you are using momentum or weight decay, remove it. The default parameters do not apply them and, for the toy problems used so far, they brought no improvement and sometimes even worse behaviour.
  4. If you are not using these methods, try them: they are theoretically useful, and I believe they work better with bigger datasets.
  5. Tell me to modify the RBM code so that the initialization of the weight matrix, the visible-offsets vector and the hidden-offsets vector is also parameterized, and initial values can be passed when constructing the RBM (a sketch of this appears after this list). I have seen that this is an absurdly critical factor for the convergence of the RBM. The values found in the code I borrowed make the example datasets work (that's why I left them fixed), and a variation I tried made everything stop converging, but maybe this changes with other datasets.
  6. Start drinking.
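
Regarding point 5, here is a sketch of what the parameterized initialization could look like. The function name and defaults are hypothetical; the fallback values only mimic the spirit of the borrowed code (small random weights, zero offsets), not its exact constants.

```python
import numpy as np

def init_rbm_parameters(n_visible, n_hidden,
                        W=None, v_offset=None, h_offset=None):
    """Use caller-provided initial values when given, otherwise fall back
    to defaults in the style of the borrowed code."""
    if W is None:
        W = 0.1 * np.random.randn(n_visible, n_hidden)
    if v_offset is None:
        v_offset = np.zeros(n_visible)
    if h_offset is None:
        h_offset = np.zeros(n_hidden)
    return W, v_offset, h_offset
```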

Resources used