
benchmarks, and issue with RNN #108

Closed · jrobinson01 opened this issue Dec 30, 2017 · 10 comments

@jrobinson01

I'm trying to write some better examples for RNN and LSTM, but ran into some snags and decided to start very simply. In short, my LSTM networks were taking forever to train in my projects, so I went back to basics and created this benchmark:

const brain = require('brain.js');
const trainingData = [
  {input: [0,0], output: [0]},
  {input: [0,1], output: [1]},
  {input: [1,0], output: [1]},
  {input: [1,1], output: [0]}
];

// NN xor
const net = new brain.NeuralNetwork();
let now = Date.now();
let output = net.train(trainingData);
console.log('NN: trained output:', output);
console.log(`in ${Date.now() - now} ms`);
console.log('NN test [0,0]:[0]', Math.round(net.run([0,0])));
console.log('NN test [0,1]:[1]', Math.round(net.run([0,1])));
console.log('NN test [1,0]:[1]', Math.round(net.run([1,0])));
console.log('NN test [1,1]:[0]', Math.round(net.run([1,1])));

// RNN xor
now = Date.now();
const rnn = new brain.recurrent.RNN();
output = rnn.train(trainingData);
console.log('RNN: trained output:', output);
console.log(`in ${Date.now() - now} ms`);
console.log('RNN test [0,0]:[0]', rnn.run([0,0]));
console.log('RNN test [0,1]:[1]', rnn.run([0,1]));
console.log('RNN test [1,0]:[1]', rnn.run([1,0]));
console.log('RNN test [1,1]:[0]', rnn.run([1,1]));

// LSTM xor
const lstm = new brain.recurrent.LSTM();
now = Date.now();
output = lstm.train(trainingData);
console.log('LSTM: trained output:', output);
console.log(`in ${Date.now() - now} ms`);
console.log('LSTM test [0,0]:[0]', lstm.run([0,0]));
console.log('LSTM test [0,1]:[1]', lstm.run([0,1]));
console.log('LSTM test [1,0]:[1]', lstm.run([1,0]));
console.log('LSTM test [1,1]:[0]', lstm.run([1,1]));

The output is interesting (2015 MacBook Pro):

NN: trained output: { error: 0.004995326394090512, iterations: 4116 }
in 23 ms
NN test [0,0]:[0] 0
NN test [0,1]:[1] 1
NN test [1,0]:[1] 1
NN test [1,1]:[0] 0
RNN: trained output: { error: 1.6177478953504192, iterations: 20000 }
in 5994 ms
RNN test [0,0]:[0] 0
RNN test [0,1]:[1] 1
RNN test [1,0]:[1] 1
RNN test [1,1]:[0] 1
LSTM: trained output: { error: 1.4159965869501594, iterations: 20000 }
in 26377 ms
LSTM test [0,0]:[0] 0
LSTM test [0,1]:[1] 1
LSTM test [1,0]:[1] 1
LSTM test [1,1]:[0] 0
  • RNN sometimes gets them all right, but often does not.
  • Both RNN and LSTM report what look to me like high error values.

Any idea what I might be doing wrong, or could do to improve their accuracy? I realize not much can be done about performance (in Node) at the moment, and that's fine. I don't mind waiting for them to train if I can get reasonable error values out of them and, of course, be able to trust that I'm doing it right.

@robertleeplummerjr
Contributor

First off, ty ty ty for the help! After you train the nets, have you run them and seen what the result is?

@jrobinson01
Author

Yeah, I believe so. The last four lines of each test run the respective network; they're basically the same as the examples in the current readme.

@robertleeplummerjr
Contributor

Ah yes, an oversight on my part. I was playing Mario with my kids and took a moment to answer. Shame on me :).

TL;DR
The brain.NeuralNetwork class uses "mse".
The brain.recurrent.* classes use "mRmsProp".
The numbers are two different measurements, and the recurrent ones run higher. I'm sure there is a way to normalize them, I just haven't had time. I believe the error values you have are plenty fine, and you could probably reduce the iterations to a couple hundred and still get fairly good results, as sketched below.
There will be no curt answers in the TS;DR.
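
For example, a minimal sketch of dialing those down (iterations and errorThresh are real train() options; the exact values here are just guesses to experiment with):

const rnn = new brain.recurrent.RNN();
const result = rnn.train(trainingData, {
  iterations: 200,    // down from the default of 20000
  errorThresh: 0.011, // stop early once the error drops below this
  log: true,          // print progress while training
  logPeriod: 50       // every 50 iterations
});
console.log(result);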

TS;DR
I don't normally respond this verbosely, but here goes.

The standard feedforward neural net (brain.NeuralNetwork) uses a different means of training than the recurrent neural networks (brain.recurrent.*). Much of what I started with when building the current recurrent nets was recurrentjs, but it bothered me how cryptic the internals were, and I thought: "Hey, build it like brain.js is built!" That was largely to teach myself what the various terms meant, while building a fairly solid library.

It seems in mathematics, and in the popular neural networks, the smaller you write something, the more successful you are... while at the same time I feel very strongly that it is rudely curt. Why? The one practicing this curt lingo is, in a sense, "encrypting" the ideology behind their work. They take perfectly valid ideas and reduce the terminology to something you'd have to go to Harvard to understand. Terms like w (in brain.js, weights) and dw (deltaWeights; in brain.js, deltas) have been avoided (there are many more), and the whole library has been rethought to be far simpler and to get us closer to a library that can run on a graphics processor (which will land partially in v1 as experimental here; careful, it's experimental!). GPUs generally perform a factor or so faster than JavaScript, in some cases multiple factors (and in some cases slower).

Sorry about the "rant", but what I'm getting at is that the error output is different. The recurrent neural nets train with momentum root mean squared propagation, generally referred to as "momentum with rmsprop": the error of each network output is added together for each iteration through the training set (all inputs are tested, their errors calculated and summed), and then the sum is divided by the count of training samples.
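
To make that aggregation concrete, here's an illustrative sketch (not the library's actual internals; trainPattern here is a hypothetical helper that trains on one sample and returns its error):

// hypothetical sketch of the per-iteration error reported above
function iterationError(net, trainingData) {
  let sum = 0;
  for (const sample of trainingData) {
    // assumed helper: returns this sample's error after one update
    sum += net.trainPattern(sample.input, sample.output);
  }
  // divide the summed error by the count of training samples
  return sum / trainingData.length;
}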

Brain.js's NeuralNetwork, on the other hand, uses mean squared error (it uses a fixed momentum; not sure that can be called momentum).
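
For reference, mean squared error over one output's error vector looks roughly like this (a sketch of the standard formula, not brain.js's exact code):

// errors[i] = expected[i] - actual[i] for each output neuron
function mse(errors) {
  let sum = 0;
  for (let i = 0; i < errors.length; i++) {
    sum += errors[i] * errors[i]; // square each error
  }
  return sum / errors.length; // average the squares
}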

The difference between the values is that they don't represent the same measurement: the momentum-with-rmsprop error is generally higher, while mean squared error is generally far lower.

Many of these terms I would not have known going into this project, and it has been both a headache and one of the greatest adventures I've ever had as a programmer. Much like climbing a mountain, learning the guitar, or running a 40k, it has been very fulfilling.

I'm trying to write some better examples for RNN and LSTM

This line alone makes me very happy. I look very much forward to your first PR and will do what I can to help.

Call to action to anyone reading this:
Much of my work has been on the GPGPU library that the v2 architecture will run on, GPU.js, and on setting up the architecture for harnessing it here. There is still much to be done to make JavaScript a viable solution for enterprise and hobbyists, and I've spent the better part of over a year trying to achieve it.

Node and JavaScript form one of the largest development communities; it is just about the numbers. Build a bridge, and they will cross.

It would have been a whole lot easier to just copy, paste, and translate without understanding the actual underlying code (which most other libraries have done), but, much like I did to my frustrated elementary school teachers, I will keep asking "why" and finding the answers, even if it feels like climbing a mountain. Too, I'm honestly thinking of my kids, or rather the next generation of developers. I don't want them to have to dig as hard to find these simple answers. I don't want them to build with mystical tools. Neural networks are very simple, and they should be demystified. We should be using them to build better things. Rather than imagining them taking over the world and living in fear, use them to grow vegetables, or cure cancer, or play more Mario.

@jrobinson01
Author

jrobinson01 commented Dec 30, 2017

Thanks so much for that! We should use your post as a start to the wiki. Makes sense about the error value. I will continue to experiment with LSTMs and RNNs in hopes that we can come up with some more useful documentation. The goals of this project and the sentiment of your post really resonate with me, so I'm excited to help. Neural networks ARE simple, but it seems nobody can explain them to someone who's never built one.

edit: I thought of a few questions while reading your post and then forgot to ask them. I think the answers to the first couple will help me answer the rest of the ones in my head.

Do the RNN and LSTM networks use the same default construction options as the NeuralNetwork detailed here?
https://github.com/BrainJS/brain.js#options-1

Also, do they use the same defaults when training?
https://github.com/BrainJS/brain.js#options

If the answer to both of those is yes, do you think they're sensible, or should they be changed? For example, in your post you said that the higher error output of the example LSTM and RNN is normal. Should the default error threshold for those then be changed to something closer to 1.x? Same thing for iterations: if 100 (or whatever) is enough, are we wasting cycles with the default of 20000?

@robertleeplummerjr
Contributor

robertleeplummerjr commented Dec 31, 2017

Do the RNN and LSTM networks use the same default construction options as the NeuralNetwork detailed here?

Close, they have their own:

RNN.defaults = {

and it's shared by the other recurrent classes:

export default class LSTM extends RNN {

export default class GRU extends RNN {
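
So overriding a default works the same way for all of them; something like this sketch (the option names here are assumptions; check RNN.defaults for the exact keys):

// because LSTM extends RNN, options flow through the shared defaults
const lstm = new brain.recurrent.LSTM({
  hiddenLayers: [20],  // assumed key: one hidden layer of 20 nodes
  learningRate: 0.01
});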

After v2 is released (v1 lands in a few days, probably any time now), the brain.recurrent namespace will likely be deprecated, because it served as the bridge for both understanding and designing the planned brain.Recurrent class (note: a class, not a namespace). If you look at some of the other issues that mention the planning of this class, it shares essentially the exact brain.FeedForward class API, which makes recurrent nets so, so, so much easier to understand.

import { Recurrent, layer } from 'brain.js';
const { input, lstm, output } = layer;

const net = new Recurrent({
  inputLayer: () => input,
  hiddenLayers: [(input, recurrentInput) => lstm(input, recurrentInput)],
  outputLayer: (input) => output(input)
});

net.train();
net.run();

I might write a little of it tonight... but I digress.

Note: I say "planned" brain.Recurrent, which is technically only partly correct, as all the layers and the complete architecture will run on both FeedForward and Recurrent; so technically it has been partially implemented, but I really like unit tests, so foundation first, then syntactic sugar.

@jrobinson01
Author

I think I follow! Thanks again! More on construction... let's say I want an LSTM: new recurrent.LSTM({ hiddenLayers: [?] });
I'm playing around with one now but not having much luck (see issue #109). Given that I want it to generate sequences like "Jane saw Doug." and "Doug saw Spot.", what would be a good starting point for hidden layers?

@robertleeplummerjr
Contributor

Come to think of it, I actually already composed the net layers. I call this "layer composition", and I really feel it is the "ah, that is how they work!" moment when you see them. They are, at least, the most important part of the networks:

This is one of the main differences I've seen between brain.js and other libs: layers, or layers composed of layers, composed of layers, composed of layers, become very easy. So you could define a stochastic spiking long short-term memory weather-forecasting snow-shoveling neural network with ease.

I'd like to see "them" name that net.
"Next in class we will be analysing the 'SSLSTMWFNSS' Neural Network, which, as you may have guessed, is actually quite simple"

@robertleeplummerjr
Contributor

robertleeplummerjr commented Dec 31, 2017

Let's say I want an LSTM...

Nailed it.

Too, there are input and output converters, which map a single word to a neuron index for inputs and outputs. So you can use plain English to train the net, and the net will do the hard work for you.
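
For example, something along these lines should be all you need (a sketch; I believe the recurrent nets accept plain strings and build the word/character lookups for you):

const lstm = new brain.recurrent.LSTM();
lstm.train([
  'Jane saw Doug.',
  'Doug saw Spot.'
]);
// given a seed, the net continues the sequence
console.log(lstm.run('Jane'));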

I'll take a look and see if I can get it running.

@robertleeplummerjr
Contributor

Running: #109 (comment)

Have fun!

@jrobinson01
Author

Closing. I think we’re good here. Looking forward to seeing v2!
