Highway Networks demo #14

Open · wants to merge 2 commits into master
Conversation

benanne (Member) commented Aug 22, 2015

Here's a revamped version of the notebook I made on "Highway Networks" by Srivastava et al. (2015): http://arxiv.org/abs/1505.00387

"source": [
"def build_model(input_dim, output_dim, batch_size,\n",
" num_hidden_units, num_hidden_layers):\n",
" \"\"\"Create a symbolic representation of a neural network with `intput_dim`\n",
Review comment (Member):
*input_dim

f0k (Member) commented Aug 23, 2015

Nice and well-explained!

Until now, I thought that "paper" Recipes should aim to faithfully reproduce results from a paper, i.e., use the same hyperparameters and achieve the same results. That's probably not always feasible, though. Still, to get a bit closer to the paper, you could follow the hints in Section 4: transform gate biases were initialized to -2 for MNIST, not -4, and throughout the paper they use 50 hidden units, not 40. It would be cool to try reproducing Figure 2, or at least to add a final comment that you didn't try that but we'd welcome pull requests doing so.

benanne (Member, Author) commented Aug 23, 2015

I don't have much time to work on this right now (beyond cosmetic improvements) so I'm okay with moving it to 'examples' as well if you prefer.

f0k (Member) commented Aug 23, 2015

> I'm okay with moving it to 'examples' as well if you prefer.

I still think it fits in "papers", with the prospect of it reproducing Figure 2 at some point. Don't worry if you can't add a comment at the end; we can also add an Issue for that.

ebenolson (Member) commented

@benanne I'm going to go ahead and merge this and #13 if you don't mind. I can make a PR later to address f0k's comments if you don't have time.

benanne (Member, Author) commented Aug 30, 2015

I was actually planning to have a look at it today. But I can do a new PR after you merge them as well, up to you.

ebenolson (Member) commented

If you're working on them further that's great, let's wait till you're done. I just didn't want them to be stuck in limbo too long; they seem quite good already.


benanne (Member, Author) commented Aug 30, 2015

Sure :) I'll try to sort out @f0k's remarks by tonight.

benanne (Member, Author) commented Aug 30, 2015

Just tried initializing the biases to -2.0 instead of -4.0, but this severely slows down convergence and seems to make things slightly unstable as well. So I'll stick with -4.0. It's no use trying to match the parameters against the paper's anyway, because a lot of them aren't even mentioned (learning rate etc.).
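
For readers following along, here is a minimal, self-contained sketch of the knob being discussed; the layer sizes and variable names are illustrative assumptions, not taken from the notebook. The transform gate is just a sigmoid dense layer, and the constant used to initialize its biases is what changes between `-4.0` and `-2.0`.

```python
# Minimal sketch of the gate-bias knob; names and sizes are hypothetical.
from lasagne.layers import InputLayer, DenseLayer
from lasagne.nonlinearities import sigmoid
from lasagne.init import Constant

l_in = InputLayer(shape=(None, 50))
# Biases at -4.0 keep the transform gates mostly closed at the start of
# training, so each highway layer initially passes its input through;
# the paper's -2.0 opens the gates more, but slowed convergence here.
l_gate = DenseLayer(l_in, num_units=50, nonlinearity=sigmoid,
                    b=Constant(-4.0))
```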

I will have a go at reproducing (a subset of) figure 2 though!

benanne (Member, Author) commented Aug 30, 2015

I tried changing some parameter values to match the paper better, but in the end I decided to leave most of them as they were because it just caused trouble :) Haven't gotten around to doing anything with the figure either, if anyone wants to do that in a separate PR feel free.

"source": [
"**Now we can define a macro function to create a dense highway layer.** Note that it does not take a `num_units` input argument: the number of outputs should always be the same as the number of inputs, so it is redundant.\n",
"\n",
"We initialize the biases of the gates to `-4.0` to disable all of them initially. This means all layers will basically pass through the inputs (and gradients) unchanged at the start of training. In the paper, an initial value of `-2.0` is used for the MNIST experiments, but we found this to slow down convergence."
Review comment (Member):
👍 for mentioning that!
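
To make the quoted description concrete, below is a minimal sketch of what such a macro could look like with Lasagne. The names `highway_dense`, `MultiplicativeGatingLayer` and the `gate_bias` argument are assumptions for illustration, not necessarily what the notebook uses; the structure (a transform path, a sigmoid gate with negatively initialized biases, and an elementwise mix with the input) follows the quoted text.

```python
# Hedged sketch of a dense highway layer macro; identifiers are illustrative.
import numpy as np
from lasagne.layers import DenseLayer, MergeLayer
from lasagne.nonlinearities import rectify, sigmoid
from lasagne.init import Constant


class MultiplicativeGatingLayer(MergeLayer):
    """Computes gate * h + (1 - gate) * x elementwise."""
    def __init__(self, gate, transformed, identity, **kwargs):
        super(MultiplicativeGatingLayer, self).__init__(
            [gate, transformed, identity], **kwargs)

    def get_output_shape_for(self, input_shapes):
        return input_shapes[0]

    def get_output_for(self, inputs, **kwargs):
        gate, h, x = inputs
        return gate * h + (1 - gate) * x


def highway_dense(incoming, nonlinearity=rectify, gate_bias=-4.0):
    """Stack a dense highway layer on top of `incoming`.

    No `num_units` argument: the output size must equal the input size.
    `gate_bias` controls how open the transform gates are initially;
    a strongly negative value makes the layer start as a pass-through.
    """
    num_inputs = int(np.prod(incoming.output_shape[1:]))
    # transform path H(x)
    l_h = DenseLayer(incoming, num_units=num_inputs, nonlinearity=nonlinearity)
    # transform gate T(x), biases initialized to `gate_bias`
    l_t = DenseLayer(incoming, num_units=num_inputs, nonlinearity=sigmoid,
                     b=Constant(gate_bias))
    # y = T(x) * H(x) + (1 - T(x)) * x
    return MultiplicativeGatingLayer(l_t, l_h, incoming)
```

Stacking several of these on top of an input layer, e.g. calling `highway_dense` in a loop for the desired number of hidden layers, would give the kind of deep highway MLP the notebook trains.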

f0k (Member) commented Sep 1, 2015

> It's no use trying to match the parameters against the paper's anyway, because a lot of them aren't even mentioned (learning rate etc.).

Yes, if it's difficult to find settings that work, we should just leave it at that. I didn't expect that to cause trouble. Thanks for trying and documenting it! Looks good to merge, as far as I'm concerned. Eben, you could add an Issue about reproducing Figure 2; maybe somebody will be interested in trying that sometime.
