Hidden factors paper demo #13

Open · wants to merge 1 commit into master

Conversation

@benanne (Member) commented Aug 22, 2015

This is an updated version of the demo of the paper "Discovering hidden factors of variation in deep networks" by @briancheung et al. (http://arxiv.org/abs/1412.6583), which no longer relies on any external code. The previous version relied on an old version of the MNIST example.

The other notebook I did (Highway Networks) is coming soon as well :)

@f0k (Member) commented Aug 23, 2015

Looks good in general! Some comments (the diff is too long for GitHub to display, so I'll comment here):

y_train=T.cast(theano.shared(y_train), 'int32')

Why don't you cast the data before turning it into a shared variable? I used np.uint8 in Lasagne's MNIST example so they're GPU compatible.
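For reference, a minimal sketch of the two variants under discussion (y_train below is a stand-in array, not the notebook's actual data):

import numpy as np
import theano
import theano.tensor as T

y_train = np.random.randint(0, 10, size=50000).astype('float64')  # stand-in for the MNIST labels

# what the notebook does: store the labels as floats in the shared variable
# and cast them to int32 symbolically, so the cast happens at runtime
y_shared = theano.shared(y_train)
y_int = T.cast(y_shared, 'int32')

# the suggested alternative: cast once, before creating the shared variable
y_shared_uint8 = theano.shared(y_train.astype(np.uint8))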

l_encoder1 = nn.layers.DenseLayer(l_in, num_units=num_hidden_units)

I wouldn't officially encourage leaving out the nonlinearity=... specification. It's less clear.
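To make that concrete, a sketch of the explicit version (the nn alias and the MNIST input shape are assumed from the snippet above, num_hidden_units is a placeholder):

import lasagne as nn
from lasagne.nonlinearities import rectify

num_hidden_units = 256  # placeholder value; the notebook defines its own

l_in = nn.layers.InputLayer(shape=(None, 28 * 28))  # MNIST-sized input, batch dimension left symbolic
# the same layer as before, with the (default) rectify nonlinearity spelled out
l_encoder1 = nn.layers.DenseLayer(l_in, num_units=num_hidden_units, nonlinearity=rectify)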

batch_size = 100
build_model(..., batch_size=batch_size)

You set the batch size to 100, but in the end you compile compute_z to process the full training set at once. That's playing with fire! You should pass batch_size=None to build_model, in case someone takes your Recipe and tries to add dropout or something.
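Roughly what that would look like (the signature of build_model below is a guess, only meant to illustrate the point):

import lasagne

def build_model(input_dim=28 * 28, num_hidden_units=256, batch_size=None):
    # batch_size=None leaves the batch dimension symbolic, so functions
    # compiled on this graph accept any number of examples at once
    l_in = lasagne.layers.InputLayer(shape=(batch_size, input_dim))
    l_hidden = lasagne.layers.DenseLayer(l_in, num_units=num_hidden_units,
                                         nonlinearity=lasagne.nonlinearities.rectify)
    # ... the notebook goes on to build the full encoder/decoder here ...
    return l_hidden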

iter_train = ...

Those pesky names again :) But it's your choice, I don't mind too much.

We'll make use of this to 'clamp' the observed and latent representation variables.

Cool, I like that!

Before In[20], you could add some explanation on what you're doing. Something like "We'll now generate two images of nine rows of digits: One has z1 varied from -.3 to .3 with z2 fixed to zero, the other has z1 fixed to zero and z2 varied."
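Purely as an illustration of what such a cell might compute (all names below are made up):

import numpy as np

sweep = np.linspace(-0.3, 0.3, 9).astype('float32')  # nine evenly spaced values
zeros = np.zeros(9, dtype='float32')

# first image: z1 varied from -0.3 to 0.3, z2 clamped to zero
z_vary_z1 = np.column_stack([sweep, zeros])  # shape (9, 2)
# second image: z1 clamped to zero, z2 varied
z_vary_z2 = np.column_stack([zeros, sweep])  # shape (9, 2)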

And just to make it a little prettier, you could alter the numbers so In[11] is followed by In[12] ff. instead of In[20] (not crucial).

Very nice work!

@benanne (Member, Author) commented Aug 30, 2015

Why don't you cast the data before turning it into a shared variable? I used np.uint8 in Lasagne's MNIST example so they're GPU compatible.

This carries over from the Theano deep learning tutorials, where they also have the labels as float and use T.cast to convert them to integers at runtime: http://deeplearning.net/tutorial/logreg.html#logreg

This is because supposedly you can only store floats on the GPU (the labels would be stored in host memory otherwise). Are you saying this is not the case? (no longer the case maybe?)

I wouldn't officially encourage leaving out the nonlinearity=... specification. It's less clear.

I don't think stuff in Recipes counts as 'official' in that sense though :) But fair enough, I can add them explicitly. In that case we should probably consider updating all Recipes content to match; I'm pretty sure there are already a few other examples that rely on rectify as the default.

You set the batch size to 100, but in the end you compile compute_z to process the full training set at once. That's playing with fire! You should pass batch_size=None to build_model, in case someone takes your Recipe and tries to add dropout or something.

It's all dense layers, so I figured it wouldn't be a problem. But you're right in that it doesn't really set a good example. I'll change it.

Those pesky names again :) But it's your choice, I don't mind too much.

What's wrong with these names? I think they are there because I was mimicking the old MNIST example code. What should I call them instead?

Before In[20], you could add some explanation on what you're doing. Something like "We'll now generate two images of nine rows of digits: One has z1 varied from -.3 to .3 with z2 fixed to zero, the other has z1 fixed to zero and z2 varied."

Good call.

And just to make it a little prettier, you could alter the numbers so In[11] is followed by In[12] ff. instead of In[20] (not crucial).

Will take care of that when everything else has been addressed.

@f0k (Member) commented Sep 1, 2015

This is because supposedly you can only store floats on the GPU (the labels would be stored in host memory otherwise). Are you saying this is not the case? (no longer the case maybe?)

I thought so. But you're right:

In [3]: type(theano.shared(np.zeros(1)))
Out[3]: theano.tensor.sharedvar.TensorSharedVariable

In [4]: type(theano.shared(np.zeros(1, dtype='float32')))
Out[4]: theano.sandbox.cuda.var.CudaNdarraySharedVariable

In [5]: type(theano.shared(np.zeros(1, dtype='uint8')))
Out[5]: theano.tensor.sharedvar.TensorSharedVariable

In [6]: type(theano.tensor.cast(theano.shared(np.zeros(1, dtype='float32')), 'int32'))
Out[6]: theano.tensor.var.TensorVariable

In the MNIST example it was different: there I just needed something I could hand to a compiled function, and uint8 was good for that. So it should stay the way it is! (Maybe add a short comment if you feel like it.)

In that case we should probably consider updating all Recipes content to match; I'm pretty sure there are already a few other examples that rely on rectify as the default.

Hmm, you're right, the spatial transformer network has it explicitly specified for the classification network, but not for the localization network. The caffe/CIFAR10 example doesn't specify it. Well... maybe just leave it as it is.

What's wrong with these names?

Maybe it's just me, but they remind me of Python's iter, so I think they'd return an iterable or be generator functions, i.e., something you'd put in the head of a for loop, not in the body. Lasagne/Lasagne#215 (comment) :)

I think they are there because I was mimicking the old MNIST example code.

Exactly! It doesn't die! :)

What should I call them instead?

For me, train_step() would fit better, or just train_fn() as in the new MNIST example (indicating that it's a compiled function, although I guess that's not terribly obvious either). But as I said, I don't feel strongly about those names; choose whatever suits you best!
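Just to illustrate the naming argument with a generic (not the notebook's) training setup: the object in question really is a compiled Theano function, which is what the _fn suffix hints at.

import theano
import theano.tensor as T
import lasagne

X = T.matrix('X')
y = T.ivector('y')

l_in = lasagne.layers.InputLayer(shape=(None, 28 * 28), input_var=X)
l_out = lasagne.layers.DenseLayer(l_in, num_units=10, nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(l_out)
loss = lasagne.objectives.categorical_crossentropy(prediction, y).mean()
params = lasagne.layers.get_all_params(l_out, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01)

# "train_fn" because it is literally a function: one call processes one mini-batch
train_fn = theano.function([X, y], loss, updates=updates)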

@benanne (Member, Author) commented Sep 26, 2015

I will try to finish this soon; I'm just a bit occupied with other things at the moment. I haven't abandoned this PR!

@briancheung

Let me know if there's anything I can help with to get this PR merged :)

@benanne (Member, Author) commented Oct 31, 2015

Just f0k's comments need to be addressed. I should have some time to sort this out soon, but feel free to do it if you want this to be merged sooner :)
