
CTC gradient seems to be constant. #2

Open
mpezeshki opened this issue Feb 14, 2016 · 11 comments

@mpezeshki

Hi @sherjilozair ,

Could you please run file rnnctc.py in my forked repo:
https://github.com/mohammadpz/ctc/blob/master/examples/rnnctc.py

Variable gradsx2 should be twice as large as grads, but it isn't. Am I missing something?
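(For context, the failing check looks roughly like the sketch below; ctc_cost and W are placeholders for whatever rnnctc.py actually builds — assume ctc_cost is the scalar obtained by summing the wrapper's per-sequence costs. Only the doubling pattern matters.)

```python
import numpy as np
import theano
import theano.tensor as T

# Placeholders: in rnnctc.py, ctc_cost is the scalar CTC cost coming out
# of the wrapper and W is a shared weight of the RNN.
grads = T.grad(ctc_cost, wrt=W)
grads_x2 = T.grad(2 * ctc_cost, wrt=W)  # mathematically, exactly 2 * grads

# (input variables for the actual data are elided here)
f = theano.function([], [grads, grads_x2])
g, g2 = f()
print(np.allclose(2 * g, g2))  # False with the buggy wrapper
```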

@sherjilozair
Owner

@mohammadpz, this is a known bug in my wrapper.

If you take a look at https://github.com/sherjilozair/ctc/blob/master/python/ctc.py#L91, you will notice that I completely ignore the output_grads, because of which the gradient is correct only when the cost is the last node in the graph. Any further operations and compositions on top of it are completely ignored.

I think the solution should be easy. I'll try to fix it and push it up. You are free to send in a PR with your solution as well.
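For readers following along, the pattern being described looks roughly like this — a sketch, not the wrapper's exact code; make_node/perform and the gradients for the label inputs are elided:

```python
class CPUCTC(theano.Op):
    # ... make_node / perform elided ...

    def grad(self, inputs, output_grads):
        # BUG: output_grads -- the gradient flowing in from whatever was
        # computed on top of the cost -- is discarded, so for any f,
        # T.grad(f(ctc_cost), x) silently behaves like T.grad(ctc_cost, x).
        return [CPUCTCGrad()(*inputs)]
```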

@mpezeshki
Author

Oh, I see. You know, I want to divide the CTC cost by the batch size (which is variable). What's the quickest solution that comes to your mind?
Thanks!

@mpezeshki
Author

Can I simply compute output_grads and multiply it by gradients = CPUCTCGrad()(*inputs)?
Here output_grads is just T.grad(last_output, ctc_cost).

@sherjilozair
Owner

The quickest way would be to compute the grads, and then divide it by the batch size. That should work.
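(Concretely, the workaround would look something like the sketch below; params, batch_size, and learning_rate are placeholders for whatever the training script already defines, batch_size may be a symbolic scalar, and the wrapper is assumed to return one cost per sequence.)

```python
# Differentiate the unscaled CTC cost, then rescale the gradients by
# hand -- scaling the cost itself would be ignored by the buggy grad().
raw_grads = T.grad(ctc_cost.sum(), wrt=params)  # sum per-sequence costs
grads = [g / batch_size for g in raw_grads]
updates = [(p, p - learning_rate * g) for p, g in zip(params, grads)]
```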

@mpezeshki
Author

I just replaced gradients = CPUCTCGrad()(*inputs) with gradients = output_grads * CPUCTCGrad()(*inputs) and the gradient is correct. Is there anything wrong with this?

@sherjilozair
Owner

Isn't output_grads a list (of one Theano tensor)?

@mpezeshki
Author

It is, actually. In my case, since it's just a division by the batch size, I presume output_grads is [Elemwise{second}.0].

@sherjilozair
Owner

Cool. I'll test this with some other examples, and push it in. Thanks!

@mpezeshki
Author

I'm not 100% sure, though!
Anyway, thanks for your nice work!

@noammor
Contributor

noammor commented Apr 25, 2016

I'm trying to define a loss as the sum of two CTC losses. Do you have any insight into how to complete the CTC theano.Op implementation? I'll try what's presented here, and if it works I'll push it to my fork of the repository. I'd very much appreciate any more information or guidance, since I'm not a Theano expert. Thanks!
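For the record, the composition at stake looks like the sketch below (the weights are illustrative, params is a placeholder). One subtlety: with the buggy grad(), a plain unweighted sum happens to come out right, because the incoming output_grads are all ones; it's weighted or otherwise transformed compositions that silently break.

```python
# Two CTC terms combined into one training loss.
total_cost = 0.5 * ctc_cost_a.sum() + 2.0 * ctc_cost_b.sum()
grads = T.grad(total_cost, wrt=params)
# With output_grads ignored, the 0.5 and 2.0 factors never reach the
# gradients; with the output_grads fix they do.
```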

@ghost

ghost commented Oct 23, 2016

I think that this would be correct:

```python
gradients = output_grads[0].dimshuffle('x', 0, 'x') * CPUCTCGrad()(*inputs)
```
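Putting the thread together, the corrected grad method would be roughly the sketch below. It assumes the Op returns one cost per batch element (shape (batch,)) and that CPUCTCGrad yields gradients over the activations with shape (time, batch, alphabet) — which is what the dimshuffle pattern implies; the real Op would also need to return disconnected gradients for its label inputs.

```python
def grad(self, inputs, output_grads):
    # Chain rule: broadcast the upstream gradient (one value per batch
    # element) over the time and alphabet axes, then multiply it into
    # the local CTC gradient w.r.t. the activations.
    upstream = output_grads[0].dimshuffle('x', 0, 'x')  # -> (1, batch, 1)
    return [upstream * CPUCTCGrad()(*inputs)]
```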
