
CTC gradient seems to be constant. #2

Open
mpezeshki opened this issue Feb 14, 2016 · 11 comments

@mpezeshki

Hi @sherjilozair ,

Could you please run file rnnctc.py in my forked repo:
https://github.com/mohammadpz/ctc/blob/master/examples/rnnctc.py

Variable gradsx2 should be twice as large as grads, but it isn't. Am I missing something?
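(For context, the failing check looks roughly like the sketch below; ctc_cost and W are placeholders for whatever rnnctc.py actually builds — assume ctc_cost is the scalar obtained by summing the wrapper's per-sequence costs. Only the doubling pattern matters.)

```python
import numpy as np
import theano
import theano.tensor as T

# Placeholders: in rnnctc.py, ctc_cost is the scalar CTC cost coming out
# of the wrapper and W is a shared weight of the RNN.
grads = T.grad(ctc_cost, wrt=W)
grads_x2 = T.grad(2 * ctc_cost, wrt=W)  # mathematically, exactly 2 * grads

# (input variables for the actual data are elided here)
f = theano.function([], [grads, grads_x2])
g, g2 = f()
print(np.allclose(2 * g, g2))  # False with the buggy wrapper
```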

@sherjilozair
Owner

@mohammadpz, this is a known bug in my wrapper.

If you take a look at https://github.com/sherjilozair/ctc/blob/master/python/ctc.py#L91, you will notice that I completely ignore the output_grads, because of which the gradient is correct only when the cost is the last node in the graph. Any further operations and compositions on top of it are completely ignored.

I think the solution should be easy. I'll try to fix it and push it up. You are free to send in a PR with your solution as well.
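For readers following along, the pattern being described looks roughly like this — a sketch, not the wrapper's exact code; make_node/perform and the gradients for the label inputs are elided:

```python
class CPUCTC(theano.Op):
    # ... make_node / perform elided ...

    def grad(self, inputs, output_grads):
        # BUG: output_grads -- the gradient flowing in from whatever was
        # computed on top of the cost -- is discarded, so for any f,
        # T.grad(f(ctc_cost), x) silently behaves like T.grad(ctc_cost, x).
        return [CPUCTCGrad()(*inputs)]
```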

@mpezeshki
Author

Oh, I see. You know, I want to divide the CTC cost by the batch size (which is variable). What's the quickest solution that comes to your mind?
Thanks!

@mpezeshki
Author

Can I simply compute output_grads and multiply it by gradients = CPUCTCGrad()(*inputs)?
Here output_grads is just T.grad(last_output, ctc_cost).

@sherjilozair
Owner

The quickest way would be to compute the grads, and then divide it by the batch size. That should work.
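(Concretely, the workaround would look something like the sketch below; params, batch_size, and learning_rate are placeholders for whatever the training script already defines, batch_size may be a symbolic scalar, and the wrapper is assumed to return one cost per sequence.)

```python
# Differentiate the unscaled CTC cost, then rescale the gradients by
# hand -- scaling the cost itself would be ignored by the buggy grad().
raw_grads = T.grad(ctc_cost.sum(), wrt=params)  # sum per-sequence costs
grads = [g / batch_size for g in raw_grads]
updates = [(p, p - learning_rate * g) for p, g in zip(params, grads)]
```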

@mpezeshki
Author

I just replaced gradients = CPUCTCGrad()(*inputs) with gradients = output_grads * CPUCTCGrad()(*inputs) and the gradient is correct. Is there anything wrong with this?

@sherjilozair
Owner

Isn't output_grads a list (of one Theano tensor)?

@mpezeshki
Author

It is, actually. In my case, since it's just a division by the batch size, I presume output_grads is [Elemwise{second}.0].

@sherjilozair
Owner

Cool. I'll test this with some other examples, and push it in. Thanks!

@mpezeshki
Author

I'm not 100% sure, though!
Anyway, thanks for your nice work!

@noammor
Contributor

noammor commented Apr 25, 2016

I'm trying to define a loss as the sum of two CTC losses. Do you have any insight into how to complete the CTC theano.Op implementation? I'll try what's presented here, and if it works I'll push it to my fork of the repository. I'd very much appreciate any more information or guidance, since I'm not a Theano expert. Thanks!
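For the record, the composition at stake looks like the sketch below (the weights are illustrative, params is a placeholder). One subtlety: with the buggy grad(), a plain unweighted sum happens to come out right, because the incoming output_grads are all ones; it's weighted or otherwise transformed compositions that silently break.

```python
# Two CTC terms combined into one training loss.
total_cost = 0.5 * ctc_cost_a.sum() + 2.0 * ctc_cost_b.sum()
grads = T.grad(total_cost, wrt=params)
# With output_grads ignored, the 0.5 and 2.0 factors never reach the
# gradients; with the output_grads fix they do.
```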

@ghost

ghost commented Oct 23, 2016

I think that this would be correct:

```python
gradients = output_grads[0].dimshuffle('x', 0, 'x') * CPUCTCGrad()(*inputs)
```
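Putting the thread together, the corrected grad method would be roughly the sketch below. It assumes the Op returns one cost per batch element (shape (batch,)) and that CPUCTCGrad yields gradients over the activations with shape (time, batch, alphabet) — which is what the dimshuffle pattern implies; the real Op would also need to return disconnected gradients for its label inputs.

```python
def grad(self, inputs, output_grads):
    # Chain rule: broadcast the upstream gradient (one value per batch
    # element) over the time and alphabet axes, then multiply it into
    # the local CTC gradient w.r.t. the activations.
    upstream = output_grads[0].dimshuffle('x', 0, 'x')  # -> (1, batch, 1)
    return [upstream * CPUCTCGrad()(*inputs)]
```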
