L2 Regularization for all layers #131
base: master
Conversation
It is better practice to apply weight decay to the weights of every layer, not just the last one. This commit changes the current L2 regularization so that it covers the hidden (convolution) layers as well as the output layer, and it drops regularization of the output bias as unnecessary overhead. Seeking review and testing.
text_cnn.py (outdated diff)

self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
self.predictions = tf.argmax(self.scores, 1, name="predictions")

# Calculate L2 Regularization
l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables() if "b" not in v.name])
This will also add the embedding weights into l2_loss. Is that intended?
It isn't! Thanks for pointing this out.
The following change should fix it, I think:
l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()[1:] if "W" in v.name])
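(For anyone who wants to sanity-check that filter: here is a small standalone sketch, not code from this repo, that creates toy variables mimicking the layout and creation order of text_cnn.py, an embedding matrix first, then a conv layer, then the output layer, and prints which of them the proposed selection keeps. The scope and variable names are stand-ins for illustration only.)

import tensorflow as tf

# Toy variables only, mimicking the creation order assumed above:
# embedding matrix first, then conv weights/biases, then the output layer.
with tf.name_scope("embedding"):
    emb_W = tf.Variable(tf.random_uniform([100, 128], -1.0, 1.0), name="W")
with tf.name_scope("conv-maxpool-3"):
    conv_W = tf.Variable(tf.truncated_normal([3, 128, 1, 64], stddev=0.1), name="W")
    conv_b = tf.Variable(tf.constant(0.1, shape=[64]), name="b")
with tf.name_scope("output"):
    out_W = tf.Variable(tf.truncated_normal([64, 2], stddev=0.1), name="W")
    out_b = tf.Variable(tf.constant(0.1, shape=[2]), name="b")

# Proposed selection: skip the first trainable variable (the embedding)
# and keep only weight matrices, i.e. variables whose names contain "W".
penalized = [v for v in tf.trainable_variables()[1:] if "W" in v.name]
l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in penalized])

for v in penalized:
    print(v.name)  # expected: conv-maxpool-3/W:0 and output/W:0, not embedding/W:0

Note that the [1:] slice relies on the embedding variable being created first; explicitly filtering out names under the embedding scope would be a bit more robust if the graph construction order ever changes.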
Thanks. This looks good in principle. Have you tested it?
@dennybritz: I have only tested it on a small subset of the data. The program itself runs smoothly, but I have not been able to do any meaningful performance/accuracy evaluation against your implementation because of resource constraints. I can cite a few sources to justify that this method works better in theory: this StackOverflow answer confirms that the implementation is correct and fairly standard, and similar usage can be found elsewhere. Discussion of (not) regularizing the bias can be found here and in this chapter of the Deep Learning book (p. 226).
Weight decay with Adam is not a good idea. You'll end up adding another hyperparameter with little performance gain.
@chiragnagpal yes, weight decay is largely ineffective when used with Adam. I'm only suggesting a modification of what is currently implemented. We could possibly rethink the entire implementation as proposed in Fixing Weight Decay Regularization in Adam (Loshchilov, Hutter; 2018).
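For reference, the idea in that paper (decoupled weight decay, often called AdamW) is to take the Adam step on the plain, unregularized gradient and then shrink the weights directly by lr * weight_decay * w, instead of adding an L2 term to the loss where Adam's per-parameter adaptive scaling distorts it. A rough single-step NumPy sketch of the update, illustrative only and not tied to this repo's TensorFlow code:

import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One decoupled-weight-decay (AdamW-style) update; grad excludes any L2 term."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)  # bias correction for the second moment
    # Adam step on the unregularized gradient, plus a direct weight shrinkage
    # that stays decoupled from the adaptive denominator.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps) - lr * weight_decay * w
    return w, m, v

With the L2-in-the-loss formulation discussed above, the decay term would instead flow through the m and v moment estimates and get rescaled per parameter, which is the coupling the paper argues weakens weight decay under Adam.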
That paper is too new with hardly any citations.
Sure. It's only something to think about.