Hello,
I am trying to train CIFAR-10 with the nn conv2d and dense layers, and with the initialization enabled I get a loss of 7229.39 at the first step (and after a while it only comes down to around 5000). I am training with the same model architecture as proposed in the weight normalization article (Salimans & Kingma). However, with the older nn dense and conv2d implementation this does not happen (nor does it happen when I skip the initialization). This is the implementation I use that gives a proper loss (around 2.3 at the first steps):
def conv2d(x, num_filters, filter_size=[3,3], pad='SAME', stride=[1,1], nonlinearity=None, init_scale=1., init=False, name=''):
    with tf.variable_scope(name):
        V = tf.get_variable('V', shape=filter_size + [int(x.get_shape()[-1]), num_filters], dtype=tf.float32,
                            initializer=tf.random_normal_initializer(0, 0.05), trainable=True)
        g = tf.get_variable('g', shape=[num_filters], dtype=tf.float32,
                            initializer=tf.constant_initializer(1.), trainable=True)
        b = tf.get_variable('b', shape=[num_filters], dtype=tf.float32,
                            initializer=tf.constant_initializer(0.), trainable=True)
        if init:  # normalize x
            v_norm = tf.nn.l2_normalize(V, [0, 1, 2])
            x = tf.nn.conv2d(x, v_norm, strides=[1] + stride + [1], padding=pad)
            m_init, v_init = tf.nn.moments(x, [0, 1, 2])
            scale_init = init_scale / tf.sqrt(v_init + 1e-08)
            g = g.assign(scale_init)
            b = b.assign(-m_init * scale_init)
            x = tf.reshape(scale_init, [1, 1, 1, num_filters]) * (x - tf.reshape(m_init, [1, 1, 1, num_filters]))
        else:
            W = tf.reshape(g, [1, 1, 1, num_filters]) * tf.nn.l2_normalize(V, [0, 1, 2])
            # calculate convolutional layer output
            x = tf.nn.bias_add(tf.nn.conv2d(x, W, [1] + stride + [1], pad), b)
        # apply nonlinearity
        if nonlinearity is not None:
            x = nonlinearity(x)
        return x
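For completeness, this is a minimal sketch of how I call this layer: one pass with init=True on a batch of real data to set g and b, then the training graph is built on the same variables with reuse=True (x_init and x_train here are just illustrative placeholders, not my actual input pipeline):

# illustrative only: data-dependent init pass followed by the reused training graph
x_init = tf.placeholder(tf.float32, [None, 32, 32, 3])   # batch used only for initialization
x_train = tf.placeholder(tf.float32, [None, 32, 32, 3])  # batch used for training
with tf.variable_scope('model'):
    out_init = conv2d(x_init, 96, nonlinearity=tf.nn.relu, init=True, name='conv1')
with tf.variable_scope('model', reuse=True):
    out_train = conv2d(x_train, 96, nonlinearity=tf.nn.relu, init=False, name='conv1')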
I have investigated the most recent implementation of nn (the dense and conv2d layers). With that implementation, on a tiny example the mean after initialization is only about 0.001 and the variance is 0.95. With the code above I get roughly -10^-7 and 1.0005. Am I missing something here, or does the code in the nn library not do the same thing as the code above?
Here is the demo code I used for the test:
import tensorflow as tf
import numpy as np

sess = tf.Session()

padding = 'SAME'
init = True
num_filters = 96
filter_size = [3, 3]
stride = [1, 1]
init_scale = 1.
pad = 'SAME'

x = tf.get_variable('x', shape=[100, 32, 32, 3], dtype=tf.float32,
                    initializer=tf.random_normal_initializer(0, 1.0), trainable=True)
V = tf.get_variable('V', shape=filter_size + [int(x.get_shape()[-1]), num_filters], dtype=tf.float32,
                    initializer=tf.random_normal_initializer(0, 0.05), trainable=True)
g = tf.get_variable('g', shape=[num_filters], dtype=tf.float32,
                    initializer=tf.constant_initializer(1.), trainable=True)
b = tf.get_variable('b', shape=[num_filters], dtype=tf.float32,
                    initializer=tf.constant_initializer(0.), trainable=True)

# use weight normalization (Salimans & Kingma, 2016)
W = tf.reshape(g, [1, 1, 1, num_filters]) * tf.nn.l2_normalize(V, [0, 1, 2])
# calculate convolutional layer output
x = tf.nn.bias_add(tf.nn.conv2d(x, W, [1] + stride + [1], pad), b)

if init:  # normalize x
    m_init, v_init = tf.nn.moments(x, [0, 1, 2])
    scale_init = init_scale / tf.sqrt(v_init + 1e-10)
    with tf.control_dependencies([g.assign(g * scale_init), b.assign_add(-m_init * scale_init)]):
        x = tf.identity(x)

init_op = tf.global_variables_initializer()
sess.run(init_op)

# mean and var should be zero and unit after initialization
a = sess.run(x)
print(np.mean(a))
print(np.var(a))
sess.close()
Also, I don't understand why the code uses assign_add instead of assign. I think the ops built before the initialization run before the assigns, so the moments in the init step are computed not from t = x*V/||V|| but from the full output of the layer (which already includes g and b). I assume this means the whole initialization step is already scaled by g and shifted by the bias.
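To make the comparison concrete, here is a small self-contained sketch of the older init path applied to the same shapes as the demo above: the moments are taken from t = conv(x, V/||V||) directly and written into g and b with plain assigns (the names here are only for this example):

# sketch of the older init path on the same shapes as the demo above
import tensorflow as tf
import numpy as np

tf.reset_default_graph()
x = tf.random_normal([100, 32, 32, 3])
V = tf.get_variable('V', shape=[3, 3, 3, 96], dtype=tf.float32,
                    initializer=tf.random_normal_initializer(0, 0.05))
g = tf.get_variable('g', shape=[96], initializer=tf.constant_initializer(1.))
b = tf.get_variable('b', shape=[96], initializer=tf.constant_initializer(0.))

# pre-activation with unit-norm filters, before g and b are applied
t = tf.nn.conv2d(x, tf.nn.l2_normalize(V, [0, 1, 2]), [1, 1, 1, 1], 'SAME')
m_init, v_init = tf.nn.moments(t, [0, 1, 2])
scale_init = 1. / tf.sqrt(v_init + 1e-8)
with tf.control_dependencies([g.assign(scale_init), b.assign(-m_init * scale_init)]):
    out = tf.reshape(scale_init, [1, 1, 1, 96]) * (t - tf.reshape(m_init, [1, 1, 1, 96]))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    a = sess.run(out)
    print(np.mean(a), np.var(a))  # roughly 0 and 1, matching what I see with the first implementation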