The function tf.einsum("ibnd,jbnd->ijbn", head_q, head_k), used to compute the attention score in XLNet / Transformer-XL, does not seem to compute the correlation between all pairs of words. Can anyone explain the calculation? For example, with a 2x2 tensor [[i, am], [a, boy]], it looks to me as if "i" is only scored against "i" and "a", and "am" only against "am" and "boy", i.e. only matching pairs are scored rather than every word against every word. Please help me.
def call(self, w, r, attn_mask, mems, head_mask, output_attentions, training=False):
    qlen, rlen, bsz = shape_list(w)[0], shape_list(r)[0], shape_list(w)[1]

    if mems is not None:
        cat = tf.concat([mems, w], 0)
        if self.pre_lnorm:
            w_heads = self.qkv_net(self.layer_norm(cat))
        else:
            w_heads = self.qkv_net(cat)
        r_head_k = self.r_net(r)

        w_head_q, w_head_k, w_head_v = tf.split(w_heads, 3, axis=-1)
        w_head_q = w_head_q[-qlen:]
    else:
        if self.pre_lnorm:
            w_heads = self.qkv_net(self.layer_norm(w))
        else:
            w_heads = self.qkv_net(w)
        r_head_k = self.r_net(r)

        w_head_q, w_head_k, w_head_v = tf.split(w_heads, 3, axis=-1)

    klen = shape_list(w_head_k)[0]

    w_head_q = tf.reshape(w_head_q, (qlen, bsz, self.n_head, self.d_head))  # qlen x bsz x n_head x d_head
    w_head_k = tf.reshape(w_head_k, (klen, bsz, self.n_head, self.d_head))  # klen x bsz x n_head x d_head
    w_head_v = tf.reshape(w_head_v, (klen, bsz, self.n_head, self.d_head))  # klen x bsz x n_head x d_head

    r_head_k = tf.reshape(r_head_k, (rlen, self.n_head, self.d_head))  # rlen x n_head x d_head

    # compute attention score
    rw_head_q = w_head_q + self.r_w_bias  # qlen x bsz x n_head x d_head
    AC = tf.einsum("ibnd,jbnd->ijbn", rw_head_q, w_head_k)  # qlen x klen x bsz x n_head

    rr_head_q = w_head_q + self.r_r_bias
    BD = tf.einsum("ibnd,jnd->ijbn", rr_head_q, r_head_k)  # qlen x klen x bsz x n_head
    BD = self._rel_shift(BD)
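`_rel_shift` itself is not shown above. As a rough illustration (a 2D NumPy reimplementation of the usual Transformer-XL pad-reshape-slice trick, not the library's exact 4D code), it slides each row of BD so that entry (i, j) ends up aligned with the correct relative distance between query i and key j:

```python
import numpy as np

def rel_shift(x):
    # x: qlen x klen matrix of scores indexed by relative position
    qlen, klen = x.shape
    # pad one zero column on the left, reshape so that rows "slide"
    # by one position each, drop the first row, restore the shape
    padded = np.pad(x, [(0, 0), (1, 0)])          # qlen x (klen + 1)
    shifted = padded.reshape(klen + 1, qlen)[1:]  # klen x qlen
    return shifted.reshape(qlen, klen)

x = np.arange(9, dtype=float).reshape(3, 3)
y = rel_shift(x)
# row i is shifted left by (qlen - 1 - i): for j <= i,
# y[i, j] == x[i, j + qlen - 1 - i]
assert y[0, 0] == x[0, 2]
assert y[1, 0] == x[1, 1]
assert y[2, 0] == x[2, 0]
```

So the shift is pure bookkeeping: BD is first computed against absolute relative-position embeddings, and `_rel_shift` realigns the j axis per query row.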
attention_score = np.einsum("ijkl,ijml->ijkm", Q, K) / np.sqrt(hidden_size)  # [batch_size, num_head, sequence_length, sequence_length]

As above, the standard formula for the attention score computes a score between every word and every other word. tf.einsum("ibnd,jbnd->ijbn", rw_head_q, w_head_k) does not appear to compute the attention score between all words in the same way, so I want to understand this part.
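To check what the einsum actually computes, here is a small NumPy sketch (random tensors, shapes chosen for illustration). In "ibnd,jbnd->ijbn" only d is contracted, while i and j both remain free output indices, so every query position i is scored against every key position j; it matches an explicit double loop over all (i, j) pairs:

```python
import numpy as np

qlen, klen, bsz, n_head, d_head = 3, 4, 2, 2, 5
rng = np.random.default_rng(0)
q = rng.standard_normal((qlen, bsz, n_head, d_head))  # i b n d
k = rng.standard_normal((klen, bsz, n_head, d_head))  # j b n d

# einsum form used in Transformer-XL: contract over d only,
# keep i and j free -> scores for all (query, key) pairs
ac = np.einsum("ibnd,jbnd->ijbn", q, k)  # qlen x klen x bsz x n_head

# explicit loop over every query/key pair for comparison
ref = np.zeros((qlen, klen, bsz, n_head))
for i in range(qlen):
    for j in range(klen):
        ref[i, j] = (q[i] * k[j]).sum(-1)  # dot product per (batch, head)

assert ac.shape == (qlen, klen, bsz, n_head)
assert np.allclose(ac, ref)
```

The only difference from the "ijkl,ijml->ijkm" form is the axis layout: here batch and head come after the sequence axes (seq x seq x batch x head) instead of before them, but the all-pairs dot products are the same.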