The function tf.einsum("ibnd,jbnd->ijbn", head_q, head_k), used to compute the attention score in XLNet / Transformer-XL, does not seem to compute the correlation between all pairs of words. Can anyone explain the calculation? For example, with a 2x2 tensor [[i, am], [a, boy]], it looks to me as if "i" is only scored against "i" and "a", and "am" only against "am" and "boy", i.e. only matching pairs are scored rather than every word against every word. Please help me.
def call(self, w, r, attn_mask, mems, head_mask, output_attentions, training=False):
    qlen, rlen, bsz = shape_list(w)[0], shape_list(r)[0], shape_list(w)[1]

    if mems is not None:
        cat = tf.concat([mems, w], 0)
        if self.pre_lnorm:
            w_heads = self.qkv_net(self.layer_norm(cat))
        else:
            w_heads = self.qkv_net(cat)
        r_head_k = self.r_net(r)

        w_head_q, w_head_k, w_head_v = tf.split(w_heads, 3, axis=-1)
        w_head_q = w_head_q[-qlen:]
    else:
        if self.pre_lnorm:
            w_heads = self.qkv_net(self.layer_norm(w))
        else:
            w_heads = self.qkv_net(w)
        r_head_k = self.r_net(r)

        w_head_q, w_head_k, w_head_v = tf.split(w_heads, 3, axis=-1)

    klen = shape_list(w_head_k)[0]

    w_head_q = tf.reshape(w_head_q, (qlen, bsz, self.n_head, self.d_head))  # qlen x bsz x n_head x d_head
    w_head_k = tf.reshape(w_head_k, (klen, bsz, self.n_head, self.d_head))  # klen x bsz x n_head x d_head
    w_head_v = tf.reshape(w_head_v, (klen, bsz, self.n_head, self.d_head))  # klen x bsz x n_head x d_head

    r_head_k = tf.reshape(r_head_k, (rlen, self.n_head, self.d_head))  # rlen x n_head x d_head

    # compute attention score
    rw_head_q = w_head_q + self.r_w_bias  # qlen x bsz x n_head x d_head
    AC = tf.einsum("ibnd,jbnd->ijbn", rw_head_q, w_head_k)  # qlen x klen x bsz x n_head

    rr_head_q = w_head_q + self.r_r_bias
    BD = tf.einsum("ibnd,jnd->ijbn", rr_head_q, r_head_k)  # qlen x klen x bsz x n_head
    BD = self._rel_shift(BD)
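`_rel_shift` itself is not shown above. As a rough illustration (a 2D NumPy reimplementation of the usual Transformer-XL pad-reshape-slice trick, not the library's exact 4D code), it slides each row of BD so that entry (i, j) ends up aligned with the correct relative distance between query i and key j:

```python
import numpy as np

def rel_shift(x):
    # x: qlen x klen matrix of scores indexed by relative position
    qlen, klen = x.shape
    # pad one zero column on the left, reshape so that rows "slide"
    # by one position each, drop the first row, restore the shape
    padded = np.pad(x, [(0, 0), (1, 0)])          # qlen x (klen + 1)
    shifted = padded.reshape(klen + 1, qlen)[1:]  # klen x qlen
    return shifted.reshape(qlen, klen)

x = np.arange(9, dtype=float).reshape(3, 3)
y = rel_shift(x)
# row i is shifted left by (qlen - 1 - i): for j <= i,
# y[i, j] == x[i, j + qlen - 1 - i]
assert y[0, 0] == x[0, 2]
assert y[1, 0] == x[1, 1]
assert y[2, 0] == x[2, 0]
```

So the shift is pure bookkeeping: BD is first computed against absolute relative-position embeddings, and `_rel_shift` realigns the j axis per query row.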
attention_score = np.einsum("ijkl,ijml->ijkm", Q, K) / np.sqrt(hidden_size)  # [batch_size, num_head, sequence_length, sequence_length]

As above, the standard formula for the attention score computes a score between every word and every other word. tf.einsum("ibnd,jbnd->ijbn", rw_head_q, w_head_k) does not appear to compute the attention score between all words in the same way, so I want to understand this part.
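To check what the einsum actually computes, here is a small NumPy sketch (random tensors, shapes chosen for illustration). In "ibnd,jbnd->ijbn" only d is contracted, while i and j both remain free output indices, so every query position i is scored against every key position j; it matches an explicit double loop over all (i, j) pairs:

```python
import numpy as np

qlen, klen, bsz, n_head, d_head = 3, 4, 2, 2, 5
rng = np.random.default_rng(0)
q = rng.standard_normal((qlen, bsz, n_head, d_head))  # i b n d
k = rng.standard_normal((klen, bsz, n_head, d_head))  # j b n d

# einsum form used in Transformer-XL: contract over d only,
# keep i and j free -> scores for all (query, key) pairs
ac = np.einsum("ibnd,jbnd->ijbn", q, k)  # qlen x klen x bsz x n_head

# explicit loop over every query/key pair for comparison
ref = np.zeros((qlen, klen, bsz, n_head))
for i in range(qlen):
    for j in range(klen):
        ref[i, j] = (q[i] * k[j]).sum(-1)  # dot product per (batch, head)

assert ac.shape == (qlen, klen, bsz, n_head)
assert np.allclose(ac, ref)
```

The only difference from the "ijkl,ijml->ijkm" form is the axis layout: here batch and head come after the sequence axes (seq x seq x batch x head) instead of before them, but the all-pairs dot products are the same.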