After reading the paper, my understanding is that the content stream is built from a token's own word embedding and positional encoding, together with the word embeddings and positional encodings of the positions it can attend to under its permutation order, while the query stream is built from the token's positional encoding and a trainable vector w, together with the word embeddings and positional encodings of those same attendable positions. My question is: what exactly is the positional encoding? Is it a learnable vector, as in BERT, or the sinusoidal function used in other Transformers? I'd like to understand how this encoding is derived. Thanks!
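For reference, the paper states that XLNet inherits Transformer-XL's relative positional encoding, which is built from fixed sinusoids over relative distances rather than learned absolute position vectors. Below is a minimal NumPy sketch of that sinusoid construction; the function name `relative_positional_encoding` and the restriction to non-negative relative distances are simplifications for illustration, not the repository's exact implementation.

```python
import numpy as np

def relative_positional_encoding(seq_len, d_model):
    """Sinusoidal encoding over relative distances, Transformer-XL style.

    The full formulation covers relative positions from seq_len - 1 down to
    -seq_len + 1; this sketch only covers the non-negative half for brevity.
    """
    # Inverse frequencies: 1 / 10000^(2i / d_model) for each pair of dimensions.
    inv_freq = 1.0 / (10000 ** (np.arange(0, d_model, 2) / d_model))
    # Relative positions (here: 0 .. seq_len - 1).
    pos = np.arange(seq_len, dtype=np.float64)
    # Outer product gives the angle for every (position, frequency) pair.
    angles = np.outer(pos, inv_freq)  # shape [seq_len, d_model // 2]
    # Sines in the first half of the feature dimension, cosines in the second.
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

enc = relative_positional_encoding(seq_len=8, d_model=16)
print(enc.shape)  # (8, 16)
```

These sinusoids are not learned; what is learned are the trainable biases (u and v in Transformer-XL's notation) that interact with them inside the attention score.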