How are the positional encodings derived #279

Open
bnicholl opened this issue Feb 5, 2021 · 0 comments
bnicholl commented Feb 5, 2021

After reading the paper, it seems that the content stream consists of a token's own word embedding and positional encoding, along with the word embeddings and positional encodings associated with its respective permutation vector, while the query stream consists of the token's positional encoding and a randomly initialized W embedding, along with the word embeddings and positional encodings of its respective permutation vector. My question is: what is the positional encoding? Is it a learnable vector, as in BERT, or the sinusoidal function used in other transformers? I'd like to understand how this encoding is derived. Thanks!
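
For context, the sinusoidal scheme I'm referring to is the fixed (non-learned) one from "Attention Is All You Need". Here is a minimal NumPy sketch of that baseline for illustration only; it is not taken from this repo and may not be what this model actually uses:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal encoding ("Attention Is All You Need"):
       PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # even dimensions, shape (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # sine on even indices
    pe[:, 1::2] = np.cos(angles)                   # cosine on odd indices
    return pe

# e.g. encodings for a 512-token sequence with hidden size 768
pe = sinusoidal_positional_encoding(512, 768)
```

Is the positional encoding here something like this fixed table, a learned embedding matrix as in BERT, or something else entirely?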
