This repository was archived by the owner on Sep 24, 2024. It is now read-only.

mask token id == 3 during pretraining? #231

Open
whaleloops opened this issue Nov 8, 2022 · 0 comments

Comments

@whaleloops

I noticed that kMaskWordTokenId (mask2 in the paper) is set to 3 here:
https://github.com/google-research/pegasus/blob/main/pegasus/ops/pretrain_parsing_ops.cc#L69

However, the token 'a' also has id 3 in the SentencePiece vocab at "gs://t5-data/vocabs/cc_all.32000/sentencepiece.model". Do these two ids collide during pretraining?
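
To make the concern concrete, here is a minimal sketch of the check in Python. The vocabulary is a toy stand-in, not the real cc_all.32000 model, and `find_collisions` and the offset of 100 are hypothetical names and values chosen for illustration. It shows that a raw piece id of 3 collides with the mask id, and that the collision would disappear if the encoder shifted piece ids past a block of reserved ids. Whether the pretraining pipeline applies such a shift is an assumption to verify against the encoder code.

```python
# Mirrors the value of kMaskWordTokenId quoted in the issue.
K_MASK_WORD_TOKEN_ID = 3

# Toy vocabulary (hypothetical); only the id of 'a' is taken from the issue.
toy_vocab = {"a": 3, "the": 4, "of": 5}


def find_collisions(piece_to_id, reserved_ids, offset=0):
    """Return pieces whose id, after adding `offset`, equals a reserved id."""
    return {
        piece: pid + offset
        for piece, pid in piece_to_id.items()
        if pid + offset in reserved_ids
    }


# Raw ids: 'a' lands on the mask id.
raw = find_collisions(toy_vocab, {K_MASK_WORD_TOKEN_ID})

# Hypothetical shift of 100 reserved ids: no piece lands on the mask id.
shifted = find_collisions(toy_vocab, {K_MASK_WORD_TOKEN_ID}, offset=100)

print(raw)
print(shifted)
```

Running the same check against the real model would mean replacing `toy_vocab` with the piece-to-id mapping loaded from the SentencePiece file.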

@EKebriaei
