BPE codes #1
Hi, I'm struggling to create the data the way you described. I followed the instructions closely, and after preprocessing with fairseq the data looks like this: Some line of
When I preprocess the data afterwards with
The same tree before BPE looks like this:
My tries
I thought about reapplying BPE, so I executed
This produces junk for the tokens to which BPE had already been applied in the previous step. See for example:
Question
How should I preprocess the IWSLT data to get the correct BPE'd tree?
Thanks!
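One common way to avoid segmenting subwords a second time is to undo the existing BPE before applying the codes again. The sketch below is a minimal illustration, assuming the data was segmented with subword-nmt's default "@@ " continuation marker; remove_bpe is a hypothetical helper, not part of fairseq or subword-nmt.

```python
import re

# Assumption: subwords are joined with subword-nmt's default "@@ " marker,
# e.g. "qu@@ ick" stands for the word "quick".
_BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)")


def remove_bpe(line: str) -> str:
    """Merge subword units back into whole words by deleting "@@ " markers.

    A trailing "@@" at the very end of the line is removed as well.
    """
    return _BPE_MARKER.sub("", line.rstrip("\n"))


if __name__ == "__main__":
    # Works on tree-formatted lines too: only the markers are touched.
    print(remove_bpe("(S (NP the qu@@ ick bro@@ wn fox))"))
```

The restored text can then go through subword-nmt's apply_bpe.py a single time, which should yield one consistent segmentation instead of BPE applied on top of BPE.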
Hi,
Nice project, thanks! I'm trying to replicate your setup for IWSLT'14. Did you change the variable BPE_TOKENS in fairseq's prepare-iwslt14.sh to 32k, as mentioned in your paper? Would you be willing to share your BPE codes with me?
Thanks
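To check how many merge operations a shared codes file contains (and so whether it was learned with 32k operations or a different BPE_TOKENS value), one can count its merge lines. This sketch assumes the subword-nmt codes format: an optional "#version: 0.2" header followed by one space-separated merge pair per line. count_merges is a hypothetical helper name.

```python
from typing import Iterable


def count_merges(lines: Iterable[str]) -> int:
    """Count merge operations in a subword-nmt style BPE codes file.

    Accepts any iterable of lines, e.g. an open file handle.
    """
    n = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and the "#version: ..." header that newer
        # subword-nmt versions write as the first line.
        if not line or line.startswith("#version"):
            continue
        n += 1
    return n


if __name__ == "__main__":
    sample = ["#version: 0.2\n", "t h\n", "th e</w>\n"]
    print(count_merges(sample))
```

With a real file this would be called as count_merges(open(path, encoding="utf-8")). The count can come out somewhat lower than the requested number if learning stopped early.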