Include non-breaking prefixes file for source language #35

Open
kpu opened this issue May 9, 2021 · 4 comments

Comments

@kpu
Member

kpu commented May 9, 2021

Currently bergamot-translator does not load non-breaking prefixes at all (browsermt/bergamot-translator#104). This is bad and should be fixed. I think the clean way to do this is to ship the file for the source language. The files are small enough that some copying is probably OK.

@jerinphilip

Can you bring the relevant nonbreaking_prefixes.xx into the archive, @XapaJIaMnu? I'll pick this up at BRT to include tests for browsermt/bergamot-translator#172.

@XapaJIaMnu
Contributor

Where exactly do we get those from? Are they part of ssplit, @ugermann?

@kpu
Member Author

kpu commented May 27, 2021

@ugermann
Member

They actually ship with the sentence splitter and may diverge from Moses over time, as we add additional prefixes.
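
For readers unfamiliar with these files, here is a minimal sketch of how a Moses-style non-breaking prefix list can be read and applied. It assumes the usual Moses conventions (one prefix per line, `#` comment lines, and a `#NUMERIC_ONLY#` suffix for prefixes that only block a split before a number). The function names `load_nonbreaking_prefixes` and `is_sentence_break` are hypothetical and are not the ssplit or bergamot-translator API; the sample data is made up for illustration.

```python
NUMERIC_ONLY = "#NUMERIC_ONLY#"


def load_nonbreaking_prefixes(text):
    """Parse Moses-style prefix data into (always, numeric_only) sets."""
    always, numeric_only = set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(NUMERIC_ONLY):
            # Prefix that is non-breaking only when a number follows.
            numeric_only.add(line[: -len(NUMERIC_ONLY)].strip())
        elif line and not line.startswith("#"):
            # Plain prefix; blank lines and comment lines are skipped.
            always.add(line)
    return always, numeric_only


def is_sentence_break(token, next_token, always, numeric_only):
    """Return True if the period ending `token` should end a sentence."""
    if not token.endswith("."):
        return False
    prefix = token[:-1]
    if prefix in always:
        return False
    if prefix in numeric_only and next_token[:1].isdigit():
        return False
    return True


# Illustrative data only, not copied from any shipped prefix file.
SAMPLE = """# Titles
Mr
Dr
# Non-breaking only before a number
No #NUMERIC_ONLY#
"""

always, numeric_only = load_nonbreaking_prefixes(SAMPLE)
print(is_sentence_break("Mr.", "Smith", always, numeric_only))
print(is_sentence_break("done.", "Next", always, numeric_only))
```

Without the prefix file loaded, a splitter of this kind has no way to tell "Mr." from a sentence-final word, which is the failure mode the issue describes.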
