Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Negotiate testing requirements #37

Closed
XapaJIaMnu opened this issue May 17, 2021 · 6 comments · Fixed by browsermt/bergamot-translator-tests#28
Closed

Negotiate testing requirements #37

XapaJIaMnu opened this issue May 17, 2021 · 6 comments · Fixed by browsermt/bergamot-translator-tests#28

Comments

@XapaJIaMnu
Copy link
Contributor

@jerinphilip let's see what you need for the test setup before we redeploy the latest-and-greatest models and break your tests.

Could you give me a list of files that you use and if you want anything changed.

@jerinphilip
Copy link

https://github.com/browsermt/bergamot-translator-tests/blob/main/models/download-models.sh uses only the de-en model which is downloaded as a tar.gz and unpacked. Aren't these archives (not files as in the download scripts now) expected to be the mode of delivery now for both translateLocally and bergamot-translator? The download scripts in this repo show individual files (https://github.com/browsermt/students/blob/master/csen/download-models.sh).

If we're maintaining a model zoo (separate from Mozilla), the models/shortlists/vocabs/prefix-files uploaded should also be tested to work with the source in bergamot-translator. I'm guessing I will be reusing the preferred setup here which is going with binary files now? bergamot-translator will have only one language, while it might be nice to have on Jenkins@lofn CI a pull from here and test all models to be compatible with bergamot-translator with tiny inputs.

The extra requirement for the testing purposes is where we continue testing the non-binary codepaths in bergamot-translator, so one archive (de-en) perhaps of the non-converted models/shortlists/vocabs/prefix-files.

Linking a few relevant issues (#35, browsermt/bergamot-translator#104).

@XapaJIaMnu
Copy link
Contributor Author

I think that one language is sufficient for testing purpose. We should also be keeping compatibility with both binary and non-binary shortlists.

In terms of what the end user needs, it's either the binary or the nonbinary shortlist. Considering that the binary shortlist is much much smaller, we should remove it from the archive:

drwxr-xr-x 2 dheart dheart 4.0K May 13 10:42 ./
drwxr-xr-x 3 dheart dheart 4.0K May 13 10:42 ../
-rw-r--r-- 1 dheart dheart  262 May 11 12:59 catalog-entry.yml
-rw-r--r-- 1 dheart dheart  331 May 11 12:59 config.intgemm8bitalpha.yml
-rw-r--r-- 1 dheart dheart 3.2M Apr  8 09:25 lex.s2t.bin
-rw-r--r-- 1 dheart dheart  11M Feb 18  2020 lex.s2t.gz
-rw-r--r-- 1 dheart dheart  186 May 11 12:59 model_info.json
-rw-r--r-- 1 dheart dheart  17M Aug  1  2020 model.intgemm.alphas.bin
-rw-r--r-- 1 dheart dheart  521 May 11 12:59 server.intgemm8bitalpha.yml
-rwxr-xr-x 1 dheart dheart 1.5K May 11 12:59 speed.cpu.intgemm8bitalpha.sh*
-rw-r--r-- 1 dheart dheart 807K Feb 12  2020 vocab.esen.spm

This is what the archive was distributing as of Friday. Let's get rid of all the .bin shortlists except for one archive which would be used for testing? I can then update download-models.sh appropriately?

@kpu
Copy link
Member

kpu commented May 17, 2021

You mean get rid of all the .gz shortlists right?

@XapaJIaMnu
Copy link
Contributor Author

No, I mean the bergamot-translator-tests one.

@jerinphilip
Copy link

Let's get rid of all the .bin shortlists except for one archive which would be used for testing? I can then update download-models.sh appropriately?

Sounds good.

@XapaJIaMnu
Copy link
Contributor Author

should be fixed, waiting for confirmation before i redeploy and close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants