What is the right way to get an 8-bit model? #28

Open
Sarah-Callies opened this issue Apr 6, 2021 · 6 comments

@Sarah-Callies

1. Which version or branch of Marian should I compile: marian-master, marian-dev, or https://github.com/afaji/Marian/tree/fixed-quant?
2. Can the teacher model be trained with marian-master?

@kpu
Member

kpu commented Apr 6, 2021

fixed-quant is dead since it's been merged into master and @afaji needs to pay attention to issue #24 to remove references to it.

You can get a slower 8-bit model (without output matrix quantization) from https://github.com/marian-nmt/marian-dev, or a faster 8-bit model from https://github.com/browsermt/marian-dev.

The teacher can be trained with anything.

@Sarah-Callies
Author

Sarah-Callies commented Apr 6, 2021

You mean I can get a fixed-quant (16-bit or 8-bit) model by adding the flag "--quantize-bits 16" when using marian master?

@kpu
Member

kpu commented Apr 6, 2021

@XapaJIaMnu the 8-bit documentation is lacking.

@XapaJIaMnu
Contributor

XapaJIaMnu commented Apr 6, 2021

@yandaowei Sorry about the lacking documentation. Could you please check the steps described here: https://github.com/browsermt/students/tree/master/train-student#5-optional-8bit-quantization and see if everything is clear?

The quantisation finetuning is completely optional and is described here: https://github.com/browsermt/students/tree/master/train-student/finetune
Basically, what it does is take an fp32 model and damage it in a way that corresponds to a quantised model. It achieves this by limiting the GEMM outputs to only 255 distinct values (in the 8-bit case) or 65535 distinct values (in the 16-bit case). After training for a while under this scheme, the model learns to work with the reduced set of distinct values and performs better when quantised.
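To make that concrete, here is a toy numpy sketch of the fake-quantisation idea: an illustration of the mechanics described above, not marian's actual code, and the function name is invented. The GEMM output stays in fp32 but is rounded onto the grid an int8 representation would force, so at most 255 distinct values survive.

```python
import numpy as np

def fake_quantize(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Round x onto 2**bits - 1 distinct levels, then map back to float32."""
    levels = 2 ** (bits - 1) - 1        # 127 for 8-bit: integers in [-127, 127], i.e. 255 values
    scale = np.abs(x).max() / levels    # one scaling factor for the whole tensor
    return (np.round(x / scale) * scale).astype(np.float32)

# The GEMM output is computed in fp32 and then "damaged" before the next layer sees it.
activations = np.random.randn(4, 16).astype(np.float32)
weights = np.random.randn(16, 8).astype(np.float32)
damaged = fake_quantize(activations @ weights)
print(np.unique(damaged).size)          # at most 255 distinct values
```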

@Sarah-Callies
Author

Sarah-Callies commented Apr 7, 2021

So I need two steps to get the 8-bit model:
Step 1: follow the doc https://github.com/browsermt/students/tree/master/train-student#5-optional-8bit-quantization to get the 8-bit model.
Step 2: finetune the 8-bit model as https://github.com/browsermt/students/tree/master/train-student/finetune describes.
Is that correct?

@kpu
Member

kpu commented Apr 7, 2021

  1. Train an FP32 model as usual.
  2. Optional: finetune the FP32 model with 8-bit damage. This step is mostly only useful if the model is particularly small (on the order of tiny11 in our WNGT paper). If it's larger, then rounding to 8-bit works out of the box.
  3. Optional: if using any of the 8-bit quantization methods that pre-compute the scaling factor for activations (these contain Alpha in the command line for the next step), run a sample decode through to tune the scaling factor (see the sketch after this list).
  4. Convert to 8-bit format.
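As a rough illustration of step 3 (plain numpy with hypothetical helper names, not marian's API): the idea is to record activation maxima while decoding sample data once, freeze the resulting scaling factor ("alpha"), and reuse it at run time instead of deriving a scale from each batch's maximum.

```python
import numpy as np

def tune_alpha(sample_activations) -> float:
    """Estimate a fixed activation scale ("alpha") from a sample decode."""
    running_max = 0.0
    for batch in sample_activations:    # activations recorded while decoding sample data
        running_max = max(running_max, float(np.abs(batch).max()))
    return running_max / 127.0          # maps observed activations onto the int8 range

def quantize_activations(x: np.ndarray, alpha: float) -> np.ndarray:
    """Quantize with the precomputed alpha instead of a per-batch maximum."""
    return np.clip(np.round(x / alpha), -127, 127).astype(np.int8)

# Record activations from a sample decode once, then reuse the frozen alpha.
samples = [np.random.randn(4, 16).astype(np.float32) for _ in range(10)]
alpha = tune_alpha(samples)
quantized = quantize_activations(samples[0], alpha)
```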
