What is the right way to get an 8-bit model? #28

Open
Sarah-Callies opened this issue Apr 6, 2021 · 6 comments

@Sarah-Callies

1. Which version or branch of Marian should I compile: marian-master, marian-dev, or https://github.com/afaji/Marian/tree/fixed-quant?
2. Can the teacher model be trained with marian-master?

@kpu
Member

kpu commented Apr 6, 2021

fixed-quant is dead since it's been merged into master and @afaji needs to pay attention to issue #24 to remove references to it.

You can get a slower 8-bit model (without output matrix quantization) from https://github.com/marian-nmt/marian-dev, or a faster 8-bit model from https://github.com/browsermt/marian-dev.

The teacher can be trained with anything.

@Sarah-Callies
Author

Sarah-Callies commented Apr 6, 2021

You mean I can get a fixed-quant (16-bit or 8-bit) model by adding the flag "--quantize-bits 16" when using marian master?

@kpu
Member

kpu commented Apr 6, 2021

@XapaJIaMnu the 8-bit documentation is lacking.

@XapaJIaMnu
Contributor

XapaJIaMnu commented Apr 6, 2021

@yandaowei Sorry about the lacking documentation. Could you please check the steps described here: https://github.com/browsermt/students/tree/master/train-student#5-optional-8bit-quantization and see if everything is clear?

The quantisation finetuning is completely optional and is described here: https://github.com/browsermt/students/tree/master/train-student/finetune
Basically, what it does is take an fp32 model and damage it in a way that corresponds to a quantised model. It achieves this by limiting the GEMM outputs to only 255 distinct values (in the 8-bit case) or 65535 distinct values (in the 16-bit case). After training for a while under this scheme, the model learns to work with the reduced set of distinct values and performs better when quantised.
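To make that concrete, here is a toy numpy sketch of the fake-quantisation idea: an illustration of the mechanics described above, not marian's actual code, and the function name is invented. The GEMM output stays in fp32 but is rounded onto the grid an int8 representation would force, so at most 255 distinct values survive.

```python
import numpy as np

def fake_quantize(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Round x onto 2**bits - 1 distinct levels, then map back to float32."""
    levels = 2 ** (bits - 1) - 1        # 127 for 8-bit: integers in [-127, 127], i.e. 255 values
    scale = np.abs(x).max() / levels    # one scaling factor for the whole tensor
    return (np.round(x / scale) * scale).astype(np.float32)

# The GEMM output is computed in fp32 and then "damaged" before the next layer sees it.
activations = np.random.randn(4, 16).astype(np.float32)
weights = np.random.randn(16, 8).astype(np.float32)
damaged = fake_quantize(activations @ weights)
print(np.unique(damaged).size)          # at most 255 distinct values
```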

@Sarah-Callies
Author

Sarah-Callies commented Apr 7, 2021

So I need two steps to get the 8-bit model:
Step 1: follow the doc https://github.com/browsermt/students/tree/master/train-student#5-optional-8bit-quantization to get the 8-bit model.
Step 2: finetune the 8-bit model as https://github.com/browsermt/students/tree/master/train-student/finetune describes.
Is that correct?

@kpu
Member

kpu commented Apr 7, 2021

  1. Train an FP32 model as usual.
  2. Optional: finetune the FP32 model with 8-bit damage. This step is mostly only useful if the model is particularly small (on the order of tiny11 in our WNGT paper). If it's larger, then rounding to 8-bit works out of the box.
  3. Optional: if using any of the 8-bit quantization methods that pre-compute the scaling factor for activations (these contain Alpha in the command line for the next step), run a sample decode through to tune the scaling factor (see the sketch after this list).
  4. Convert to 8-bit format.
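As a rough illustration of step 3 (plain numpy with hypothetical helper names, not marian's API): the idea is to record activation maxima while decoding sample data once, freeze the resulting scaling factor ("alpha"), and reuse it at run time instead of deriving a scale from each batch's maximum.

```python
import numpy as np

def tune_alpha(sample_activations) -> float:
    """Estimate a fixed activation scale ("alpha") from a sample decode."""
    running_max = 0.0
    for batch in sample_activations:    # activations recorded while decoding sample data
        running_max = max(running_max, float(np.abs(batch).max()))
    return running_max / 127.0          # maps observed activations onto the int8 range

def quantize_activations(x: np.ndarray, alpha: float) -> np.ndarray:
    """Quantize with the precomputed alpha instead of a per-batch maximum."""
    return np.clip(np.round(x / alpha), -127, 127).astype(np.int8)

# Record activations from a sample decode once, then reuse the frozen alpha.
samples = [np.random.randn(4, 16).astype(np.float32) for _ in range(10)]
alpha = tune_alpha(samples)
quantized = quantize_activations(samples[0], alpha)
```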
