High training time for a model with QLSTM layer #896
Hello, I am trying to train a model in Brevitas that comprises an input QLSTM layer followed by dense layers. The model has ~6K trainable parameters, yet I am seeing training times of roughly 5 hours per epoch, which seems very high for this configuration. As soon as I remove the QLSTM layer, training time drops back to a few minutes per epoch. Is this issue common when training QLSTM layers? Is there a way to train models with this layer faster?

Model definition:
You can try running training with the env flag BREVITAS_JIT=1.
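A minimal sketch of how that flag can be applied, assuming the training entry point is a hypothetical `train.py`. The key constraint is that environment flags like this are typically read when the library is first imported, so the variable must be set beforehand, either in the shell or at the very top of the script:

```python
import os

# Set the flag before brevitas is first imported, since import-time
# configuration flags are read only once. Equivalent to running:
#   BREVITAS_JIT=1 python train.py
os.environ["BREVITAS_JIT"] = "1"

# import brevitas  # hypothetical training script would continue from here
print(os.environ["BREVITAS_JIT"])
```

Setting it on the command line (`BREVITAS_JIT=1 python train.py`) is the safer option, since it guarantees the flag is in place before any import runs.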
I haven't had the chance to run your code, but at first glance I don't see anything obviously wrong with your training script. I don't currently have the bandwidth to analyse your code any further.
This is a red flag to me that your network is not being compiled with PyTorch's JIT. With that in mind, I'd say you have 3 options: