Hello @LeiWang1999, I am trying to use the BitNet modeling code in another project with bitblas kernels. When I load the model and try to replace the linear layers with BitBlas Linear layers, the _get_or_create_bitblas_operator function takes a long time to execute and compile kernels based on the weight shapes: for a model with 32 layers, a hidden size of 4096, and an intermediate size of 14336, it takes ~8 min. Is this intended behaviour? Thank you for your help.
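For reference, the replacement I mean looks roughly like the sketch below (the bitblas.Linear constructor arguments are an assumption on my side, so the exact signature may differ):

```python
# Sketch: swap every nn.Linear (except skipped names) for a BitBLAS-backed linear.
# The bitblas.Linear constructor kwargs are an assumption; check the BitBLAS docs.
import torch.nn as nn
import bitblas

def replace_linears(model, skip=("lm_head",)):
    # Collect targets first so we do not mutate modules while iterating.
    targets = [
        (name, module)
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and not any(s in name for s in skip)
    ]
    for name, linear in targets:
        parent_name, _, child_name = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        # Each distinct (in_features, out_features) pair triggers kernel
        # tuning/compilation the first time BitBLAS sees it.
        new_layer = bitblas.Linear(          # assumed constructor
            in_features=linear.in_features,
            out_features=linear.out_features,
            bias=linear.bias is not None,
        )
        setattr(parent, child_name, new_layer)
    return model
```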
Hi @MekkCyber, yeah, when bitblas encounters a kernel configuration for the first time, it performs the compilation and stores the result in a database, located by default at ~/.cache/bitblas. The next time it encounters the same configuration, it retrieves the precompiled library directly from the database, bypassing the tuning process.
As a result, tuning only occurs the first time a specific model and its initial layer are encountered :)
We’re also considering bypassing tuning altogether by shipping precompiled results for different hardware setups, but this is challenging and may take some time to design and implement :)
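As a rough illustration of the caching behaviour, here is a sketch of pre-warming the cache for the unique GEMM shapes of the model you described (assuming the bitblas.MatmulConfig / bitblas.Matmul interface from the BitBLAS README; the dtypes and shapes are illustrative, not prescriptive):

```python
# Sketch: pre-warm the BitBLAS kernel cache for the unique GEMM shapes of a
# 4096-hidden / 14336-intermediate model. Only the first run per shape tunes;
# results are stored under ~/.cache/bitblas and reused afterwards.
import bitblas

# Illustrative (out_features, in_features) pairs: attention projections,
# MLP up/gate projections, MLP down projection. Adjust for your model.
unique_shapes = [(4096, 4096), (14336, 4096), (4096, 14336)]

for n, k in unique_shapes:
    config = bitblas.MatmulConfig(
        M=1,                    # decode-time batch size; tune for your own M values
        N=n,
        K=k,
        A_dtype="float16",
        W_dtype="int2",         # assumed weight dtype for a BitNet-style model
        accum_dtype="float16",
        out_dtype="float16",
        layout="nt",
        with_bias=False,
    )
    # Constructing the operator compiles and stores the kernel on first use;
    # later identical configs are loaded straight from the on-disk database.
    bitblas.Matmul(config=config)
```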