- Support T5 models, including the T5v1.1 and mT5 variants
- Support loading model files from memory:
  - Python: see the files argument in the constructors of the classes that load models
  - C++: see the models::ModelMemoryReader class
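On the Python side, you can build the in-memory input by reading every file of a converted model directory into a dictionary that maps file names to raw bytes. The sketch below assumes that this mapping is the shape the files argument expects, and the helper name read_model_files is hypothetical. Check the constructor documentation for the exact accepted types.

```python
import os
from typing import Dict


def read_model_files(model_dir: str) -> Dict[str, bytes]:
    """Hypothetical helper: load every regular file of a model directory into memory.

    Keys are file names relative to model_dir (e.g. "model.bin", "config.json"),
    values are the raw file contents. Subdirectories are skipped.
    """
    files: Dict[str, bytes] = {}
    for name in sorted(os.listdir(model_dir)):
        path = os.path.join(model_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                files[name] = f.read()
    return files
```

A possible use, assuming the constructor accepts such a mapping and treats the model path as an identifier only, is `ctranslate2.Translator("my-model", files=read_model_files("my_model_dir"))`. This is useful when the model bytes come from a source other than the local file system, such as an encrypted archive or a network download.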
Fixes and improvements
- Improve the quantization accuracy of OPT models by applying the SmoothQuant technique during conversion (pre-computed activation scales should be passed to the converter option --activation_scales)
- Fix conversion of BART-like models from HuggingFace that use different numbers of encoder and decoder layers
- Fix compilation when no BLAS CPU backend is selected
- Remove a CMake warning that is no longer relevant when the project is compiled without oneDNN
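SmoothQuant moves quantization difficulty from activations to weights using a per-input-channel factor s_j = max|X_j|^α / max|W_j|^(1-α). The activations are divided by s, which is folded into the preceding LayerNorm's gamma and beta, and the matching weight columns are multiplied by s. The sketch below illustrates this with plain lists under a few assumptions: the pre-computed activation scales are per-channel maximum absolute values, α = 0.5, and weights use the (out_features, in_features) layout. It is not the converter's actual code, and both function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def smoothquant_scales(act_max: List[float], weight: Matrix, alpha: float = 0.5) -> List[float]:
    """Per-input-channel factors s_j = max|X_j|**alpha / max|W_j|**(1 - alpha).

    act_max[j] is the pre-computed max absolute activation of input channel j.
    weight has shape (out_features, in_features), so column j is input channel j.
    """
    scales = []
    for j, a in enumerate(act_max):
        w_max = max(abs(row[j]) for row in weight)
        # Clamp both maxima so all-zero channels do not cause division by zero (sketch assumption).
        a, w_max = max(a, 1e-5), max(w_max, 1e-5)
        scales.append(a ** alpha / w_max ** (1.0 - alpha))
    return scales


def apply_smoothing(
    weight: Matrix, ln_gamma: List[float], ln_beta: List[float], scales: List[float]
) -> Tuple[Matrix, List[float], List[float]]:
    """Fold the factors into the model: X' = X / s via the LayerNorm, W' = W * s column-wise."""
    new_weight = [[w * s for w, s in zip(row, scales)] for row in weight]
    new_gamma = [g / s for g, s in zip(ln_gamma, scales)]
    new_beta = [b / s for b, s in zip(ln_beta, scales)]
    return new_weight, new_gamma, new_beta
```

The LayerNorm and the linear layer absorb inverse factors, so the smoothed layer computes the same output in floating point. The accuracy gain only appears after int8 quantization, because the smoothed activations have fewer outlier channels.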