Release v0.0.21: Expand caching support for inference, GQA training support, TGI improved performance · huggingface/optimum-neuron

What's Changed

Add GQA optimization for Tensor Parallel training to support the case tp_size > num_key_value_heads by @michaelbenayoun in #498
Mixed-precision training with either torch_xla or torch.autocast by @michaelbenayoun in #523
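
To illustrate the GQA case where tp_size > num_key_value_heads, here is a minimal sketch of one common approach: replicate each key/value head until every tensor-parallel rank owns at least one. This is not the optimum-neuron implementation (which also handles weight permutation, see #531). The function names `kv_replication_factor` and `kv_heads_for_rank` are hypothetical, and the sketch assumes that one of tp_size and num_key_value_heads evenly divides the other.

```python
def kv_replication_factor(num_key_value_heads: int, tp_size: int) -> int:
    """Return how many copies of each KV head are needed so that every
    tensor-parallel rank receives at least one KV head.

    Hypothetical helper for illustration only; assumes one of the two
    sizes evenly divides the other.
    """
    if num_key_value_heads < 1 or tp_size < 1:
        raise ValueError("num_key_value_heads and tp_size must be positive")
    if num_key_value_heads >= tp_size:
        # Enough KV heads already: they only need to split evenly across ranks.
        if num_key_value_heads % tp_size != 0:
            raise ValueError("num_key_value_heads must be divisible by tp_size")
        return 1
    # Fewer KV heads than ranks: replicate each head tp_size / num_kv times.
    if tp_size % num_key_value_heads != 0:
        raise ValueError("tp_size must be divisible by num_key_value_heads")
    return tp_size // num_key_value_heads


def kv_heads_for_rank(rank: int, num_key_value_heads: int, tp_size: int) -> list:
    """Return the original KV head indices held by `rank`, assuming the
    replicated heads are laid out contiguously: [0, 0, ..., 1, 1, ...]."""
    if not 0 <= rank < tp_size:
        raise ValueError("rank must be in [0, tp_size)")
    factor = kv_replication_factor(num_key_value_heads, tp_size)
    heads_per_rank = (num_key_value_heads * factor) // tp_size
    start = rank * heads_per_rank
    # Map each replicated slot back to the original head it was copied from.
    return [slot // factor for slot in range(start, start + heads_per_rank)]


if __name__ == "__main__":
    # Example: 8 KV heads sharded over 32 ranks.
    print(kv_replication_factor(8, 32))
    print([kv_heads_for_rank(r, 8, 32) for r in (0, 3, 4, 31)])
```

With this contiguous layout, the ranks that share a copy of a KV head are consecutive. In a model whose query heads are also split contiguously across ranks, those are the ranks holding the query heads of that KV group.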

Add caching support for traced TorchScript models (e.g. encoders, stable diffusion models) by @JingyaHuang in #510
Support phi model on feature-extraction, text-classification, token-classification tasks by @JingyaHuang in #509

AWS Neuron SDK 2.18 doesn't support compiling SDXL's UNet with weights / neff separation, so inline_weights_to_neff=True is forced through:

Disable weights / neff separation of SDXL's UNET for neuron sdk 2.18 by @JingyaHuang in #554

Fix/ami authorized keys by @shub-kris in #517
Skip weight load during parallel compile by @michaelbenayoun in #524
fixing format in getting-started.ipynb by @jimburtoft in #526
Removing colab links in notebooks.mdx by @jimburtoft in #525
ADD stale bot by @philschmid in #530
Bump optimum version by @JingyaHuang in #534
Fix style by @JingyaHuang in #538
Fix GQA permutation computation and sequential weight initialization / loading when doing TP by @michaelbenayoun in #531
Add setup runtime step for K8S by @glegendre01 in #541
Disable logging during precompilation by @michaelbenayoun in #539
Do not use deprecated list_files_info by @Wauplin in #536
Adding link to existing Fine-tuning example in Notebooks by @jimburtoft in #527
Add missing notebooks to doc by @JingyaHuang in #543
fix: bug in get_available_cores within container by @oOraph in #546
Init on the xla device by @michaelbenayoun in #521
Adding CodeLlama-7B inference and compilation example notebook by @jimburtoft in #549
Add tools for auto filling traced models cache by @JingyaHuang in #537
Remove print that should not be there by @michaelbenayoun in #552
Use AWS Neuron sdk 2.18 by @dacorvo in #547
Cache utils related cleanup by @michaelbenayoun in #553

Full Changelog: v0.0.20...v0.0.21