v0.6.2.post1+rocm
Pre-release
github-actions released this 23 Oct 00:14
What's Changed
- Make rpdtracer import only when required by @Rohan138 in #216
- Improve profiling setup and documentation, sync benchmarks with main by @AdrianAbeyta in #218
- Installing the requirements before invoking setup.py since it now imports setuptools_scm by @gshtras in #221
- llama3.2 + cross attn test by @maleksan85 in #220
- Optimize CAR for ROCm by @iotamudelta in #225
- Custom PA perf improvements by @sanyalington in #222
- Upstream merge 24 10 08 by @gshtras in #226
- customPA write fp8 small ctx fix; enable customPA write fp8 by default by @sanyalington in #227
- added timeout for vllm build in rocm by @maleksan85 in #230
- Add fp8 for dbrx by @charlifu in #231
- Update Buildkite env variable by @dhonnappa-amd in #232
- cuda graph + num-scheduler-steps bug fix by @seungrokj in #236
- [Model] [BUG] Fix code path logic to load mllama model by @tjtanaa in #234
- prefix-enabled FA perf issue by @seungrokj in #239
- Custom PA Partition size 256 to improve performance by @sanyalington in #238
- [Build/CI] Minor changes to fix internal CI process. by @Alexei-V-Ivanov-AMD in #235
- [BUGFIX] Restored handling of ROCM FA output as before adaptation of llama3.2 by @maleksan85 in #241
New Contributors
- @Rohan138 made their first contribution in #216
- @AdrianAbeyta made their first contribution in #218
- @dhonnappa-amd made their first contribution in #232
- @seungrokj made their first contribution in #236
- @tjtanaa made their first contribution in #234
Full Changelog: v0.6.2+rocm...v0.6.2.post1+rocm