What's Changed
- Enable jax profiler server in run with ray by @FanhaiLu1 in #112
- Add README for interleave multiple host with Ray by @FanhaiLu1 in #114
- Fix conversion bug by @yeandy in #116
- Integrate disaggregated serving with JetStream by @FanhaiLu1 in #117
- Support HF LLaMA ckpt conversion by @lsy323 in #118
- Add guide on adding HF ckpt conversion support by @lsy323 in #119
- Add support for Llama3-70b by @bhavya01 in #101
- Fix convert_checkpoint.py for hf and gemma by @qihqi in #121
- Mixtral enablement. by @wang2yn84 in #120
- Add script to install for GPU by @qihqi in #122
- Add activation quantization support to per-channel quantized linear layers by @lsy323 in #105
- Remove JSON config mangling for Gemma ckpt by @lsy323 in #124
- Add different token sampling algorithms to decoder. by @bvrockwell in #123
- Add lock in prefill and generate to prevent starvation by @FanhaiLu1 in #126
- Update submodules, prepare for releasing v0.2.4 by @qihqi in #127
- Update README.md by @qihqi in #128
- Update summary.md by @qihqi in #125
- Update README.md by @bhavya01 in #129
- make sure GPU works by @qihqi in #130
New Contributors
- @yeandy made their first contribution in #116
- @bvrockwell made their first contribution in #123
Full Changelog: jetstream-v0.2.2...jetstream-v0.2.3