
0.35.0

@matatonic released this 29 Sep 20:52

Recent updates

Version 0.35.0

  • Update Molmo (tensorflow-cpu is no longer required) and add autocast, so inference can use faster, smaller data types than float32.
  • New option: --use-double-quant enables double quantization with --load-in-4bit; it is slightly slower but uses slightly less VRAM.
  • Molmo 72B now runs in under 48GB of VRAM with --load-in-4bit --use-double-quant.
  • Add completion_tokens counts to API responses, log tokens/s for most results, and make other compatibility improvements.
  • Include sample tokens/s data (measured on an A100) in vision.sample.env.
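A rough sketch of what the two quantization flags control, and why double quantization saves a little VRAM. It assumes, without confirmation from these notes, that --load-in-4bit and --use-double-quant map onto the load_in_4bit and bnb_4bit_use_double_quant parameters of Hugging Face transformers' BitsAndBytesConfig. The memory estimate assumes bitsandbytes-style blockwise 4-bit storage: one fp32 scale per block of 64 weights, or, with double quantization, 8-bit scales grouped in blocks of 256 with one fp32 second-level scale each (the scheme described for QLoRA). The helper names quant_kwargs, bits_per_param and weight_gb are hypothetical.

```python
import argparse


def quant_kwargs(argv):
    """Map the CLI flags to BitsAndBytesConfig-style keyword arguments.

    Hypothetical wiring: the flag names come from the release notes and the
    dictionary keys are real BitsAndBytesConfig parameter names, but how the
    server actually connects the two is an assumption.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--load-in-4bit", action="store_true")
    parser.add_argument("--use-double-quant", action="store_true")
    args = parser.parse_args(argv)
    if args.use_double_quant and not args.load_in_4bit:
        # Double quantization only applies to 4-bit weights.
        raise ValueError("--use-double-quant requires --load-in-4bit")
    if not args.load_in_4bit:
        return {}
    return {
        "load_in_4bit": True,
        "bnb_4bit_use_double_quant": args.use_double_quant,
    }


def bits_per_param(double_quant, blocksize=64, second_blocksize=256):
    """Average storage bits per weight for blockwise 4-bit quantization.

    Assumes one fp32 scale per block of `blocksize` weights; with double
    quantization those scales are stored as 8-bit values plus one fp32
    scale per `second_blocksize` of them.
    """
    if double_quant:
        return 4 + 8 / blocksize + 32 / (blocksize * second_blocksize)
    return 4 + 32 / blocksize


def weight_gb(params, bits):
    """Weight storage in decimal GB; ignores activations and the KV cache."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    print(quant_kwargs(["--load-in-4bit", "--use-double-quant"]))
    for dq in (False, True):
        print(f"double_quant={dq}: {weight_gb(72e9, bits_per_param(dq)):.1f} GB")
```

Under these assumptions, about 72e9 parameters need roughly 40.5 GB for 4-bit weights alone, and double quantization trims that to roughly 37.1 GB (about 0.37 fewer bits per weight). Activations, the KV cache, any layers left unquantized and CUDA overhead come on top, so the weight figure is only a lower bound on VRAM use.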
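A minimal sketch of the kind of tokens/s figure and completion_tokens count mentioned above. It assumes tokens/s means completion tokens divided by wall-clock generation time; whether the server includes prompt processing in that interval is not stated in these notes. The usage dictionary follows the field names of the usage object in OpenAI's Chat Completions responses. The functions and the stand-in generator are hypothetical.

```python
import time
from typing import Callable, List, Tuple


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Completion tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return completion_tokens / elapsed_s


def usage_block(prompt_tokens: int, completion_tokens: int) -> dict:
    """Token counts shaped like the 'usage' object of an OpenAI-style response."""
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


def timed_generate(
    generate: Callable[[str], List[str]], prompt: str
) -> Tuple[List[str], float]:
    """Run a (hypothetical) generate function and report its tokens/s."""
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return tokens, tokens_per_second(len(tokens), elapsed)


if __name__ == "__main__":
    def fake_generate(prompt: str) -> List[str]:
        # Stand-in for a model: sleeps briefly, then returns ten "tokens".
        time.sleep(0.05)
        return ["tok"] * 10

    tokens, rate = timed_generate(fake_generate, "Describe this image.")
    print(usage_block(prompt_tokens=5, completion_tokens=len(tokens)))
    print(f"{rate:.1f} tokens/s")
```

Timing with time.perf_counter rather than time.time avoids jumps from system clock adjustments, which matters when comparing tokens/s across runs.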