Release SuperBench v0.8.0

@abuccts abuccts released this 14 Apr 06:39
694ae2a

SuperBench 0.8.0 Release Notes

SuperBench Improvements

  • Support SuperBench Executor running on Windows.
  • Remove the fixed RCCL version in the ROCm 5.1.x Dockerfile.
  • Upgrade networkx version to fix installation compatibility issue.
  • Pin setuptools version to v65.7.0.
  • Limit ansible_runner version for Python 3.6.
  • Support cgroup v2 when reading system metrics in the monitor.
  • Fix analyzer bug in Python 3.8 caused by a pandas API change.
  • Collect real-time GPU power in monitor.
  • Remove unreachable condition when writing the host list in MPI mode.
  • Upgrade Docker image with CUDA 12.1, NCCL 2.17.1-1, HPC-X v2.14, and MLC 3.10.
  • Fix wrong unit of cpu-memory-bw-latency in the documentation.
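
As an illustration of the cgroup v2 item above, the following is a minimal sketch of how a monitor might choose between the v1 and v2 memory-usage files. It is not SuperBench's implementation. The function names and the `root` parameter are hypothetical. The file names (`cgroup.controllers`, `memory.current`, `memory/memory.usage_in_bytes`) are the standard Linux cgroup interface files.

```python
from pathlib import Path


def cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 if `root` looks like a cgroup v2 unified hierarchy, else 1."""
    # The unified (v2) hierarchy exposes cgroup.controllers at its root.
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


def memory_usage_path(root="/sys/fs/cgroup"):
    """Pick the memory-usage file that matches the detected cgroup version."""
    root = Path(root)
    if cgroup_version(root) == 2:
        # v2: memory.current exists in non-root cgroups, for example the
        # cgroup a container sees mounted at /sys/fs/cgroup.
        return root / "memory.current"
    # v1: each controller has its own directory.
    return root / "memory" / "memory.usage_in_bytes"


def read_memory_usage_bytes(root="/sys/fs/cgroup"):
    """Read current memory usage in bytes from the selected file."""
    return int(memory_usage_path(root).read_text().strip())
```

Passing `root` explicitly keeps the sketch testable against a temporary directory instead of the live `/sys/fs/cgroup` mount.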

Micro-benchmark Improvements

  • Add STREAM benchmark for sustainable memory bandwidth and the corresponding computation rate.
  • Add HPL benchmark (High-Performance Linpack) for HPC.
  • Support flexible warmup and non-random data initialization in cublas-benchmark.
  • Support error tolerance in the micro-benchmark for cuDNN functions.
  • Add distributed inference benchmark.
  • Support tensor core precisions (e.g., FP8) and batch/shape range in cublaslt gemm.
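
To show what the STREAM item above measures, here is a minimal sketch of the Triad kernel and its bandwidth accounting. It is not the benchmark that SuperBench ships. The function names and default sizes are hypothetical. Because the loop is pure Python, interpreter overhead dominates and the reported figure is far below hardware bandwidth; only the byte accounting (two reads and one write per element) follows STREAM's convention.

```python
import time
from array import array


def triad_bytes(n, bytes_per_element=8):
    """Bytes STREAM attributes to Triad: read b, read c, write a."""
    return 3 * n * bytes_per_element


def stream_triad(n=100_000, scalar=3.0, repeats=3):
    """Run a[i] = b[i] + scalar * c[i] and return (a, best GB/s)."""
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n
    a = array("d", [0.0]) * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(n):
            a[i] = b[i] + scalar * c[i]
        # Keep the fastest repetition, as STREAM reports the best time.
        best = min(best, time.perf_counter() - start)
    return a, triad_bytes(n, a.itemsize) / best / 1e9
```

A call such as `a, gb_per_s = stream_triad()` returns the result array and the best observed rate in GB/s.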

Model Benchmark Improvements

  • Fix torch.dist init issue with multiple models.
  • Support Transformer Engine (TE) FP8 in BERT/GPT-2 models.
  • Add num_workers configuration to model benchmarks.