Tags · microsoft/superbenchmark

v0.11.0

Docs - Upgrade version and release note (#650)

**Description**

Upgrade version and release note.

**Major Revision**
- Upgrade package versions
- Add release note for v0.11.0

Sep 29, 2024
75dac87

v0.10.0

Release SuperBench v0.10.0

SuperBench 0.10.0 Release Notes
===============================

SuperBench Improvements
-----------------------

- Support monitoring for AMD GPUs.
- Support ROCm 5.7 and ROCm 6.0 dockerfiles.
- Add MSCCL support for NVIDIA GPUs.
- Fix NUMA domains swap issue in NDv4 topology file.
- Add NDv5 topology file.
- Pin NCCL and NCCL-test to 2.18.3 to avoid a hang issue in CUDA 12.2.

Micro-benchmark Improvements
----------------------------

- Add HPL random generator to gemm-flops with ROCm.
- Add DirectXGPURenderFPS benchmark to measure the FPS of rendering simple frames.
- Add HWDecoderFPS benchmark to measure the FPS of hardware decoder performance.
- Update Docker image for H100 support.
- Update MLC version to 3.10 in the CUDA/ROCm dockerfiles.
- Bug fix for GPU Burn test.
- Support INT8 in cublaslt function.
- Add hipBLASLt function benchmark.
- Support cpu-gpu and gpu-cpu in ib-validation.
- Support graph mode in NCCL/RCCL benchmarks for latency metrics.
- Support C++ implementation in distributed inference benchmark.
- Add O2 option for gpu copy ROCm build.
- Support different hipBLASLt data types in distributed inference benchmark.
- Support in-place in NCCL/RCCL benchmark.
- Support data type option in NCCL/RCCL benchmark.
- Improve P2P performance with fine-grained GPU memory in GPU-copy test for AMD GPUs.
- Update hipblaslt GEMM metric unit to tflops.
- Support FP8 for hipblaslt benchmark.

Model Benchmark Improvements
----------------------------

- Change torch.distributed.launch to torchrun.
- Support Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark.

Result Analysis
---------------

- Support baseline generation from multiple nodes.

Jan 3, 2024
9eb2bdf

v0.9.0

Docs - Upgrade version and release note (#557)

**Description**

Upgrade version and release note.

**Major Revision**
- Upgrade package versions
- Add release note for v0.9.0

Jul 26, 2023
1537a27

v0.8.0

Release SuperBench v0.8.0

SuperBench v0.8.0 Release Notes
===============================

SuperBench Improvements
-----------------------

- Support SuperBench Executor running on Windows.
- Remove fixed rccl version in rocm5.1.x docker file.
- Upgrade networkx version to fix installation compatibility issue.
- Pin setuptools version to v65.7.0.
- Limit ansible_runner version for Python 3.6.
- Support cgroup V2 when reading system metrics in monitor.
- Fix analyzer bug in Python 3.8 due to pandas API change.
- Collect real-time GPU power in monitor.
- Remove unreachable condition when writing the host list in mpi mode.
- Upgrade Docker image with cuda12.1, nccl 2.17.1-1, hpcx v2.14, and mlc
  3.10.
- Fix wrong unit of cpu-memory-bw-latency in documentation.

Micro-benchmark Improvements
----------------------------

- Add STREAM benchmark for sustainable memory bandwidth and the
  corresponding computation rate.
- Add HPL Benchmark for HPC Linpack Benchmark.
- Support flexible warmup and non-random data initialization in
  cublas-benchmark.
- Support error tolerance in micro-benchmark for CuDNN function.
- Add distributed inference benchmark.
- Support tensor core precisions (e.g., FP8) and batch/shape range in
  cublaslt gemm.

Model Benchmark Improvements
----------------------------

- Fix torch.dist init issue with multiple models.
- Support TE FP8 in BERT/GPT2 model.
- Add num_workers configurations in model benchmark.

Apr 14, 2023
694ae2a

v0.7.0

Release SuperBench v0.7.0

SuperBench v0.7.0 Release Notes
===============================

SuperBench Improvements
-----------------------

- Support non-zero return code when "sb deploy" or "sb run" fails in
  Ansible.
- Support log flushing to the result file during runtime.
- Update version to include revision hash and date.
- Support "pattern" in mpi mode to run tasks in parallel.
- Support topo-aware, all-pair, and K-batch pattern in mpi mode.
- Pin Transformers version to avoid TensorRT failure.
- Add CUDA11.8 Docker image for NVIDIA arch90 GPUs.
- Support "sb deploy" without pulling image.

Micro-benchmark Improvements
----------------------------

- Support a list of custom config strings in cudnn-functions and
  cublas-functions.
- Support correctness check in cublas-functions.
- Support GEMM-FLOPS for NVIDIA arch90 GPUs.
- Support cuBLASLt FP16 and FP8 GEMM.
- Add wait time option to resolve mem-bw instability issue.
- Fix incorrect data type detection in cublas-function source code.

Model Benchmark Improvements
----------------------------

- Support FP8 in BERT model training.

Distributed Benchmark Improvements
----------------------------------

- Support pair-wise pattern in IB validation benchmark.
- Support topo-aware, pair-wise, and K-batch pattern in nccl-bw
  benchmark.

Jan 20, 2023
d76e4e1

v0.6.0

Release SuperBench v0.6.0

SuperBench v0.6.0 Release Notes
===============================

SuperBench Improvements
-----------------------

- Support running on host directly without Docker.
- Support running `sb` command inside docker image.
- Support ROCm 5.1.1.
- Support ROCm 5.1.3.
- Fix bugs in data diagnosis.
- Fix cmake and build issues.
- Support automatic configuration yaml selection on Azure VM.
- Refine error message when GPU is not detected.
- Add return code for Timeout.
- Update Dockerfile for NCCL/RCCL version, tag name, and verbose output.
- Support node_num=1 in mpi mode.
- Update Python setup for required packages.
- Enhance parameter parsing to allow spaces in values.
- Support NO_COLOR for SuperBench output.
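
For readers unfamiliar with the `NO_COLOR` convention mentioned in the last item, the following is a minimal sketch of how a command-line tool typically honors it. It is not SuperBench's implementation: the function names `use_color` and `colorize` are hypothetical, and the extra terminal check is a common companion rule rather than part of the convention itself.

```python
import os
import sys

RED = "\033[31m"
RESET = "\033[0m"


def use_color(env=None, stream=None):
    """Return True when ANSI color codes should be emitted (hypothetical helper)."""
    env = os.environ if env is None else env
    stream = sys.stdout if stream is None else stream
    # NO_COLOR convention: a present, non-empty value disables color output.
    if env.get("NO_COLOR"):
        return False
    # Assumption: also skip color when the stream is not a terminal.
    return hasattr(stream, "isatty") and stream.isatty()


def colorize(text, color=RED, env=None, stream=None):
    """Wrap text in an ANSI color code only when color is allowed."""
    if use_color(env, stream):
        return f"{color}{text}{RESET}"
    return text
```

With `NO_COLOR=1` set in the environment, `colorize("error")` returns the plain string, so logs redirected to files or parsed by other tools stay free of escape sequences.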

Micro-benchmark Improvements
----------------------------

- Fix issues in ib loopback benchmark.
- Fix stability issue in ib loopback benchmark.

Distributed Benchmark Improvements
----------------------------------

- Enhance pair-wise IB benchmark.
- Fix bug in IB benchmark.
- Support topology-aware IB benchmark.

Data Diagnosis and Analysis
---------------------------

- Add failure check function in data_diagnosis.py.
- Support JSON and JSONL in Diagnosis.
- Add support to store values of metrics in data diagnosis.
- Support exit code of sb result diagnosis.
- Format int type and unify empty value to N/A in diagnosis output
  files.

Sep 6, 2022
09549b5

v0.6.0-rc1

Pre-release v0.6.0-rc1

Pre-release v0.6.0-rc1.

Aug 8, 2022
9c29c93

v0.5.0

Release SuperBench v0.5.0

SuperBench v0.5.0 Release Notes
===============================

Micro-benchmark Improvements
----------------------------

- Support NIC-only NCCL bandwidth benchmark on a single node in NCCL/RCCL
  bandwidth test.
- Support bi-directional bandwidth benchmark in GPU copy bandwidth test.
- Support data checking in GPU copy bandwidth test.
- Update rccl-tests submodule to fix divide-by-zero error.
- Add GPU-Burn micro-benchmark.

Model-benchmark Improvements
----------------------------

- Sync results on root rank for e2e model benchmarks in distributed mode.
- Support customized `env` in local and torch.distributed mode.
- Add support for pytorch>=1.9.0.
- Keep BatchNorm as fp32 for pytorch cnn models cast to fp16.
- Remove FP16 sample type conversion time.
- Support FAMBench.

Inference Benchmark Improvements
--------------------------------

- Revise the default setting for inference benchmark.
- Add percentile metrics for inference benchmarks.
- Support T4 and A10 in GEMM benchmark.
- Add a configuration with inference benchmarks.
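
As an illustration of what percentile metrics for an inference benchmark can look like, here is a minimal sketch that summarizes per-request latencies with the nearest-rank method. The function names, the `p50`/`p99` style keys, and the choice of nearest-rank are assumptions for this example; SuperBench may compute and name its metrics differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer p and len() keep the rank exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(latencies_ms, percentiles=(50, 90, 95, 99)):
    """Map hypothetical metric names such as 'p99' to latency values."""
    return {f"p{p}": percentile(latencies_ms, p) for p in percentiles}
```

Tail percentiles such as the 99th are usually reported next to the mean because a few slow requests can dominate user-visible latency while barely moving the average.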

Other Improvements
------------------

- Add command to support listing all optional parameters for benchmarks.
- Unify benchmark naming convention and support multiple tests with same
  benchmark and different parameters/options in one configuration file.
- Support timeout to detect the benchmark failure and stop the process
  automatically.
- Add rocm5.0 dockerfile.
- Improve output interface.

Data Diagnosis and Analysis
---------------------------

- Support multi-benchmark check.
- Support result summary in md, html and excel formats.
- Support data diagnosis in md and html formats.
- Support result output for all nodes in data diagnosis.

Apr 29, 2022
7f607e4

v0.5.0-rc1

Pre-release v0.5.0-rc1

Pre-release v0.5.0-rc1.

Mar 25, 2022
84fed1c

v0.4.0

Release SuperBench v0.4.0

SuperBench v0.4.0 Release Notes
===============================

SuperBench Framework
--------------------

__Monitor__

- Add monitor framework for NVIDIA GPU, CPU, memory and disk.

__Data Diagnosis and Analysis__

- Support baseline-based data diagnosis.
- Support basic analysis feature (boxplot figure, outlier detection,
  etc.).
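
To make the outlier detection idea concrete, the sketch below applies the boxplot rule (Tukey fences) to one metric collected from several nodes. This is an illustrative assumption, not the rule SuperBench's analysis necessarily uses: the function name `find_outliers`, the node names, and the fence multiplier `k=1.5` are all hypothetical.

```python
import statistics


def find_outliers(values_by_node, k=1.5):
    """Return nodes whose value falls outside the boxplot whiskers (Tukey fences)."""
    values = list(values_by_node.values())
    # Quartiles of the metric across nodes; needs at least two values.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {
        node: value
        for node, value in values_by_node.items()
        if value < low or value > high
    }
```

For example, given bandwidth readings where six nodes report values between 98 and 102 and one node reports 60, only the node reporting 60 is returned.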

Single-node Validation
----------------------

__Micro Benchmarks__

- CPU Memory Validation (tool: Intel Memory Latency Checker).
- GPU Copy Bandwidth (tool: built by MSRA).
- Add ORT Model on AMD GPU platform.
- Add inference backend TensorRT.
- Add inference backend ORT.

Multi-node Validation
---------------------

__Micro Benchmarks__

- IB Networking validation.
- TCP validation (tool: TCPing).
- GPCNet Validation (tool: GPCNet).

Other Improvements
------------------

1. Enhancement
   - Add pipeline for AMD docker.
   - Integrate system config info script with SuperBench.
   - Support FP32 mode without TF32.
   - Refine unit tests for micro-benchmarks.
   - Unify metric names for all benchmarks.

2. Document
   - Add benchmark list.
   - Add monitor document.
   - Add data diagnosis document.

Dec 28, 2021
525cec7
