From 8f89b71c3fb0d87f5ad17aceb810f077a8967db9 Mon Sep 17 00:00:00 2001
From: Grigori Fursin
Date: Sun, 5 Nov 2023 16:02:37 +0100
Subject: [PATCH] continue updating SCC'23 tutorial (added Google Colab page)

---
 README.md                                     |  2 ++
 docs/_generator/generate_toc.cmd              |  1 +
 docs/news.md                                  |  5 +++
 docs/taskforce.md                             |  2 +-
 .../tutorials/modular-image-classification.md |  3 +-
 docs/tutorials/scc23-mlperf-inference-bert.md | 36 +++++++++++++------
 6 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index 18168d7d57..f3f03ab1e9 100755
--- a/README.md
+++ b/README.md
@@ -7,6 +7,8 @@
 ### News
 
+* The ACM YouTube channel has released the ACM REP'23 keynote about the MLCommons CM automation language and CK playground:
+  [toward a common language to facilitate reproducible research and technology transfer](https://youtu.be/_1f9i_Bzjmg?si=7XoXRtcU0rglRJr0).
 * The MLCommons Task Force on Automation and Reproducibility is resuming
   [weekly conf-calls](https://docs.google.com/document/d/1zMNK1m_LhWm6jimZK6YE05hu4VH9usdbKJ3nBy-ZPAw/edit) -
   it is open to everyone!
 * [The community](https://access.cknowledge.org/playground/?action=contributors) has successfully validated
diff --git a/docs/_generator/generate_toc.cmd b/docs/_generator/generate_toc.cmd
index ac9053e330..38b35b6a73 100644
--- a/docs/_generator/generate_toc.cmd
+++ b/docs/_generator/generate_toc.cmd
@@ -1,5 +1,6 @@
 cd ../tutorials
 
+cm create-toc-from-md utils --input=scc23-mlperf-inference-bert.md
 cm create-toc-from-md utils --input=sc22-scc-mlperf.md
 cm create-toc-from-md utils --input=sc22-scc-mlperf-part2.md
 cm create-toc-from-md utils --input=sc22-scc-mlperf-part3.md
diff --git a/docs/news.md b/docs/news.md
index 15e1a1c209..149e7bff85 100644
--- a/docs/news.md
+++ b/docs/news.md
@@ -2,6 +2,11 @@
 ## MLCommons CK and CM news
 
+### 202311
+
+* The ACM YouTube channel has released the ACM REP'23 keynote about the MLCommons CM automation language and CK playground:
+  [toward a common language to facilitate reproducible research and technology transfer](https://youtu.be/_1f9i_Bzjmg?si=7XoXRtcU0rglRJr0).
+
 ### 202310
 
 * The MLCommons Task Force on Automation and Reproducibility is resuming
   [weekly conf-calls](https://docs.google.com/document/d/1zMNK1m_LhWm6jimZK6YE05hu4VH9usdbKJ3nBy-ZPAw/edit) -
diff --git a/docs/taskforce.md b/docs/taskforce.md
index 297866e255..c9da8f22be 100644
--- a/docs/taskforce.md
+++ b/docs/taskforce.md
@@ -9,7 +9,7 @@ to develop an open-source automation technology that can help everyone [co-desig
 that can run AI and ML applications in the most efficient way across diverse models,
 data sets, software and hardware from any vendor using the [MLPerf methodology](https://arxiv.org/abs/1911.02549).
 
-As an outcome of this community effort, we have developed the [MLCommons CM automation language](https://doi.org/10.5281/zenodo.8105339),
+As an outcome of this community effort, we have developed the [MLCommons CM automation language](https://youtu.be/_1f9i_Bzjmg?si=l0Qqon2Rt7pSji36),
 [MLCommons C++ Modular Inference Library (MIL)](../cm-mlops/script/app-mlperf-inference-cpp/README-extra.md)
 and the [MLCommons CK playground](https://access.cKnowledge.org).
 This open-source technology was successfully validated during the [1st mass-scale community MLPerf inference submission](https://www.linkedin.com/feed/update/urn:li:activity:7112057645603119104/)
diff --git a/docs/tutorials/modular-image-classification.md b/docs/tutorials/modular-image-classification.md
index bdbb8f5cf3..dcf8437cd9 100644
--- a/docs/tutorials/modular-image-classification.md
+++ b/docs/tutorials/modular-image-classification.md
@@ -1,6 +1,7 @@
 [ [Back to index](../README.md) ]
 
-*This tutorial is also available in [Google Colab](https://colab.research.google.com/drive/1fPFw86BKOQ79U1-lksTkAtJHn3_jhP9o?usp=sharing).*
+*An interactive version of this tutorial is also available
+ at this [Google Colab page](https://colab.research.google.com/drive/1fPFw86BKOQ79U1-lksTkAtJHn3_jhP9o?usp=sharing).*
 
 # Trying CM: modular image classification
diff --git a/docs/tutorials/scc23-mlperf-inference-bert.md b/docs/tutorials/scc23-mlperf-inference-bert.md
index a590a35a0e..2330326cfd 100644
--- a/docs/tutorials/scc23-mlperf-inference-bert.md
+++ b/docs/tutorials/scc23-mlperf-inference-bert.md
@@ -25,21 +25,22 @@
  * [Download the SQuAD validation dataset](#download-the-squad-validation-dataset)
  * [Detect or install ONNX runtime for CPU](#detect-or-install-onnx-runtime-for-cpu)
  * [Download Bert-large model (FP32, ONNX format)](#download-bert-large-model-fp32-onnx-format)
- * [Pull MLPerf inference sources with reference implementations](#pull-mlperf-inference-sources-with-reference-implementations)
+ * [Pull MLPerf inference sources with reference benchmark implementations](#pull-mlperf-inference-sources-with-reference-benchmark-implementations)
  * [Run short reference MLPerf inference benchmark to measure accuracy (offline scenario)](#run-short-reference-mlperf-inference-benchmark-to-measure-accuracy-offline-scenario)
  * [Run short MLPerf inference benchmark to measure performance (offline scenario)](#run-short-mlperf-inference-benchmark-to-measure-performance-offline-scenario)
  * [Prepare minimal MLPerf submission to the SCC committee](#prepare-minimal-mlperf-submission-to-the-scc-committee)
+ * [Publish results at the live SCC'23 dashboard](#publish-results-at-the-live-scc23-dashboard)
  * [Run optimized implementation of the MLPerf inference benchmark](#run-optimized-implementation-of-the-mlperf-inference-benchmark)
  * [Showcasing CPU performance (x64 or Arm64)](#showcasing-cpu-performance-x64-or-arm64)
-  * [int8](#int8)
-  * [fp32](#fp32)
+  * [Quantized and pruned BERT model (int8)](#quantized-and-pruned-bert-model-int8)
+  * [Pruned BERT model (fp32)](#pruned-bert-model-fp32)
  * [Showcasing Nvidia GPU performance](#showcasing-nvidia-gpu-performance)
  * [Showcasing Nvidia AMD performance](#showcasing-nvidia-amd-performance)
  * [Optimize benchmark yourself](#optimize-benchmark-yourself)
- * [Using quantized models](#using-quantized-models)
  * [Changing batch size](#changing-batch-size)
  * [Adding support for multi-node execution](#adding-support-for-multi-node-execution)
  * [Adding new implementation for new hardware](#adding-new-implementation-for-new-hardware)
+ * [The next steps](#the-next-steps)
  * [Acknowledgments](#acknowledgments)
  * [Nvidia MLPerf inference backend](#nvidia-mlperf-inference-backend)
  * [DeepSparse MLPerf inference backend](#deepsparse-mlperf-inference-backend)
@@ -76,7 +77,8 @@
 that you will submit to the SCC organizers to get points.
 
-
+*An interactive copy of the short version of this tutorial is available
+ at this [Google Colab page](https://colab.research.google.com/drive/1kgw1pdKi8QcCTqPZu1Vh_ur1NOeTRdWJ?usp=sharing)*.
 
 
@@ -565,9 +567,18 @@
 cm show cache --tags=get,ml-model,bert-large,_onnx
 
 *Note that you will have a different CM UID consisting of 16 hexadecimal lowercase characters.*
 
+You can find the downloaded model as follows:
+
+```bash
+ls -l `cm find cache "download ml-model bert-large"`
+...
+ 1340711828 Nov  5 14:31 model.onnx
+...
+```
 
-### Pull MLPerf inference sources with reference implementations
+
+### Pull MLPerf inference sources with reference benchmark implementations
 
 You should now download and cache the MLPerf inference sources using the following command:
 
@@ -1012,13 +1023,13 @@ Don't forget to set this environment if you use Python virtual environment insta
 
 export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"
 ```
 
-#### Int8 pruned BERT model
+#### Quantized and pruned BERT model (int8)
 
-First you can make a full (valid) run of the MLPerf inference benchmark
-with quantized and pruned Int8 BERT model, batch size of 128 and DeepSparse backend via CM as follows:
+First you can make a short test run of the MLPerf inference benchmark
+with the quantized and pruned Int8 BERT model, a batch size of 128 and the DeepSparse backend via CM as follows:
 
 ```
-cmr "run mlperf inference generate-run-cmds _submission _short" \
+cmr "run mlperf inference generate-run-cmds _submission _short _dashboard" \
  --submitter="SCC23" \
  --hw_name=default \
  --implementation=reference \
  --model=bert-99 \
  --backend=deepsparse \
  --device=cpu \
  --scenario=Offline \
- --execution-mode=valid \
+ --execution-mode=test \
+ --test_query_count=2000 \
  --adr.mlperf-inference-implementation.max_batchsize=128 \
  --env.CM_MLPERF_NEURALMAGIC_MODEL_ZOO_STUB=zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50_quant-none-vnni \
+ --dashboard_wb_project=cm-mlperf-scc23-bert-offline \
  --quiet \
  --output_tar=mlperf_submission_1.tar.gz \
  --output_summary=mlperf_submission_1_summary \
 
 ```
 
-#### fp32 pruned BERT model
+
+
+
+#### Pruned BERT model (fp32)
 
 ```bash
 cmr "run mlperf inference generate-run-cmds _submission _short" \