TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers

TenSet is a large-scale multi-platform tensor program performance dataset. TenSet contains 52 million program performance records collected from 6 hardware platforms. This repo is based on a fork of TVM.

Dataset Information

Statics

Item Number

Networks 120

Hardware Platforms 6

Tasks 13,848

Measurement records 51,577,248

Hardware Platforms

Hardware Platform	Cloud Instance	Other Comments
Intel Platinum 8272CL @ 2.60GHz (16 cores)	Azure D32s_v4	AVX-512
Intel E5-2673 v4 @ 2.30GHz (8 cores)	Azure F16s	AVX-2
AMD EPYC 7452 @ 2.35GHz (4 cores)	Azure D16as_v4	AVX-2
ARM Graviton2 (16 cores)	AWS c6g.4xlarge	Neon
NVIDIA Tesla K80	AWS p2.xlarge	Kepler Architecture
NVIDIA Tesla T4	AWS g4dn.xlarge	Turing Architecture

Tutorials

Organization

Follow the above tutorial to download the dataset. The dataset is stored under tenset/scripts/dataset folder.

dataset/network_info: The metadata for networks
- *.relay.pkl: The relay IR of a network. One network per file.
  - For example, (resnet_50,[(1,3,224,224)]).relay.pkl contains the relay IR of resnet_50 with input shape (1, 3, 224, 224).
- *.task.pkl: The tasks and their weights in a network. One (network, targte) pair per file.
  - For example, ((resnet_50,[(1,3,224,224)]),llvm).task.pkl contains all tasks of resnet_50 on llvm backend.
- all_tasks.pkl: A file containing all tasks. It is used an an index for all tasks.
dataset/to_measure_programs: The generated random programs for measurement.
- *.json: The randomly generated programs (schedules) for measurement. One file per task.
dataset/measure_records: Collected measurement records.
- e5-2666/*.json: measurement records collected on an Intel e5-2673. One file per task.
- platinum-8272/*.json: measurement records collected on an Intel platinum-8272. One file per task.
- ...: other hardware platforms

Inspect Tasks and Programs in the Dataset

Follow the above tutorial to download the dataset. You can then inspect the tasks and programs in the dataset

Print a task

cd scripts
python3 print_all_tasks.py --idx 1264

output:

Index: 1264
flop_ct: 115806208.0
workload_key: ["12b88bedece6984af589a28b43e0f3c4", 1, 56, 56, 64, 3, 3, 64, 128, 1, 1, 1, 128, 1, 28, 28, 128]
Compute DAG:
placeholder = PLACEHOLDER [1, 56, 56, 64]
PaddedInput(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 57)) && (i2 >= 1)) && (i2 < 57)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
placeholder = PLACEHOLDER [3, 3, 64, 128]
Conv2dOutput(nn, yy, xx, ff) += (PaddedInput[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*placeholder[ry, rx, rc, ff])
placeholder = PLACEHOLDER [1, 1, 1, 128]
T_add(ax0, ax1, ax2, ax3) = (Conv2dOutput[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

Print a program

cd scripts
python3 print_programs.py --filename 'dataset/measure_records/e5-2673/([12b88bedece6984af589a28b43e0f3c4,1,56,56,64,3,3,64,128,1,1,1,128,1,28,28,128],llvm).json' --idx 31

output:

Index: 31
Time cost (second): [0.000990787, 0.000826989, 0.00082599, 0.00083999, 0.000827089, 0.000831189, 0.00083599, 0.000853589]
Program:
Placeholder: placeholder, placeholder, placeholder
parallel ax0.0@ax1.0@ax2.0@ (0,4)
  for i1 (0,57)
    for i2 ((floormod(ax0.outer.outer.ax1.outer.outer.fused.ax2.outer.outer.fused, 4)*14),15)
      for i3 (0,64)
        PaddedInput = ...
  for ax3.0 (0,2)
    for ax2.1 (0,7)
      for ax3.1 (0,8)
        Conv2dOutput auto_unroll: 16
        for rx.0 (0,3)
          for rc.0 (0,4)
            for ry.1 (0,3)
              for rc.1 (0,16)
                for yy.3 (0,28)
                  vectorize ff.3 (0,8)
                    Conv2dOutput = ...
        for ax1.2 (0,28)
          vectorize ax3.2 (0,8)
            T_relu = ...

Citation

@inproceedings{zheng2021tenset,
  title={Tenset: A large-scale program performance dataset for learned tensor compilers},
  author={Zheng, Lianmin and Liu, Ruochen and Shao, Junru and Chen, Tianqi and Gonzalez, Joseph E and Stoica, Ion and Ali, Ameer Haj},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
  year={2021}
}

License

The code is licensed under an Apache-2.0 license.
The dataset is licensed under a CC BY 4.0 license.

Name		Name	Last commit message	Last commit date
Latest commit History 5,845 Commits
.github		.github
3rdparty		3rdparty
apps		apps
cmake		cmake
conda		conda
docker		docker
docs		docs
golang		golang
include/tvm		include/tvm
jvm		jvm
licenses		licenses
nnvm		nnvm
python		python
rust		rust
scripts		scripts
src		src
tests		tests
tutorials		tutorials
vta		vta
web		web
.asf.yaml		.asf.yaml
.clang-format		.clang-format
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
CONTRIBUTORS.md		CONTRIBUTORS.md
Jenkinsfile		Jenkinsfile
KEYS		KEYS
LICENSE		LICENSE
Makefile		Makefile
NEWS.md		NEWS.md
NOTICE		NOTICE
README.md		README.md
conftest.py		conftest.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
version.py		version.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers

Dataset Information

Tutorials

Organization

Inspect Tasks and Programs in the Dataset

Citation

License

About

Releases

Packages

Contributors 496

Languages

Item	Number
Networks	120
Hardware Platforms	6
Tasks	13,848
Measurement records	51,577,248

License

buaa-hipo/familyseer_pretrain_AE

Folders and files

Latest commit

History

Repository files navigation

TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers

Dataset Information

Tutorials

Organization

Inspect Tasks and Programs in the Dataset

Citation

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 496

Languages

Packages