What's Changed
- Add docker image with CPU only dependencies by @johnugeorge in #8
- Add dlio fixes by @johnugeorge in #10
- Fixed issues related to checkpointing and profiling by @zhenghh04 in #13
- Config parameters fixes by @johnugeorge in #11
- Fixing folder number for evaluation by @johnugeorge in #14
- fixed checkpoint issues by @zhenghh04 in #16
- Adding PR unit tests for testing different data formats and fixing issues with reading PNG and JPEG files from the PyTorch data folder. by @zhenghh04 in #17
- A bunch of minor fixes by @zhenghh04 in #18
- Minor fixes by @zhenghh04 in #22
- Add ckpting to UNET3D workload, remove old prefetch param by @lhovon in #23
- Minor modification of configuration options to remove some confusion by @zhenghh04 in #25
- Adding Storage interface for supporting multiple storage backends by @johnugeorge in #20
- Code Fixes by @johnugeorge in #26
- Add the UNET3D sleep time for V100 32GB batch size 4 by @lhovon in #29
- Minor config changes by @johnugeorge in #31
- Make hydra config folder configurable by @johnugeorge in #32
- Mlperf storage v0.5 by @zhenghh04 in #33
- Changes to support segregation of data loader and reader by @hariharan-devarajan in #37
- Added application-level profile support for DLIO by @hariharan-devarajan in #39
- Multithreading issue with TensorFlow and PyTorch dataloader by @hariharan-devarajan in #44
- bug fix to free memory once file is completely read by @hariharan-devarajan in #51
- Pull changes from mlperf_storage_v0.5.1 by @zhenghh04 in #52
- Improved tracing utility; added preprocessing support by @zhenghh04 in #53
- Trace improvement. by @hariharan-devarajan in #48
- Moved resize image to config by @zhenghh04 in #55
- instead of using direct methods using enter and exit. by @hariharan-devarajan in #54
- Reorganizing output files by @zhenghh04 in #56
- Generator fixed random seed by @zhenghh04 in #58
- Merging branch mlperf_storage_v0.5.1 by @zhenghh04 in #57
- fixing mistakes in calculating total number of steps by @zhenghh04 in #59
- Mlperf storage v0.5.1 by @zhenghh04 in #60
- Added support for Dali data loader by @hariharan-devarajan in #49
- Changed datatype to be np.uint8 universally in the call by @zhenghh04 in #61
- Adding support for training on a subset of dataset by @zhenghh04 in #63
- DLIO profiler integration by @hariharan-devarajan in #62
- Added support for Power9PC by @hariharan-devarajan in #65
- Update unet3d.yaml to correct the sample size for unet3d by @zhenghh04 in #68
- For x86 and AMD machines, we can create pip-based DLIO installations by @hariharan-devarajan in #66
- Added validation to check that enough cores are available for reading by @hariharan-devarajan in #73
- Added custom plugin code for custom data loader and reader. by @hariharan-devarajan in #74
- Changes required within DLIO Benchmark for creating a pip wheel by @hariharan-devarajan in #77
- Update bert.yaml to be consistent with mlperf storage by @zhenghh04 in #79
- Fixing subfolder issues and added subset tests by @zhenghh04 in #82
- Documentation: Instructions to compile and run on Lassen machine. by @OlgaKogiou in #85
- Changes to improve documentation by @hariharan-devarajan in #89
- Fixed dali data loader execution. by @hariharan-devarajan in #91
- Enhancing Dali data loader support by @zhenghh04 in #94
- Fixing Dali Data loader Parallelism and Pipelining. by @hariharan-devarajan in #93
- Fix typo that causes an issue with PyTorch 1.3.1 by @hariharan-devarajan in #103
- Added documentation for the JPEG generator issue by @kaushikvelusamy in #100
- Workloads by @zhenghh04 in #97
- Added Info logging for profiler and removed unnecessary bracket calls. by @hariharan-devarajan in #104
- Fix the data dir path by @hariharan-devarajan in #108
- Making DLIO Profiler default for dlio_benchmark. by @hariharan-devarajan in #111
- Adding dlp logger. by @hariharan-devarajan in #109
- Workloads by @zhenghh04 in #112
- fixed Read the Docs build issue by @zhenghh04 in #115
- fix Docker file to use venv. by @hariharan-devarajan in #119
- Switch dlio_profiler to use pypi instead of github by @hariharan-devarajan in #120
- Added force install for profiler for avoiding caching issues by @hariharan-devarajan in #123
- Update README.md by @venkat-1 in #121
- torch checkpoint creation should use storage class methods by @krehm in #126
- Reducing Github actions time by @zhenghh04 in #128
- Create output_folder using os.makedirs() by @krehm in #124
- Adding Native Dali Data Loader support for TFRecord, Images, and NPZ files by @zhenghh04 in #118
- Add support for pytorch spawn and forkserver multiprocessing_context by @krehm in #129
- Reopen dlio.log in non-fork reader_threads child processes by @krehm in #130
- added checkpointing to support LLMs by @hariharan-devarajan in #114
- added dlp for spawned workers pytorch by @hariharan-devarajan in #136
- Fix MPI finalization. by @hariharan-devarajan in #139
- Adding dlio_profiler to requirements.txt by @johnugeorge in #144
- Fix dataloader initialization to only happen once. Not on every epoch. by @hariharan-devarajan in #143
- Fix random sampling pytorch non-determinism. by @hariharan-devarajan in #145
- Fixed printing for DLIO output. by @hariharan-devarajan in #142
- Doc changes to fix DLIO profiler and remove IOStat by @hariharan-devarajan in #146
- Support for custom checkpointing. by @hariharan-devarajan in #137
- Feature/parallel io generator by @hariharan-devarajan in #148
- fix random bugs and printing by @hariharan-devarajan in #147
- Release for v2.0 by @zhenghh04 in #113
- Fix requirements file by @johnugeorge in #150
- fixed sample distribution bugs by @zhenghh04 in #152
- Fix sample shuffling by @hariharan-devarajan in #154
- Optimization to sample distribution by @TheAssembler1 in #156
- DALI data loader fix and configuration files update for new batch sizes by @zhenghh04 in #158
- Fixing github action issues by @zhenghh04 in #162
- Fixing github action issues (#162) by @zhenghh04 in #163
- Fixed random samples issue and added more github actions to test the configuration files by @zhenghh04 in #164
- Various bug fixes by @zhenghh04 in #166
- Fixed global_index issue and redundant shuffling in DALI by @zhenghh04 in #168
- merge main by @zhenghh04 in #172
- Adding support to include host cpu and memory info into the json files by @zhenghh04 in #174
- Changed from PyTorch to Tensorflow for ResNet50 and CosmoFlow by @zhenghh04 in #183
- Fixing action failure issue by @zhenghh04 in #184
- Fixed Performance issue in TF Data loader by @zhenghh04 in #185
- Merge from main by @zhenghh04 in #186
- Synthetic data support by @zhenghh04 in #188
- Added doc for synthetic data loader and data reader by @zhenghh04 in #189
- Packaging by @zhenghh04 in #190
- Packaging by @zhenghh04 in #191
- generating indexed_binary files causes kernel OOM to kill process (#181) by @krehm in #182
- reduced tensorflow version by @zhenghh04 in #192
- Improve tfreader parsing performance (batch) by @LouisDDN in #194
- Update config.py by @zhenghh04 in #196
- Shard filenames instead of images (tfreader) by @LouisDDN in #197
- Request changes from MLPerf Storage by @zhenghh04 in #199
- Fixed potentially insufficient samples when num_files is not divisible by comm.size by @zhenghh04 in #200
- Mlperf requests by @zhenghh04 in #201
- sync up mlperf_storage_v1.0 by @zhenghh04 in #203
- Fix requirements file by @johnugeorge in #204
- Mlperf storage v1.0 by @zhenghh04 in #206
- Fixed the MPI initialization issue by @zhenghh04 in #207
- Switch DLIO Profiler to DFTracer. by @hariharan-devarajan in #208
- Fix README CI badge by @izzet in #212
- Adding version fix that restricts matching on Python 3.9 environments. by @hariharan-devarajan in #218
- Only initialize and finalize on DLIOMPI by @hariharan-devarajan in #214
- Ignore file indexing for native data loader. by @hariharan-devarajan in #215
New Contributors
- @johnugeorge made their first contribution in #8
- @hariharan-devarajan made their first contribution in #37
- @OlgaKogiou made their first contribution in #85
- @kaushikvelusamy made their first contribution in #100
- @venkat-1 made their first contribution in #121
- @krehm made their first contribution in #126
- @TheAssembler1 made their first contribution in #156
- @LouisDDN made their first contribution in #194
Full Changelog: v1.0.0...v2.0.0