Scalability report update 2 #229

* UPDATE: Remove experimental distribution * ADD: Mnist distributed * ADD: Optional strategy * UPDATE: Conditional distribution * FIX: Dataloader for mnist * FIX: Model cloning lambda function for distributed scope * ADD: CycleGAN * UPDATE: Types * UPDATE: Types * ADD: Local distr * FIX: learning rates * ADD: CycleGAN distributed * FIX: Reduction * FIX: Distribution * ADD: tmp.py * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * UPDATE: Executors * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD:Initial VIRGO * UPDATE: Optional distribution, tensorflow-gpu * UPDATE: tensorflow-gpu dependency * ADD: Unify branches --------- Co-authored-by: User3574 <[email protected]>

* ADD: lightning distributed + pipeline * UPDATE: jscpd threshold * UPDATE: super linter ignore use cases * ADD: jscpd ignore loggers

* ADD: use case tests * FIX: move use case models out of itwinai * FIX: rearrange modules * ADD: ConsoleLogger and LoggersCollection * FIX: loggers filter * FIX: add TF env creation * UPDATE: test flag * ADD: early pytest on slurm * FIX: duplicated code in TF Trainer

* Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]>

* Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]>

* committing integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]>

* ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml * Update sqaaas.yml * ADD: adaptive branch discovery for SQAaaS action * Trigger only on main and dev branches * ADD: double quote * Trigger pytest only on main and dev PRs

* ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * Remove keras dependency

* commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]>

* Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2

* commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * 
UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]>

* fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * 
Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * 
update * ADD tf cpu env * Update Makefile * FIX 3DGAN tests * FIX data folder path --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]>

* Define a step for pytest execution * Fix: use v1 of step action * Print result of step composition * Rename step * Use step previous definition in the assessment * Rename input: workflow -> steps * Avoid caching by using 1.0.0 * Set container image * Bump to v1 * Bump to sqaaas-assessment-action@v2 * Remove 'id' property * Adapt inputs to v2 * Remove current branch * Disable test_cyclones_train_tf * ADD marker * ADD skip memory heavy * Disable for PRs --------- Co-authored-by: Matteo Bunino <[email protected]>

* ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race 
conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]>

Update ParseConfig

Remove experimental files

* commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst 
* Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]>

* ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race 
conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update 
action * Update action * FIX failing inference * Functional tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]>

* fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * 
Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * 
update * ADD tf cpu env * Update Makefile * FIX 3DGAN tests * FIX data folder path * ADD offloading of 3DGAN training * ADAPT 3DGAN training for singularity execution * UPDATE test and fix linter --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]>

* commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst 
* Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem * UPDATE requirements --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]>

* ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race 
conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update 
action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Update distributed.py * Update tfmirrored_slurm.sh * Update train.py * TF updates * Add README * Python venv (#136) * Move to python venv * Update Makefile * Add Horovod installation * Update env * FIX openmpi install * Add TF explicit version * UPDATE env creation * REMOVE constraint on torch 2.0.* * UPDATE installation * FIX test * REMOVE strict dependency on micromamba * FIX docs and debugging states * FIX cpu only installation * FIX deepspeed cpu installation * FIX tf env creation * FIX makefile * ADD pypi deployment * DISABLE push debug * UPDATE pypi * UPDATE classifiers * Update pyproject.toml * Update README.md * Cyclone tf dist (#130) * get_stretegy * UPDATE distributed strategy * change req file * cycline tf dist * small bugs * fix bug in train.py * REFACTOR cyclones use case * Activate pytest * NEW TensorFlow trainer * ADD user information --------- Co-authored-by: ruettgers1 <[email protected]> Co-authored-by: Matteo Bunino <[email protected]> * Interactive distrib ml (#139) Add examples for distributed ml in interactive mode * Interactive distrib ml (#140) Update tutorial * Disable documentation GH action * Remove action --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: MarioRuettgers <[email protected]>

Bring changes on main into dev

* ADD Virgo data pipeline and some refactoring * FIX typo * UPDATE README * ADD training * ADD TrainingConfiguration * ADD distributed training and refactor * update readme * UPDATE loggers and add tests * Refactor * FIX typo * UPDATE use cases instructions * ADD checkpointing and refactor. * FIX linter * FIX jscpd * FIX jscpd * Disable jscpd * Refactor loggers * ADD loggers to Virgo use case

* commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst 
* Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem * UPDATE requirements * Remove unnecessary dependencies * Add docstring * adding latest changes from dev * new content and changes * Update index.rst toctree revise * adding pages for distributed ml tutorials * new shpinx reqs to solve build failing * Docs update: - python code format fixed - added brief explanation on ddp in new section * requirements changed * UPDATE requirements * UPDATE requirements and itwinai.types * ADD CMake and GCC installation * UPDATE CMake and GCC installation * UPDATE CMake and GCC installation * ADD notebooks * Disable notebooks section * FIX TOC * Saving local changes before pulling from remote * saving updates before pull from origin * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * adding cyclones and virgo use cases pages * FIX build errors * Update TOC * Update TOC --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> Co-authored-by: Killian Verder <[email protected]>

* Dev - itwinai 0.0.2 (#138) * Backend (#59) * WIP: Tensorflow MNIST use-case * UPDATE: Tensorflow MNIST version * ADD: Backend * ADD: Use-case init * FIX: Paths and downloading of the data * FIX: Paths and downloading of the data * ADD: Setup, Config update * ADD: Setup, Config update * UPDATE: File movement into itwinai * FIX: Move utils from tensorflow to global folder * FIX: Add setup into torch Executable * ADD: MNIST Torch Use-case * FIX: Formatting * ADD: Lib * ADD: Lib * ADD: Tests, Fix Loggers * Update README.md * ADD: Tests * ADD: MLCC * ADD: Cyclones, Cyclones-pipe * ADD: TensorflowTrainer * UPDATE: Move TensorflowTrainer into Backend * FIX: Dependencies * ADD: Number of devices * ADD: initial version of TorchTrainer * update * update * ADD: distributed torch Trainer and decorator * ADD: New version of torch distribtued trainer and tests * ADD: load torch dist trainer form config file * ADD: multi-gpu pytorch trainer * ADD: download on login node * FIX: dataloaders in Trainer * FIX: add dataloaders into trainer * FIX: clear load and save state * ADD: Loggers * FIX: Log in a distributed environment * TensorFlow backend (#63) * UPDATE: Remove experimental distribution * ADD: Mnist distributed * ADD: Optional strategy * UPDATE: Conditional distribution * FIX: Dataloader for mnist * FIX: Model cloning lambda function for distributed scope * ADD: CycleGAN * UPDATE: Types * UPDATE: Types * ADD: Local distr * FIX: learning rates * ADD: CycleGAN distributed * FIX: Reduction * FIX: Distribution * ADD: tmp.py * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * UPDATE: Executors * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: 
Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD:Initial VIRGO * UPDATE: Optional distribution, tensorflow-gpu * UPDATE: tensorflow-gpu dependency * ADD: Unify branches --------- Co-authored-by: User3574 <[email protected]> * Refacto entire code base * ADD: workflows folder * FIX: refactor * FIX: linting * ADD: how to run use case doc * ADD: workflows doc * FIX: MD linter * Pipe MNIST lightning (#86) * ADD: lightning distributed + pipeline * UPDATE: jscpd threshold * UPDATE: super linter ignore use cases * ADD: jscpd ignore loggers * Functional tests for MNIST (#87) * ADD: use case tests * FIX: move use case models out of itwinai * FIX: rearrange modules * ADD: ConsoleLogger and LoggersCollection * FIX: loggers filter * FIX: add TF env creation * UPDATE: test flag * ADD: early pytest on slurm * FIX: duplicated code in TF Trainer * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * 3dgan use case (#94) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout 
step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Sqaaas code (#96) * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml * Update sqaaas.yml * ADD: adaptive branch discovery for SQAaaS actin * Trigger only on main and dev branches * ADD: double quote * Trigger pytest only on main and dev PRs * Torch mnist inference (#95) * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * Remove keras dependency * 3dgan integration (#97) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline 
components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * 3dgan integration (#98) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * 
Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE 
todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * fixed distributed trainer in cyclones use case * 3dgan integration (#118) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * 
UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * 
ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Unit test 4 dev (#113) * Define a step for pytest execution * Fix: use v1 of step action * Print result of step composition * Rename step * Use step previous definition in the assessment * Rename input: workflow -> steps * Avoid caching by using 1.0.0 * Set container image * Bump to v1 * Bump to sqaaas-assessment-action@v2 * Remove 'id' property * Adapt inputs to v2 * Remove current branch * Disable test_cyclones_train_tf * ADD marker * ADD skip memory heavy * Disable for PRs --------- Co-authored-by: Matteo Bunino <[email protected]> * Distributed strategy launcher (#117) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * 
ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some 
code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * Distributed strategy launcher (#127) Update ParseConfig * Distributed strategy launcher (#128) Remove experimental files * Docs dev (#132) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * 
fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> * Distributed strategy launcher (#131) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete 
tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts 
* FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * 3dgan integration (#134) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * 
Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial 
* ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path * ADD offloading of 3DGAN training * ADAPT 3DGAN training for singularity execution * UPDATE test and fix linter --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Docs dev (#135) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating 
readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update 
.readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem * UPDATE requirements --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> * Distributed strategy launcher (#137) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * 
Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * 
Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Update distributed.py * Update tfmirrored_slurm.sh * Update train.py * TF updates * Add README * Python venv (#136) * Move to python venv * Update Makefile * Add Horovod installation * Update env * FIX openmpi install * Add TF explicit version * UPDATE env creation * REMOVE constraint on torch 2.0.* * UPDATE installation * FIX test * REMOVE strict dependency on micromamba * FIX docs and debugging states * FIX cpu only installation * FIX deepspeed cpu installation * FIX tf env creation * FIX makefile * ADD pypi deployment * DISABLE push debug * UPDATE pypi * UPDATE classifiers * Update pyproject.toml --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * Update README.md * Distributed strategy launcher (#141) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * 
Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances 
improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Fixes to TF new version errors * Update distributed.py * Update tfmirrored_slurm.sh * Update train.py * TF updates * Add README * Python venv (#136) * Move to python venv * Update Makefile * Add Horovod installation * Update env * FIX 
openmpi install * Add TF explicit version * UPDATE env creation * REMOVE constraint on torch 2.0.* * UPDATE installation * FIX test * REMOVE strict dependency on micromamba * FIX docs and debugging states * FIX cpu only installation * FIX deepspeed cpu installation * FIX tf env creation * FIX makefile * ADD pypi deployment * DISABLE push debug * UPDATE pypi * UPDATE classifiers * Update pyproject.toml * Update README.md * Cyclone tf dist (#130) * get_stretegy * UPDATE distributed strategy * change req file * cycline tf dist * small bugs * fix bug in train.py * REFACTOR cyclones use case * Activate pytest * NEW TensorFlow trainer * ADD user information --------- Co-authored-by: ruettgers1 <[email protected]> Co-authored-by: Matteo Bunino <[email protected]> * Interactive distrib ml (#139) Add examples for distributed ml in interactive mode * Interactive distrib ml (#140) Update tutorial * Disable documentation GH action * Remove action --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: MarioRuettgers <[email protected]> * Merge main (#142) Bring changes on main into dev * Virgo integration (#143) * ADD Virgo data pipeline and some refactoring * FIX typo * UPDATE README * ADD training * ADD TrainingConfiguration * ADD distributed training and refactor * update readme * UPDATE loggers and add tests * Refactor * FIX typo * UPDATE use cases instructions * ADD checkpointing and refactor. 
* FIX linter * FIX jscpd * FIX jscpd * Disable jscpd * Refactor loggers * ADD loggers to Virgo use case * Update AUTHORS.md * Update AUTHORS.md * Docs dev (#144) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * 
Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem * UPDATE requirements * Remove unnecessary dependencies * Add docstring * adding latest changes from dev * new content and changes * Update index.rst toctree revise * adding pages for distributed ml tutorials * new shpinx reqs to solve build failing * Docs update: - python code format fixed - added brief explanation on ddp in new section * requirements changed * UPDATE requirements * UPDATE requirements and itwinai.types * ADD CMake and GCC installation * UPDATE CMake and GCC installation * UPDATE CMake and GCC installation * ADD notebooks * Disable notebooks section * FIX TOC * Saving local changes before pulling from remote * saving updates before pull from origin * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * Update itwinai.torch.modules.rst * adding cyclones and virgo use cases pages * FIX build errors * Update TOC * Update TOC --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> Co-authored-by: Killian Verder <[email protected]> --------- Co-authored-by: Roman Machacek <[email protected]> Co-authored-by: linxUser3574 <[email protected]> Co-authored-by: orviz <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: 
KalliopiTsolaki <[email protected]> Co-authored-by: VerderK <[email protected]> Co-authored-by: MarioRuettgers <[email protected]> Co-authored-by: Killian Verder <[email protected]> * Delete .github/workflows/pages.yml * ADD quick install for users (#145) * User install (#146) * ADD quick install for users * UPDATE installer * fix framework selection * UPDATE installer * Update README.md * Update README.md * Improve docstring parsing and refactor (#147) * UPDATE print patch and refactor * Cleanup * Cleanup * Cleanup * Cleanup * FIX broken import * UPDATE docs * FIX docstring parsing * Preserve ordering * Update cli.py * Update docs (#148) * Update README.md * ADD missing doctrings * Bump actions/setup-python from 4 to 5 (#149) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](https://github.com/actions/setup-python/compare/v4...v5) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update README.md * Update README.md * Update README.md * updating doc pages (#150) Co-authored-by: KalliopiTsolaki <ktsolaki@LAPTOP-4683QBL6> * Update cyclones_doc.rst * Bug fixes and addition of CERFACS use-case (#151) * Update train.py * Update generic_tf.sh * Update pyproject.toml * Update train.py * Fix: head problems with MacOS * Fixes for MacOS support * Fix: Update basic_components.py * Addition of cerfacs use-case * Update README.md * Update train.py * Update cyclones_doc.rst * Update startscript.sh * Update pyproject.toml * Update mnist.py * Update mnist.py * Update generic_tf.sh * Update requirements.txt * Update requirements.txt * Docs changes (#153) * updating doc pages * testing if changing the GH edit url works * adding repo link in toc --------- Co-authored-by: KalliopiTsolaki <ktsolaki@LAPTOP-4683QBL6> * Update pyproject.toml --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Roman Machacek <[email protected]> Co-authored-by: linxUser3574 <[email protected]> Co-authored-by: orviz <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: VerderK <[email protected]> Co-authored-by: MarioRuettgers <[email protected]> Co-authored-by: Killian Verder <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: KalliopiTsolaki <ktsolaki@LAPTOP-4683QBL6>

correct tqdm import error

In train_val, call strategy.device() instead of referencing strategy.device, so that the returned device is used rather than the method object itself.
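A minimal sketch of the kind of bug this change addresses, assuming `device` is defined as a regular method on the strategy object. `ToyStrategy` is a hypothetical stand-in written for illustration only; it is not the itwinai strategy class, and the device strings it returns are assumptions.

```python
class ToyStrategy:
    """Hypothetical stand-in for a distributed strategy (not the itwinai class)."""

    def __init__(self, local_rank: int = 0, use_gpu: bool = False) -> None:
        self.local_rank = local_rank
        self.use_gpu = use_gpu

    def device(self) -> str:
        # Return a device string for the current worker.
        return f"cuda:{self.local_rank}" if self.use_gpu else "cpu"


strategy = ToyStrategy()

# Without parentheses the expression evaluates to a bound method object,
# which is not a usable device identifier.
wrong = strategy.device

# With parentheses the method is called and the device string is returned.
right = strategy.device()
```

Passing the uncalled method where a device is expected typically fails only later, at the point where the value is consumed, which can make the error harder to trace back to the missing parentheses.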

* ADD prov4ml logger * UPDATE enum access fields * UPDATE loggers documentation and first integration attempt * ADD prov logger * format kinds table * MIGRATE to upstream prov4ml * ADD docs build on JSC * ADD RTD website * UPDATE docs creation * Refactor * UPDATE logger * Remove lightning callbacks and loggers * ADD checkpoints * UPDATE logger kind docs * Update README.md * ADD rank on loggers * Update loggers.py * Update loggers.py * Update loggers.py * Update loggers.py * Update loggers.py * FIX linter * REFACTOR loggers * Simplify prov4ml switch case * UPDATE loggers * FIX prov graph * REFACTOR itwinai logging * UPDATE SLURM jobscripts * REFACTOR * Update * ADD prov experiments * REFACTOR provenance logs and SLURM jobscripts * REMOVE duplication * FIX dataset name * UPDATE README * SKIP cyclones use case * UPDATE version * REMOVE redundant parameter * CLEANUP * ADD warning * ADD warning * UPDATE README * FIX errors * ADD docs * UPDATE scripts * UPDATE scripts

Bumps [github/super-linter](https://github.com/github/super-linter) from 6 to 7. - [Release notes](https://github.com/github/super-linter/releases) - [Changelog](https://github.com/github/super-linter/blob/main/CHANGELOG.md) - [Commits](github/super-linter@v6...v7) --- updated-dependencies: - dependency-name: github/super-linter dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Backend (#59) * WIP: Tensorflow MNIST use-case * UPDATE: Tensorflow MNIST version * ADD: Backend * ADD: Use-case init * FIX: Paths and downloading of the data * FIX: Paths and downloading of the data * ADD: Setup, Config update * ADD: Setup, Config update * UPDATE: File movement into itwinai * FIX: Move utils from tensorflow to global folder * FIX: Add setup into torch Executable * ADD: MNIST Torch Use-case * FIX: Formatting * ADD: Lib * ADD: Lib * ADD: Tests, Fix Loggers * Update README.md * ADD: Tests * ADD: MLCC * ADD: Cyclones, Cyclones-pipe * ADD: TensorflowTrainer * UPDATE: Move TensorflowTrainer into Backend * FIX: Dependencies * ADD: Number of devices * ADD: initial version of TorchTrainer * update * update * ADD: distributed torch Trainer and decorator * ADD: New version of torch distribtued trainer and tests * ADD: load torch dist trainer form config file * ADD: multi-gpu pytorch trainer * ADD: download on login node * FIX: dataloaders in Trainer * FIX: add dataloaders into trainer * FIX: clear load and save state * ADD: Loggers * FIX: Log in a distributed environment * TensorFlow backend (#63) * UPDATE: Remove experimental distribution * ADD: Mnist distributed * ADD: Optional strategy * UPDATE: Conditional distribution * FIX: Dataloader for mnist * FIX: Model cloning lambda function for distributed scope * ADD: CycleGAN * UPDATE: Types * UPDATE: Types * ADD: Local distr * FIX: learning rates * ADD: CycleGAN distributed * FIX: Reduction * FIX: Distribution * ADD: tmp.py * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * FIX: Distribution * UPDATE: Executors * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: 
Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * FIX: Distributed Dataset * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD: Ray * ADD:Initial VIRGO * UPDATE: Optional distribution, tensorflow-gpu * UPDATE: tensorflow-gpu dependency * ADD: Unify branches --------- Co-authored-by: User3574 <[email protected]> * Refacto entire code base * ADD: workflows folder * FIX: refactor * FIX: linting * ADD: how to run use case doc * ADD: workflows doc * FIX: MD linter * Pipe MNIST lightning (#86) * ADD: lightning distributed + pipeline * UPDATE: jscpd threshold * UPDATE: super linter ignore use cases * ADD: jscpd ignore loggers * Functional tests for MNIST (#87) * ADD: use case tests * FIX: move use case models out of itwinai * FIX: rearrange modules * ADD: ConsoleLogger and LoggersCollection * FIX: loggers filter * FIX: add TF env creation * UPDATE: test flag * ADD: early pytest on slurm * FIX: duplicated code in TF Trainer * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * 3dgan use case (#94) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: 
adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Sqaaas code (#96) * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml * Update sqaaas.yml * ADD: adaptive branch discovery for SQAaaS actin * Trigger only on main and dev branches * ADD: double quote * Trigger pytest only on main and dev PRs * Torch mnist inference (#95) * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * Remove keras dependency * 3dgan integration (#97) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * 
ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * 3dgan integration (#98) * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * REMOVE: keras dependency * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * 
FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD 
serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test --------- Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * fixed distributed trainer in cyclones use case * 3dgan integration (#118) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE 
Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate 
tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Unit test 4 dev (#113) * Define a step for pytest execution * Fix: use v1 of step action * Print result of step composition * Rename step * Use step previous definition in the assessment * Rename input: workflow -> steps * Avoid caching by using 1.0.0 * Set container image * Bump to v1 * Bump to sqaaas-assessment-action@v2 * Remove 'id' property * Adapt inputs to v2 * Remove current branch * Disable test_cyclones_train_tf * ADD marker * ADD skip memory heavy * Disable for PRs --------- Co-authored-by: Matteo Bunino <[email protected]> * Distributed strategy launcher (#117) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed 
tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify 
redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts * FIX linter * REMOVE jube tutorial --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * Distributed strategy launcher (#127) Update ParseConfig * Distributed strategy launcher (#128) Remove experimental files * Docs dev (#132) * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * fixed distributed trainer in cyclones use case * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * commiting docs functionality for testing deployment * adding documentation deployment relevant files * updating readthedocs.yaml * changing directory of requirements.txt * updating reqs file * commiting changes and adding pages for tutorials * adding installation instructions in docs * adding latest changes to docs * adding new pages for itwinai modules and other modifications * modified src/itwinai/torch directory name to solve namespace conflict * fixing tutorial sections * fixes in pages 
appearance * fixing rendering bugs * fixing pages appearance bugs * adding latest modifications * Deleted duplicate folder after renaming src/itwinai/torch * adding documentation.yml file for automatic updating on github pages * modifying documentation.yml file * updating reqs file to solve bug in deployment * testing automated docs update * updating getting started page * fixing pages and adding new content * bug fixes * fixing content rendering * latest fixes in rendering * Add version feature to docs * Update .readthedocs.yaml * fixing display structure in getting started page * new fixes similar to previous commit * Update index.rst * Update index.rst Text re-edit index * Update index.rst change 1 word * Update .readthedocs.yaml * Update .readthedocs.yaml * fixing getting started page * Text review getting_started_with_itwinai.rst * Update 3dgan_doc.rst * Update getting_started_with_itwinai.rst punctuation * Fix torch naming problem --------- Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: VerderK <[email protected]> * Distributed strategy launcher (#131) * ADD: distrib launcher mockup * REFACTOR: cluster env, strategy and launcher * ADD: Torch Elastic Launcher * ADD: info on env vars * ADD: distributed tooling and examples * new folder * UPDATE: distributed strategy setup * generalized for DDP and DS * add config file * UPDATE: kwargs * Update general_trainer.py * Update general_startscript * Update general_trainer.py * UPDATE .gitignore * Update distrib strategy * UPDATE torch distributed strategy classes * Updated docstrings * Small fixes * UPDATE docstrings * ADD deepespeed config loader * ADD first deepspeed tutorial draft * UPDATE DDP Dp distrib strategy * UPDATE horovod strategy * UPDATE tutorial on torch distributed strategies * UPDATE torch strategies tutorial * Update createEnvJSC.sh * Update hvd_slurm.sh * Update README.md * UPDATE distributed tutorial * Delete 
tutorials/distributed-ml/torch-ddp-deepspeed-horovod/0 * Fixes to deepspeed startscript * Update distributed.py * Update trainer.py * UPDATE tutorial * ADD draft MNIST tutorial * UPDATE DDP tutorial for MNIST * FIX small details * Update distributed.py * Added TF tutorials * Fixes to tutorials * Add files via upload * Update Makefile * Update README.md * UPDATE tutorials * UPDATE documentation and improve explainability * UPDATE SLURM scripts * FIX local rank mismatch * fixed distributed trainer in cyclones use case * UPDATE launcher * UPDATE linter * UPDATE format * FIX linter * FIX linter * Update workflow * UPDATE workflow * update * Update workflow * UPDATE super linter to v6 * UPDATE super linter to v6.3.0 * UPDATE super linter to slim * Cleanup * Update tfmirrored_slurm.sh * Update tfmirrored_slurm.sh * REMOVE workflows legacy * DELETE cyclegan use case * UPDATE dist training tutorials torch * RENAME folders with torch * DRAFT torch imagenet tutorial * UPDATE configuration * UPDATE imagenet tutorial * DRAFT scaling test * ADD scaling analysis report * FIX deepspeed micro batchsize * UPDATE data path * UPDATE checkpoint to avoid race conditions * UPDATE scalability report * UPDATE dataset path * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSC.sh * Update createEnvJSCTF.sh * Update README.md * Update README.md * JUBE benchmarks * Update createEnvJSC.sh * Update createEnvJSCTF.sh * ADD logy scale option * Extract JUBE tutorial * CLEANUP baselines * Log epoch time in real-time * FIX deepspeed dataloader for potential performances improvement * UPDATE SC bash severity * FIX deepspeed and horovod trainers * FIX some code checks * Unify redundant SLURM job scripts and configuration files * CLEANUP unused configuration * Reorg configurations * Refactor configurations and add documentation * Update README * ADD report image * Improve plot resolution * UPDATE scaling test * UPDATE launcher scripts 
* FIX linter * REMOVE jube tutorial * Restore ConfigParser * FIX type hinting * ADD dev dependencies * REMOVE experimental scripts * UPDATE scaling report * Add SLURM logs * Refactor log scale * Update scalability report * Unify SLURM logs per job * Update README.md * Update README.md * Update README.md * ADD itwinai installation * UPDATE torch distributed tutorial 0 * UPDATE torch distributed tutorials * REMOVE imagenet tutorial * ADD NonDistributedStrategy and create_dataloader method * CLEANUP older classes * Rename strategies * Simplify structure * ADD draft new torch trainer class * UPDATED torch trainer draft * UPDATE MNIST use case * INtegrate new trainer into MNIST use case * UPDATE structure: remove unused files and refactor tests * Tmp disable unused tests * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * Update action * FIX failing inference * Functiona tests (#133) * UPDATE tests * FIX errors * CLEANUP * Remove unused workflow --------- Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> * 3dgan integration (#134) * fixed distributed trainer in cyclones use case * commiting integration of 3dgan scripts * ADD: Download dataset * FIX: DDP distributed training with manual optimization * ADD: log with MLFlow * Sqaaas code (#88) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * Point to the current repo * Remove unnecessary checkout step * Rename step --------- Co-authored-by: orviz <[email protected]> * Sqaaas code (#89) * Create sqaaas.yml * Update sqaaas.yml * Update sqaaas.yml * 
Point to the current repo * Remove unnecessary checkout step * Rename step * ADD: adaptive branch discovery for SQAaaS action * Update sqaaas.yml --------- Co-authored-by: orviz <[email protected]> * ADD: draft predictor and saver * ADD: stub for inference pipeline * ADD: small docs * UPDATE: inference pipeline components * UPDATE: reorg * ADD: image generation for inference * update tag * ADD: threshold * ADD: draft inference * ADD: draft inference wf * ADD: working inference workflow * ADD: 3D scatter plots * ADD: Dockerfile + refactor * ADD: .dockerignore * Update .dockerignore * ADD: skip download option * ADD: cern pipeline.yaml * UPDATE: dataset loading function * UPDATE: dataset loading function * UPDATE conf * UPDATE refactor * UPDATE refactor * UPDATE training docs * Update readme * update README * FIX typo * Update README * Update mkdir * UPDATE data paths * UPDATE Dockerfile * UPDATE Dockerfiles * UPDATE for Singularity execution * FIX version mismatch * UPDATE Singularity docs * Named steps pipe (#100) * ADD: dict steps pipe * Relax dependency constraint * UPDATE Singularity exec command * UPDATE: Image version * UPDATE: load components from pipeline * ADD: docs * Simplify 3DGAN model config * ADD: mlflow autologging support for PL trainer * UPDATE container info * Refactor * UPDATE dependencies * FIX linter problem * Simplified workflow configuration (#108) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial 
* ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow --------- Co-authored-by: orviz <[email protected]> * Simplified workflow configuration (#109) * Add SQAaaS dynamic badge for dev branch (#104) * Add SQAaaS dynamic badge * Upgrade to sqaaas-assessment-action@v2 * Add draft example * UPDATE credits field * ADD docs * REFACTOR components and pipeline code * UPDATE docstring * UPDATE mnist torch uc * ADD config file parser draft * ADD itwinaiCLI and ConfigParser * ADD docs * ADD pipeline parser and serializer plus tests * UPDATE docs * ADD adapter component and tests (incl parser) * ADD splitter component, improve pipeline, tests * UPDATE test * REMOVE todos * ADD component tests * ADD serializer tests * FIX linter * ADD basic workflow tutorial * ADD basic intermediate tutorial * ADD advanced tutorial * UPDATE advanced tutorial * UPDATE use cases * UPDATE save parameters * FIX linter * FIX cyclones use case workflow * ADD slurm jobscript * FIX merge error * FIX components template --------- Co-authored-by: orviz <[email protected]> * ADD integration tests * FIX test * FIX 3dgan inference test * ADD GPU support and update tag * FIX linter * ADD override example * UPDATE 3DGAN inference * UPDATE inference execution tutorials * UPDATE README * UPDATE saver saving sparse tensors * ADD interlink pods * UPDATE pod name * UPDATE annotations * FIX README * CLEANUP * Merge * update * ADD tf cpu env * U[date Makefile * FIX 3DGAN tests * FIX data folder path * ADD offloading of 3DGAN training * ADAPT 3DGAN training for singularity execution * UPDATE test and fix linter --------- Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: orviz <[email protected]> * Move to python venv * Update Makefile * Add Horovod installation * Update env * FIX openmpi install * Add TF explicit version * UPDATE 
env creation * REMOVE constraint on torch 2.0.* * UPDATE installation * FIX test * REMOVE strict dependency on micromamba * FIX docs and debugging states * FIX cpu only installation * FIX deepspeed cpu installation * FIX tf env creation * FIX makefile * ADD torch and tensorflow Docker containers * Working DDP * REFACTOR torch container build scripts * FIX MPI env var set * Incomplete containers * UPDATE Dockerfiles * REFACTOR Dockerfiles * Rename * UPDATE containers files and tutorial * CLEANUP old doc pages * ADD containers tutorials * ADD containers tutorials * UPDATE deps * UPDATE deps * UPDATE deps * UPDATE docs and tutorials * CLEANUP duplicates * Update tests and scripts * ADD labels * CLEANUP * Add docs and fix deepspeed launcher * UPDATE linter settings * FIX slow unit test on 3DGAN train * ADD 3dgan sample dataset --------- Co-authored-by: Roman Machacek <[email protected]> Co-authored-by: linxUser3574 <[email protected]> Co-authored-by: orviz <[email protected]> Co-authored-by: Kalliopi Tsolaki <[email protected]> Co-authored-by: zoechbauer1 <[email protected]> Co-authored-by: Mario Rüttgers <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: r-sarma <[email protected]> Co-authored-by: KalliopiTsolaki <[email protected]> Co-authored-by: VerderK <[email protected]>

…ded newer hpo scripts

…oint frequency in NoiseGeneratorTrainer

…sed old hpo files

…ve to be changed for hpo to work

…racking server

…twinai into scalability-report-update

Commits on Aug 3, 2023

update

matbun committed Aug 3, 2023

12a595c

Commits on Aug 9, 2023

ADD: distributed torch Trainer and decorator

matbun committed Aug 9, 2023

5fec89a

Commits on Sep 6, 2023

ADD: multi-gpu pytorch trainer

matbun committed Sep 6, 2023

Commit d6359dc

Commits on Sep 8, 2023

ADD: download on login node

matbun committed Sep 8, 2023

Commit ea63b99

Commits on Sep 13, 2023

FIX: clear load and save state

matbun committed Sep 13, 2023

Commit a866338

Commits on Sep 15, 2023

ADD: Loggers

matbun committed Sep 15, 2023

Commit 6fc3c7e

Commits on Sep 26, 2023

Refactor entire code base

matbun committed Sep 26, 2023

Commit 216537c

Commits on May 8, 2024

Update README.md

matbun authored May 8, 2024

Commit 814e755

Commits on Jun 6, 2024

Merge branch 'main' into dev

matbun committed Jun 6, 2024

Commit 2ebac3b

Commits on Jun 13, 2024

Update sqaaas.yml

matbun authored Jun 13, 2024

Commit d5bcfb3

Commits on Jul 26, 2024

Update cli.py

matbun authored Jul 26, 2024

Commit ccb00d4

Commits on Aug 9, 2024

prepared slurm script

iacopoff committed Aug 9, 2024

Commit fe26f3c

Commits on Aug 27, 2024

working distributed version

iacopoff committed Aug 27, 2024

Commit 54a8d91

Commits on Oct 8, 2024

Fixed Virgo dataloading

annaelisalappe committed Oct 8, 2024

Commit 6715247
