This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Merging NCCL to Master #20

Open
wants to merge 172 commits into base: public


Conversation

jiazhihao
Contributor

No description provided.

jiazhihao and others added 30 commits November 16, 2020 02:57
[Fusion] multiple bug fixes
[Print] update print_tensor API
[Legion] version update
jiazhihao and others added 29 commits February 5, 2021 18:28
* Added Hello World config for circleci

* Remove internal submodules

* OSS Automated Fix: Addition of Contributing

* OSS Automated Fix: Addition of Code of Conduct

* [CircleCI] Added simple onnx test for importing a model.

* [CircleCI] Added pytest job to config.yml for running onnx tests.

Co-authored-by: Zhihao Jia <[email protected]>
parallel configurations
[NCCL] eliminate MPI dependency
…e time a model takes when training (#94)

* Update README.md

* [ModelTiming] Added python script to examples folder for measuring the time per image that is used when training resnet152. Also added a file resnet.py that contains functions for generating the resnet model object

* [ModelTiming] Changed resnet_torch so it can be imported to other places.

* [ModelTiming] Changed import library for resnet

* [ModelTiming] Added script for training resnet152 with DDP. This allows the model to be trained on multiple gpus on multiple nodes.

* [ModelTiming] Changed gpu to local_rank in the resnet152_ddp_training.py script.

* [ModelTiming] resnet152_ddp_training script now obtains the master address from the Slurm environment
w/ memory usage exceeding device capacity, a penalty will be added
to the simulated performance. This allows the MCMC search to discover
strategies that satisfy the memory constraints
[Keras_EXP]: a new keras module

Conflicts:
	python/flexflow/onnx/model.py

[Keras-EXP]: more work

[Keras-EXP]: add missing files
@facebook-github-bot
Contributor

Hi @jiazhihao!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

5 participants