Draft: profiling example #249

Draft: wants to merge 17 commits into master.
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@ _build
**/__pycache__
/docs/examples/**/*.diff
/docs/examples/**/slurm-*.out
/docs/examples/**/wandb/
/docs/examples/**/.pytest_cache/
1 change: 1 addition & 0 deletions docs/conf.py
@@ -15,6 +15,7 @@
"sphinx.ext.autosectionlabel",
"sphinx.ext.todo",
"myst_parser",
"nbsphinx",
]

templates_path = ["templates", "_templates", ".templates"]
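The `nbsphinx` extension added here is what allows the `profiling/profiling.ipynb` toctree entry in `index.rst` to be rendered as a documentation page. Under its default setting, `nbsphinx_execute = "auto"`, nbsphinx executes any notebook that is committed without stored outputs during the docs build. A profiling notebook that needs a GPU and ImageNet is unlikely to run on a docs builder. Assuming the notebook is committed with its outputs already saved, one option (not part of this PR) is to disable execution in `docs/conf.py`:

```python
# Hypothetical addition to docs/conf.py; this PR does not set it.
# nbsphinx_execute accepts "auto" (default: run only notebooks without
# stored outputs), "always", or "never" (render notebooks as committed).
nbsphinx_execute = "never"
```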
1 change: 1 addition & 0 deletions docs/examples/good_practices/index.rst
@@ -14,6 +14,7 @@ various good practices that should be observed when using the Mila cluster.

checkpointing/index
wandb_setup/index
profiling/profiling.ipynb
launch_many_jobs/index
hpo_with_orion/index
*/index
43 changes: 43 additions & 0 deletions docs/examples/good_practices/profiling/README.rst
@@ -0,0 +1,43 @@
.. NOTE: This file is auto-generated from examples/good_practices/profiling/index.rst
.. This is done so this file can be easily viewed from the GitHub UI.
.. **DO NOT EDIT**

.. _profiling:

Profiling your code
===================


**Prerequisites**
Make sure to read the following sections of the documentation before using this
example:

* `examples/frameworks/pytorch_setup <https://github.com/mila-iqia/mila-docs/tree/master/docs/examples/frameworks/pytorch_setup>`_

The full source code for this example is available on `the mila-docs GitHub
repository <https://github.com/mila-iqia/mila-docs/tree/master/docs/examples/good_practices/profiling>`_.

.. .. toctree::
.. :maxdepth: 1

.. profiling.ipynb

.. **job.sh**

.. .. literalinclude:: job.sh
.. :language: bash


.. **main.py**

.. .. literalinclude:: main.py
.. :language: python


**Running this example**


.. code-block:: bash

    $ sbatch job.sh
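The diff does not include `main.py`, so the following is not its contents. It is a minimal, dependency-free sketch, assuming the goal of profiling here is to find out whether a training loop is limited by data loading or by computation. It times how long each iteration waits for the next batch versus how long the step itself takes. It uses `time.perf_counter` rather than the PyTorch profiler that the job's `torch-tb-profiler` dependency suggests, and `slow_loader` and `fast_step` are hypothetical stand-ins for a real data loader and training step.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def profile_loop(batches: Iterable, step: Callable) -> Tuple[float, float]:
    """Return (seconds waiting for batches, seconds inside step) for one pass."""
    data_time = 0.0
    compute_time = 0.0
    it: Iterator = iter(batches)
    while True:
        start = time.perf_counter()
        try:
            batch = next(it)  # time spent producing the next batch
        except StopIteration:
            break
        loaded = time.perf_counter()
        step(batch)  # time spent on the training step itself
        done = time.perf_counter()
        data_time += loaded - start
        compute_time += done - loaded
    return data_time, compute_time


# Hypothetical stand-ins: a slow data loader and a fast training step.
def slow_loader(n: int, delay: float = 0.02):
    for i in range(n):
        time.sleep(delay)  # simulate disk reads / image decoding
        yield i


def fast_step(batch: int, delay: float = 0.005) -> None:
    time.sleep(delay)  # simulate GPU work


if __name__ == "__main__":
    data_s, compute_s = profile_loop(slow_loader(5), fast_step)
    share = data_s / (data_s + compute_s)
    print(f"data: {data_s:.3f}s  compute: {compute_s:.3f}s  ({share:.0%} waiting on data)")
```

A large "waiting on data" share suggests the GPU sits idle while batches are prepared, which is the kind of bottleneck a full profiler trace would show in much more detail.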
46 changes: 46 additions & 0 deletions docs/examples/good_practices/profiling/job.sh
@@ -0,0 +1,46 @@
#!/bin/bash
#SBATCH --gpus-per-task=rtx8000:1
#SBATCH --cpus-per-task=4
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --mem=16G
#SBATCH --time=00:15:00


# Echo time and hostname into log
echo "Date: $(date)"
echo "Hostname: $(hostname)"

# Start from a clean module environment, then load Anaconda and CUDA 11.7
module --quiet purge
module load anaconda/3
module load cuda/11.7

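# Default locations on the node-local $SLURM_TMPDIR; change them if your
# environment or dataset live elsewhere.
VENV_DIR="$SLURM_TMPDIR/env"
IMAGENET_DIR="$SLURM_TMPDIR/imagenet"

# Prepare ImageNet on the local disk unless it is already there
if [ ! -d "$IMAGENET_DIR" ]; then
    echo "ImageNet dataset not found. Preparing dataset..."
    ./make_imagenet.sh
else
    echo "ImageNet dataset already prepared."
fi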

# Check if the virtual environment exists; create it if it doesn't
if [ ! -f "$VENV_DIR/bin/activate" ]; then
    echo "Virtual environment not found. Creating it."
    module load python/3.10
    python -m venv "$VENV_DIR"
    source "$VENV_DIR/bin/activate"
    pip install torch rich tqdm torchvision scipy wandb tensorboard torch-tb-profiler numpy==1.23.0
else
    echo "Activating pre-existing virtual environment."
    source "$VENV_DIR/bin/activate"
fi

# Work around issues with MIG-partitioned GPUs in PyTorch versions older than 2.0
unset CUDA_VISIBLE_DEVICES

# Run the Python script, forwarding any arguments passed to job.sh
python main.py "$@"
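`job.sh` decides whether to run `make_imagenet.sh` only by checking that `$IMAGENET_DIR` exists, so a directory left behind by an interrupted preparation would be treated as complete. Below is a sketch of a more defensive variant, written in Python. It assumes a hypothetical `.done` marker file that is written only after preparation succeeds; nothing in this PR creates such a file, and `fake_prepare` is a stand-in for `make_imagenet.sh`.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_prepared(target: Path, prepare: Callable[[Path], None]) -> bool:
    """Run prepare(target) unless a completion marker already exists.

    Returns True if preparation ran, False if it was skipped.
    The ".done" marker name is a hypothetical convention.
    """
    marker = target / ".done"
    if marker.exists():
        print(f"Dataset already prepared in {target}.")
        return False
    print(f"Dataset not found in {target}. Preparing dataset...")
    target.mkdir(parents=True, exist_ok=True)
    prepare(target)  # e.g. copy and extract archives; may raise on failure
    marker.touch()  # only reached if prepare() finished without raising
    return True


if __name__ == "__main__":

    def fake_prepare(path: Path) -> None:
        # Hypothetical stand-in for make_imagenet.sh.
        (path / "train").mkdir(exist_ok=True)

    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "imagenet"
        print(ensure_prepared(target, fake_prepare))  # first call prepares
        print(ensure_prepared(target, fake_prepare))  # second call skips
```

In `job.sh` itself, the same idea would mean having `make_imagenet.sh` create the marker as its last step and testing for the marker instead of the directory.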