Merge pull request #66 from aisingapore/dev

0.5.1 release part 2

Syakyr authored Feb 23, 2025
2 parents a565cc0 + 9eb7cd4, commit 2016d78
Showing 4 changed files with 62 additions and 40 deletions.
@@ -127,15 +127,3 @@ if you intend to use Jupyter notebooks within the VSCode environment.
[vsx-python]: https://marketplace.visualstudio.com/items?itemName=ms-python.python
[vsx-jy]: https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
[jy-vscode]: ./04c-virtual-env.md#jupyter-kernel-for-vscode

## Using Docker within Kubernetes

!!! caution
Since these development environments are essentially pods deployed
within a Kubernetes cluster, using Docker within the pods
themselves is not feasible by default and while possible, should
be avoided.

??? info "Reference Link(s)"

- [Using Docker-in-Docker for your CI or testing environment? Think twice. - jpetazzo](https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/)
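
    The caution above concerns running a Docker daemon inside a pod. If
    images must be built from within the cluster, daemonless builders are
    the usual workaround; as a purely illustrative sketch (kaniko is not
    part of this template, and every name below is a placeholder), a
    one-off build pod could look like:

    ```yaml
    # Hypothetical kaniko build pod -- registry, repository, and file
    # names are placeholders, not values from this template
    apiVersion: v1
    kind: Pod
    metadata:
      name: kaniko-build
    spec:
      restartPolicy: Never
      containers:
        - name: kaniko
          image: gcr.io/kaniko-project/executor:latest
          args:
            - "--context=git://github.com/<YOUR_ORG>/<YOUR_REPO>.git"
            - "--dockerfile=docker/<YOUR_DOCKERFILE>"
            - "--destination=<YOUR_REGISTRY>/<YOUR_IMAGE>:0.1.0"
    ```

    kaniko executes the Dockerfile instructions in userspace, so no
    privileged Docker socket is needed inside the pod.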
@@ -38,9 +38,10 @@ values can be overridden through the CLI.

## Data Preparation & Preprocessing

There are several ways to process the sample raw data. We can either
build the Docker image within the Coder workspace or submit the job
through Run:ai. You can first check your configuration variables at
`conf/process_data.yaml`, specifically this section:

```yaml
raw_data_dir_path: "./data/raw"
```

@@ -53,6 +54,23 @@ provided in this template:

=== "Coder Workspace Terminal"

```bash
docker build \
-t {{cookiecutter.registry_project_path}}/cpu:0.1.0 \
-f $(pwd)/docker/{{cookiecutter.repo_name}}-cpu.Dockerfile \
$(pwd)
{%- if cookiecutter.platform == 'gcp' %}
# Run `gcloud auth activate-service-account --key-file $GOOGLE_APPLICATION_CREDENTIALS`
# and `gcloud auth configure-docker {{cookiecutter.registry_project_path.split('/')[0]}}`
{%- elif cookiecutter.platform == 'onprem' %}
# Run `docker login {{cookiecutter.registry_project_path.split('/')[0]}}`
{%- endif %}
# to authenticate if you have not done so
docker push {{cookiecutter.registry_project_path}}/cpu:0.1.0
```

=== "Using Run:ai"

```bash
# Run `runai login` and `runai config project {{cookiecutter.proj_name}}` first if needed
# Run this in the base of your project repository, and change accordingly
    ```

@@ -70,7 +88,7 @@ provided in this template:
Now that we have the Docker image built and pushed to the registry, we
can submit a job using that image to Run:ai\:

=== "Coder Workspace Terminal using Run:ai"

```bash
# Switch working-dir to /<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/{{cookiecutter.repo_name}} to use the repo in the PVC
    ```

@@ -156,6 +174,23 @@ After that, we build the Docker image from the Dockerfile

=== "Coder Workspace Terminal"

```bash
docker build \
-t {{cookiecutter.registry_project_path}}/gpu:0.1.0 \
-f $(pwd)/docker/{{cookiecutter.repo_name}}-gpu.Dockerfile \
$(pwd)
{%- if cookiecutter.platform == 'gcp' %}
# Run `gcloud auth activate-service-account --key-file $GOOGLE_APPLICATION_CREDENTIALS`
# and `gcloud auth configure-docker {{cookiecutter.registry_project_path.split('/')[0]}}`
{%- elif cookiecutter.platform == 'onprem' %}
# Run `docker login {{cookiecutter.registry_project_path.split('/')[0]}}`
{%- endif %}
# to authenticate if you have not done so
docker push {{cookiecutter.registry_project_path}}/gpu:0.1.0
```

=== "Using Run:ai"

```bash
# Run `runai login` and `runai config project {{cookiecutter.proj_name}}` first if needed
# Run this in the base of your project repository, and change accordingly
    ```

@@ -173,7 +208,7 @@ After that, we build the Docker image from the Dockerfile
Now that we have the Docker image built and pushed to the registry,
we can run a job using it:

=== "Coder Workspace Terminal using Run:ai"

```bash
# Switch working-dir to /<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/{{cookiecutter.repo_name}} to use the repo in the PVC
    ```

@@ -320,7 +355,7 @@ executing the model training job out of the Run:ai platform, as the
`JOB_NAME` and `JOB_UUID` environment variables would not be available
by default.
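If you are executing the training job outside of Run:ai, one workaround
is to supply local fallbacks for these variables yourself; a minimal
sketch (the fallback values are arbitrary placeholders, not part of the
template):

```shell
# JOB_NAME and JOB_UUID are injected by Run:ai; outside the platform,
# fall back to placeholder values so downstream logging still works.
export JOB_NAME="${JOB_NAME:-local-job}"
export JOB_UUID="${JOB_UUID:-$(date +%s)-local}"
echo "Tracking run as ${JOB_NAME} (${JOB_UUID})"
```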

=== "Coder Workspace Terminal using Run:ai"

```bash
# Switch working-dir to /<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/{{cookiecutter.repo_name}} to use the repo in the PVC
    ```
@@ -1,24 +1,24 @@
--- {{cookiecutter.repo_name}}/aisg-context/guide-site/docs/runai/06c-job-orchestration.md
+++ {{cookiecutter.repo_name}}/problem-templates/cv/aisg-context/guide-site/docs/runai/06c-job-orchestration.md
@@ -46,2 +46,2 @@
@@ -47,2 +47,2 @@
-raw_data_dir_path: "./data/raw"
-processed_data_dir_path: "./data/processed"
+raw_data_dir_path: "./data/mnist-pngs-data-aisg"
+processed_data_dir_path: "./data/processed/mnist-pngs-data-aisg-processed"
@@ -84,2 +84,2 @@
@@ -102,2 +102,2 @@
- raw_data_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/data/raw \
- processed_data_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/data/processed"
+ raw_data_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/data/mnist-pngs-data-aisg \
+ processed_data_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/data/processed/mnist-pngs-data-aisg-processed"
@@ -91 +91 @@
@@ -109 +109 @@
-`/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/data/processed`.
+`/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/data/processed/mnist-pngs-data-aisg-processed`.
@@ -121,0 +122,4 @@
@@ -139,0 +140,4 @@
+ !!! note
+ The username and password for the MLflow Tracking server
+ can be retrieved from the MLOps team or your team lead.
+
@@ -143,3 +147,13 @@
@@ -161,3 +165,13 @@
-data_dir_path: "./data/processed"
-dummy_param1: 1.3
-dummy_param2: 0.8
@@ -35,7 +35,7 @@
+dry_run: false
+model_checkpoint_interval: 2
+model_checkpoint_dir_path: "./models/checkpoint"
@@ -190,3 +204,6 @@
@@ -225,3 +239,6 @@
- data_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/data/processed \
- artifact_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/models \
- mlflow_tracking_uri=<MLFLOW_TRACKING_URI>"
@@ -45,17 +45,17 @@
+ mlflow_exp_name=<NAME_OF_DEFAULT_MLFLOW_EXPERIMENT> \
+ model_checkpoint_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/{{cookiecutter.repo_name}}/models \
+ epochs=3"
@@ -265,2 +282,2 @@
@@ -300,2 +317,2 @@
- dummy_param1: range(0.9,1.7,step=0.1)
- dummy_param2: choice(0.7,0.8,0.9)
+ lr: range(0.9,1.7,step=0.1)
+ gamma: choice(0.7,0.8,0.9)
@@ -293 +310 @@
@@ -328 +345 @@
- return args["dummy_param1"], args["dummy_param2"]
+ return curr_test_loss, curr_test_accuracy
@@ -335,0 +353 @@
@@ -370,0 +388 @@
+ -e OMP_NUM_THREADS=2 \
@@ -337,3 +355,6 @@
@@ -372,3 +390,6 @@
- data_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/data/processed \
- artifact_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/models \
- mlflow_tracking_uri=<MLFLOW_TRACKING_URI>"
@@ -1,10 +1,9 @@
--- {{cookiecutter.repo_name}}/aisg-context/guide-site/docs/runai/06c-job-orchestration.md
+++ {{cookiecutter.repo_name}}/problem-templates/hdb/aisg-context/guide-site/docs/runai/06c-job-orchestration.md
@@ -49,0 +49,3 @@
@@ -50,0 +51,2 @@
+There are other configurables in the `conf/process_data.yaml` which are used
+in data preparation scripts found in `src/{{cookiecutter.src_package_name}}/data_prep`.
+
@@ -144,2 +147,7 @@
@@ -162,2 +164,7 @@
-dummy_param1: 1.3
-dummy_param2: 0.8
+artifact_dir_path: "./models"
@@ -14,38 +13,38 @@
+gamma: 1
+max_depth: 5
+seed: 1111
@@ -191,2 +199,5 @@
@@ -226,2 +233,5 @@
- artifact_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/models \
- mlflow_tracking_uri=<MLFLOW_TRACKING_URI>"
+ setup_mlflow=true \
+ mlflow_tracking_uri=<MLFLOW_TRACKING_URI> \
+ mlflow_exp_name=<NAME_OF_DEFAULT_MLFLOW_EXPERIMENT> \
+ model_checkpoint_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/{{cookiecutter.repo_name}}/models \
+ epochs=3"
@@ -259,2 +270,2 @@
@@ -294,2 +304,2 @@
- direction: ["minimize", "maximize"]
- study_name: "image-classification"
+ direction: ["minimize"]
+ study_name: "hdb-resale-process"
@@ -265,2 +276,4 @@
@@ -300,2 +310,4 @@
- dummy_param1: range(0.9,1.7,step=0.1)
- dummy_param2: choice(0.7,0.8,0.9)
+ n_estimators: range(50, 200, step=10)
+ lr: tag(log, interval(0.1, 0.6))
+ gamma: choice(0,0.1,0.2,0.3,0.4,0.5)
+ max_depth: range(2,20,step=1)
@@ -293 +306 @@
@@ -328 +340 @@
- return args["dummy_param1"], args["dummy_param2"]
+ return test_rmse ## or any other metrics
@@ -300 +313 @@
@@ -335 +347 @@
- direction: ["minimize", "maximize"]
+ direction: ["minimize"] ## or ["maximise"], if you're looking to maximise the test_rmse value
@@ -306 +319 @@
-loss and maximise the accuracy. The `hydra.sweeper.direction` field in
@@ -341 +353 @@
-loss and maximise the accuracy. The `hydra.sweeper.direction` field in
+root mean square error. The `hydra.sweeper.direction` field in
@@ -335,0 +348 @@
@@ -370,0 +383 @@
+ -e OMP_NUM_THREADS=2 \
@@ -338,2 +352,5 @@
@@ -373,2 +386,5 @@
- artifact_dir_path=/<NAME_OF_DATA_SOURCE>/workspaces/<YOUR_HYPHENATED_NAME>/models \
- mlflow_tracking_uri=<MLFLOW_TRACKING_URI>"
+ setup_mlflow=true \
