Adding initial minndae tuning nb (#36)
DiogenesAnalytics committed Jan 22, 2024
1 parent 7ecdbe5 commit 52db713
Showing 1 changed file with 228 additions and 0 deletions.
228 changes: 228 additions & 0 deletions notebooks/tuning/minndae.ipynb
@@ -0,0 +1,228 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8089e733-b121-4419-a641-9457f10ca989",
"metadata": {},
"source": [
"# Hyperparameter Tuning: MinNDAE\n",
"In this *Jupyter Notebook* the goal is to find the *optimal hyperparameters* for the `MinNDAE` model, using the Keras `MNIST` dataset as the baseline dataset."
]
},
{
"cell_type": "markdown",
"id": "4beb03a4-6443-4952-9ab7-d713414a6679",
"metadata": {},
"source": [
"## Setup\n",
"First, install the necessary packages (only needed when running on Google Colab)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "988ef4c5-55b7-45d8-ba1d-886901ad4d7c",
"metadata": {},
"outputs": [],
"source": [
"# check for colab\n",
"if \"google.colab\" in str(get_ipython()):\n",
"    # install colab dependencies\n",
"    !pip install git+https://github.com/DiogenesAnalytics/autoencoder"
]
},
{
"cell_type": "markdown",
"id": "1b30698e-0e05-4687-a5dc-5b237c9617c8",
"metadata": {},
"source": [
"## Get MNIST Data\n",
"We will use `keras.datasets` to get the `MNIST` dataset, and then *normalize* and *reshape* it to prepare it for training the *autoencoder*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b377de50-992c-4bff-bd1b-e0ce5330ec77",
"metadata": {},
"outputs": [],
"source": [
"# get necessary libs for data/preprocessing\n",
"import tensorflow as tf\n",
"from keras.datasets import mnist\n",
"\n",
"# load the data\n",
"(x_train, _), (x_test, _) = mnist.load_data()\n",
"\n",
"# preprocess the data (normalize)\n",
"x_train = x_train.astype(\"float32\") / 255.\n",
"x_test = x_test.astype(\"float32\") / 255.\n",
"\n",
"# add grayscale dimension\n",
"x_train = tf.expand_dims(x_train, axis=-1)\n",
"x_test = tf.expand_dims(x_test, axis=-1)\n",
"\n",
"# convert to tf datasets\n",
"train_ds = tf.data.Dataset.from_tensor_slices((x_train, x_train))\n",
"test_ds = tf.data.Dataset.from_tensor_slices((x_test, x_test))\n",
"\n",
"# set a few params\n",
"BATCH_SIZE = 64\n",
"SHUFFLE_BUFFER_SIZE = 100\n",
"\n",
"# update with batch/buffer size\n",
"train_ds = train_ds.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)\n",
"test_ds = test_ds.batch(BATCH_SIZE)"
]
},
{
"cell_type": "markdown",
"id": "13bf50dc-9add-4b1b-9507-032b6c37d0d5",
"metadata": {},
"source": [
"## Building Hypermodel\n",
"Here we define the *function* that the tuner calls to build a `MinNDAE` *hypermodel* for each trial. The only hyperparameter searched is the encoding dimension, `encode_dim`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "edd24794-4b11-4922-8f9e-7f41439c2221",
"metadata": {},
"outputs": [],
"source": [
"from autoencoder.model.minimal import MinNDParams, MinNDAE\n",
"from autoencoder.training import build_encode_dim_loss_function\n",
"\n",
"# set regularization factor\n",
"REG_FACTOR = 1.0 / (28.0 * 28.0)\n",
"\n",
"# define the autoencoder model\n",
"def build_autoencoder(hp):\n",
"    # get encoding dimension\n",
"    encode_dim = hp.Int(\"encode_dim\", min_value=1, max_value=(28 * 28), step=1)\n",
"\n",
"    # get layer configs\n",
"    config = MinNDParams(\n",
"        l0={\"input_shape\": (28, 28, 1)},\n",
"        l2={\"units\": encode_dim},\n",
"        l3={\"units\": 28 * 28 * 1},\n",
"        l4={\"target_shape\": (28, 28, 1)},\n",
"    )\n",
"\n",
"    # create model\n",
"    autoencoder = MinNDAE(config)\n",
"\n",
"    # get custom loss func\n",
"    loss_function = build_encode_dim_loss_function(encode_dim, regularization_factor=REG_FACTOR)\n",
"\n",
"    # select loss function\n",
"    autoencoder.compile(optimizer=\"adam\", loss=loss_function)\n",
"\n",
"    # now return keras model\n",
"    return autoencoder.model"
]
},
{
"cell_type": "markdown",
"id": "3a99fb3e-0ac3-4ed9-98b3-92d135fa085d",
"metadata": {},
"source": [
"## Hyperparameter Search\n",
"Now we can run the *hyperparameter search*: a grid search over `encode_dim`, capped at 50 trials, with early stopping on the validation loss (`val_loss`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb32211a-c914-43b9-a4fd-e76541eae0f1",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# get hyperparam tools\n",
"from keras.callbacks import EarlyStopping\n",
"from keras_tuner import GridSearch\n",
"\n",
"# setup tuner (max_trials caps the search at 50 of the 784 possible encode_dim values)\n",
"tuner = GridSearch(\n",
"    build_autoencoder,\n",
"    objective=\"val_loss\",\n",
"    max_trials=50,\n",
"    directory=\"autoencoder_tuning/minndae\",\n",
"    project_name=f\"grid_search_encode_dim_{REG_FACTOR}_reg\",\n",
"    seed=42,\n",
")\n",
"\n",
"# create early stopping callback\n",
"stop_early = EarlyStopping(monitor=\"val_loss\", patience=2)\n",
"\n",
"# print a summary of the hyperparameter search space\n",
"tuner.search_space_summary()\n",
"\n",
"# run the hyperparameter search\n",
"tuner.search(train_ds, epochs=10, validation_data=test_ds, callbacks=[stop_early])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d25acd5-58e7-48a9-a4ac-a0e9c8e82b02",
"metadata": {},
"outputs": [],
"source": [
"# get hyperparams of best model\n",
"best_hp = tuner.oracle.get_best_trials(num_trials=1)[0].hyperparameters.values\n",
"print(\"Best Hyperparameters:\", best_hp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9629e788-536f-4944-9a98-1bf1d1b3549c",
"metadata": {},
"outputs": [],
"source": [
"# get plotting libs\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# extract score/encode_dims from each trial\n",
"scores, encoding_dims = zip(\n",
"    *((trial.score, trial.hyperparameters[\"encode_dim\"]) for trial in tuner.oracle.trials.values())\n",
")\n",
"\n",
"# scatter plot of trial score vs. encoding dimension\n",
"plt.scatter(encoding_dims, scores)\n",
"plt.title(f\"Performance vs Encoding Dimension:\\n{MinNDAE.__name__} / MNIST / {REG_FACTOR:0.4f} Regularization\")\n",
"plt.axvline(x=best_hp[\"encode_dim\"], color=\"r\", linestyle=\"dashed\", linewidth=2, label=\"optimal_encode_dim\")\n",
"plt.axvline(x=32, color=\"y\", linestyle=\"dashed\", linewidth=2, label=\"keras_default\")\n",
"plt.xlabel(\"Encoding Dimension\")\n",
"plt.ylabel(\"Loss Metric\")\n",
"plt.legend()\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
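
Note on the custom loss: the notebook calls `build_encode_dim_loss_function`, but its implementation is not part of this diff. The sketch below is only an illustration of how such a loss might trade reconstruction error against encoding size. It assumes, without confirmation from the source, that the loss is a mean squared error plus `regularization_factor * encode_dim`. The names `mean_squared_error`, `build_penalized_loss`, and `penalized_loss` are hypothetical and do not come from the `autoencoder` package.

```python
# Hypothetical sketch: NOT the autoencoder package's implementation.
# Assumption: loss = reconstruction MSE + regularization_factor * encode_dim.


def mean_squared_error(y_true, y_pred):
    """Plain-Python MSE over two equal-length sequences of floats."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def build_penalized_loss(encode_dim, regularization_factor):
    """Return a loss that adds a fixed size penalty for the encoding dimension."""
    penalty = regularization_factor * encode_dim

    def penalized_loss(y_true, y_pred):
        return mean_squared_error(y_true, y_pred) + penalty

    return penalized_loss


# same value as in the notebook
REG_FACTOR = 1.0 / (28.0 * 28.0)

# even a perfect reconstruction still pays the size penalty
pixels = [0.0, 0.5, 1.0]
small = build_penalized_loss(32, REG_FACTOR)(pixels, pixels)
large = build_penalized_loss(28 * 28, REG_FACTOR)(pixels, pixels)
print(f"encode_dim=32 -> {small:.4f}, encode_dim=784 -> {large:.4f}")
```

Under this assumption, `REG_FACTOR = 1 / (28 * 28)` scales the penalty so that it grows linearly from about 0.0013 at `encode_dim=1` to about 1.0 at `encode_dim=784`, which would make a larger encoding worthwhile only if it lowers the reconstruction error by more than the added penalty.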
