
Some improvements in readability for some of the notebooks (#1)
* precise version for pip packages. And display full numbers in the heatmap.

* adding stuff to display file and location

* make the experiment notebook create datestamped versions of the model file

* display files after they've been uploaded to s3

* latest changes

---------

Co-authored-by: Christopher Chase <[email protected]>
erwangranger and cfchase authored Oct 31, 2023
1 parent d46c35a commit 82e1176
Showing 2 changed files with 172 additions and 59 deletions.
157 changes: 118 additions & 39 deletions 1_experiment_train.ipynb
@@ -9,12 +9,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install Python Dependencies"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
@@ -24,17 +22,18 @@
},
"outputs": [],
"source": [
"!pip install onnx onnxruntime seaborn tf2onnx"
"!pip install onnx==1.12.0 \\\n",
" onnxruntime==1.16.1 \\\n",
" seaborn==0.13.0 \\\n",
" tf2onnx==1.13.0"
]
},
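The commit message mentions pinning precise versions for pip packages; exact pins like the ones above make the notebook reproducible. A small sketch of double-checking the resolved versions at runtime (the pins mirror the install cell; this check is not part of the original notebook):

```python
# Verify that the pinned packages resolved to the expected versions.
from importlib import metadata

expected = {
    "onnx": "1.12.0",
    "onnxruntime": "1.16.1",
    "seaborn": "0.13.0",
    "tf2onnx": "1.13.0",
}

for package, wanted in expected.items():
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "OK" if installed == wanted else f"expected {wanted}"
    print(f"{package} {installed}: {status}")
```

A mismatch here usually means another cell or a base image upgraded a package behind your back.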
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can import the dependencies we need to run the code"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
@@ -44,6 +43,7 @@
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import datetime\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Dropout, BatchNormalization, Activation\n",
"from sklearn.model_selection import train_test_split\n",
@@ -57,6 +57,7 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the CSV data which we will use to train the model.\n",
"It contains the following fields:\n",
@@ -68,10 +69,7 @@
"* **usedpinnumber** - If the PIN number was used.\n",
"* **online_order** - If it was an online order.\n",
"* **fraud** - If the transaction is fraudulent."
],
"metadata": {
"collapsed": false
}
]
},
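The `fraud` label in a dataset like this is usually heavily imbalanced, which is why the training cell later passes `class_weight` to `model.fit`. A toy sketch of inverse-frequency class weights — the tiny frame below is purely illustrative, not the real CSV:

```python
import pandas as pd

# A tiny stand-in for the real data; the actual CSV is not shown in this hunk.
df = pd.DataFrame({
    "distance_from_last_transaction": [0.3, 5.2, 0.1, 8.7],
    "ratio_to_median_price": [1.9, 0.4, 1.1, 3.2],
    "used_chip": [1.0, 0.0, 1.0, 0.0],
    "used_pin_number": [0.0, 1.0, 0.0, 0.0],
    "online_order": [0.0, 1.0, 1.0, 1.0],
    "fraud": [0, 0, 0, 1],
})

# Inverse-frequency weights: the rarer class (fraud) gets a larger weight,
# so the model can't minimize loss by always predicting "not fraud".
counts = df["fraud"].value_counts()
class_weights = {label: len(df) / (2 * n) for label, n in counts.items()}
print(class_weights)  # label 1 (fraud) gets the larger weight
```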
{
"cell_type": "code",
@@ -123,14 +121,12 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build the model\n",
"\n",
"The model we build here is a simple fully connected deep neural network, containing 3 hidden layers and one output layer."
],
"metadata": {
"collapsed": false
}
]
},
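The Keras code for the network is collapsed in this hunk; as an illustration of what "3 hidden layers and one output layer" computes, here is a plain numpy forward pass. The layer widths (32) and batch size are assumptions for the sketch, not the notebook's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 5 input features (the five transaction fields), three hidden layers,
# and a single sigmoid output for the fraud probability.
sizes = [5, 32, 32, 32, 1]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    for w, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ w + b)
    return sigmoid(x @ weights[-1] + biases[-1])

batch = rng.normal(size=(4, 5))  # 4 example transactions
probs = forward(batch)
print(probs.shape)               # (4, 1): one fraud probability per row
```

Keras handles the weight initialization and training loop; this sketch only shows the shape of the computation.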
{
"cell_type": "code",
@@ -174,14 +170,89 @@
"history = model.fit(X_train, y_train, epochs=epochs, \\\n",
" validation_data=(scaler.transform(X_val.values),y_val), \\\n",
" verbose = True, class_weight = class_weights)\n",
"\n",
"print(\"Training of model is complete\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save the model as ONNX for easy use of ModelMesh\n",
"\n",
"model_proto, _ = tf2onnx.convert.from_keras(model)\n",
"os.makedirs(\"models/fraud\", exist_ok=True)\n",
"onnx.save(model_proto, \"models/fraud/model.onnx\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save the model file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Confirm the model file has been created successfully\n",
"* This should display the model file, with its size and date. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! ls -alRh ./models/"
]
},
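`! ls -alRh ./models/` relies on a shell being available; a portable Python sketch that reports the same information (file sizes and modification dates) for everything under `models/`:

```python
import datetime
import os

def describe_tree(root):
    """Print size and mtime for every file under root, like ls -alRh."""
    # os.walk simply yields nothing if root does not exist yet.
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            stamp = datetime.datetime.fromtimestamp(info.st_mtime)
            print(f"{info.st_size:>10} {stamp:%Y-%m-%d %H:%M:%S} {path}")

describe_tree("./models/")
```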
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's also save a copy of the model in a date-stamped folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a date-stamped folder for fraud models\n",
"current_date = datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n",
"fraud_folder = os.path.join(\"models/\", current_date + \"-fraud\")\n",
"os.makedirs(fraud_folder, exist_ok=True)\n",
"\n",
"# Save the model to the date-stamped folder\n",
"model_path = os.path.join(fraud_folder, \"model.onnx\")\n",
"onnx.save(model_proto, model_path)\n",
"\n",
"print(f\"Saved the model to {model_path}\")"
]
},
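A convenient property of the `%Y%m%d-%H%M%S` prefix used above is that folder names sort lexicographically in chronological order. A sketch of locating the newest date-stamped model — the helper name is ours, not the notebook's:

```python
import os

def latest_fraud_model(models_dir="models"):
    """Return the model.onnx path in the newest '<stamp>-fraud' folder."""
    if not os.path.isdir(models_dir):
        return None
    stamped = [d for d in os.listdir(models_dir) if d.endswith("-fraud")]
    if not stamped:
        return None
    # YYYYmmdd-HHMMSS prefixes sort lexicographically == chronologically.
    newest = max(stamped)
    return os.path.join(models_dir, newest, "model.onnx")

print(latest_fraud_model())
```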
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"* Confirm the model file has been created successfully\n",
"* This should display the model file(s), with their sizes and dates."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! ls -alh ./models/"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -205,12 +276,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the test data and scaler"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
@@ -226,12 +295,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create an ONNX inference runtime session and predict values for all test inputs"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
@@ -250,12 +317,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show results"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
@@ -267,7 +332,7 @@
"print(\"Accuracy: \" + str(accuracy))\n",
"\n",
"c_matrix = confusion_matrix(y_test,y_pred)\n",
"ax = sns.heatmap(c_matrix, annot=True, cbar=False, cmap='Blues')\n",
"ax = sns.heatmap(c_matrix, annot=True, fmt='d', cbar=False, cmap='Blues')\n",
"ax.set_xlabel(\"Prediction\")\n",
"ax.set_ylabel(\"Actual\")\n",
"ax.set_title('Confusion Matrix')\n",
@@ -276,25 +341,39 @@
},
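The change to `fmt='d'` makes seaborn annotate each cell with the full integer count instead of an abbreviated float. The counts themselves are simple tallies, as this numpy sketch shows (the toy labels are illustrative):

```python
import numpy as np

y_test = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([0, 1, 1, 0, 0, 1, 0, 0])

# c_matrix[i, j] counts samples with actual class i and predicted class j,
# so the diagonal holds the correct predictions.
c_matrix = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(y_test, y_pred):
    c_matrix[actual, predicted] += 1

print(c_matrix)
# [[4 1]
#  [1 2]]
```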
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Trying with Sally's details\n",
"Fields are in order: distance_from_last_transaction, ratio_to_median_price, used_chip, used_pin_number, online_order "
],
"metadata": {
"collapsed": false
}
"\n",
"Fields are in order: \n",
"* distance_from_last_transaction\n",
"* ratio_to_median_price\n",
"* used_chip \n",
"* used_pin_number\n",
"* online_order "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sally_transaction_details = [[0.3111400080477545, 1.9459399775518593, 1.0, 0.0, 0.0]]\n",
"sally_transaction_details = [\n",
" [0.3111400080477545,\n",
" 1.9459399775518593, \n",
" 1.0, \n",
" 0.0, \n",
" 0.0]\n",
" ]\n",
"\n",
"prediction = sess.run([output_name], {input_name: scaler.transform(sally_transaction_details).astype(np.float32)})\n",
"\n",
"print(\"Was Sally's transaction predicted to be fraudulent? \")\n",
"print(np.squeeze(prediction) > threshold)"
"print(np.squeeze(prediction) > threshold)\n",
"\n",
"print(\"How likely to be fraudulent was it? \")\n",
"print(\"{:.5f}\".format(np.squeeze(prediction) * 100) + \"%\")"
]
}
],
74 changes: 54 additions & 20 deletions 2_save_model.ipynb
@@ -9,13 +9,21 @@
"To save this model to use from various locations, including other notebooks or serving the model, we need to upload it to s3-compatible storage."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install required packages and define a function for the upload"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install boto3 botocore"
"!pip install boto3==1.26.165 \\\n",
" botocore==1.29.165"
]
},
{
@@ -53,7 +61,31 @@
" relative_path = os.path.relpath(file_path, local_directory)\n",
" s3_key = os.path.join(s3_prefix, relative_path)\n",
" print(f\"{file_path} -> {s3_key}\")\n",
" bucket.upload_file(file_path, s3_key)"
" bucket.upload_file(file_path, s3_key)\n",
"\n",
"def list_objects(prefix):\n",
" filter = bucket.objects.filter(Prefix=prefix)\n",
" for obj in filter.all():\n",
" print(obj.key)"
]
},
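The `s3_key` construction inside `upload_directory_to_s3` can be exercised locally, with no S3 connection; a sketch (the helper name and paths are illustrative):

```python
import os

def s3_key_for(file_path, local_directory, s3_prefix):
    """Mirror the key mapping used by upload_directory_to_s3."""
    relative_path = os.path.relpath(file_path, local_directory)
    return os.path.join(s3_prefix, relative_path)

key = s3_key_for("models/fraud/model.onnx", "models", "models")
print(key)  # models/fraud/model.onnx (on POSIX path separators)
```

Because the prefix and local directory are both `models`, files keep the same relative layout in the bucket as on disk.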
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"## List files\n",
"\n",
"List the objects in your S3 bucket under the prefix `models` to make sure the upload was successful. As a best practice, we keep only one model in a given prefix or directory. Some models need multiple files, and keeping each model in its own directory lets us download and serve all of its files together without mixing up our models.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* If this is the first time running the code, this cell will have no output.\n",
"* But if you've already uploaded your model, you should see: `models/fraud/model.onnx`"
]
},
{
@@ -62,16 +94,21 @@
"metadata": {},
"outputs": [],
"source": [
"upload_directory_to_s3(\"models\", \"models\")"
"list_objects(\"models\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### List files\n",
"\n",
"List the objects in your S3 bucket under the prefix `models` to make sure the upload was successful. As a best practice, we keep only one model in a given prefix or directory. Some models need multiple files, and keeping each model in its own directory lets us download and serve all of its files together without mixing up our models.\n"
"## Upload and check again"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we use this function to recursively upload the `models` folder"
]
},
{
@@ -80,10 +117,16 @@
"metadata": {},
"outputs": [],
"source": [
"def list_objects(prefix):\n",
" filter = bucket.objects.filter(Prefix=prefix)\n",
" for obj in filter.all():\n",
" print(obj.key)"
"upload_directory_to_s3(\"models\", \"models\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"To confirm this worked, we run the `list_objects` function again:"
]
},
{
@@ -103,17 +146,8 @@
"\n",
"Hopefully, you saw the model `models/fraud/model.onnx` listed above. Now that you've saved the model to s3 storage, we can refer to it using the same data connection and serve it as an API.\n",
"\n",
"Return to the workshop to deploy the model as an API.\n"
"Return to the workshop instructions to deploy the model as an API.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
}
],
"metadata": {
