
Some improvements in readability for some of the notebooks (#1)
* precise version for pip packages. And display full numbers in the heatmap.

* adding stuff to display file and location

* make the experiment notebook create datestamped versions of the model file

* display files after they've been uploaded to s3

* latest changes

---------

Co-authored-by: Christopher Chase <[email protected]>
erwangranger and cfchase authored Oct 31, 2023
1 parent d46c35a commit 82e1176
Showing 2 changed files with 172 additions and 59 deletions.
157 changes: 118 additions & 39 deletions 1_experiment_train.ipynb
@@ -9,12 +9,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install Python Dependencies"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
@@ -24,17 +22,18 @@
},
"outputs": [],
"source": [
"!pip install onnx onnxruntime seaborn tf2onnx"
"!pip install onnx==1.12.0 \\\n",
" onnxruntime==1.16.1 \\\n",
" seaborn==0.13.0 \\\n",
" tf2onnx==1.13.0"
]
},
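The commit message mentions pinning precise versions for pip packages; exact pins like the ones above make the notebook reproducible. A small sketch of double-checking the resolved versions at runtime (the pins mirror the install cell; this check is not part of the original notebook):

```python
# Verify that the pinned packages resolved to the expected versions.
from importlib import metadata

expected = {
    "onnx": "1.12.0",
    "onnxruntime": "1.16.1",
    "seaborn": "0.13.0",
    "tf2onnx": "1.13.0",
}

for package, wanted in expected.items():
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "OK" if installed == wanted else f"expected {wanted}"
    print(f"{package} {installed}: {status}")
```

A mismatch here usually means another cell or a base image upgraded a package behind your back.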
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can import the dependencies we need to run the code"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
@@ -44,6 +43,7 @@
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import datetime\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Dropout, BatchNormalization, Activation\n",
"from sklearn.model_selection import train_test_split\n",
@@ -57,6 +57,7 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the CSV data which we will use to train the model.\n",
"It contains the following fields:\n",
@@ -68,10 +69,7 @@
"* **usedpinnumber** - If the PIN number was used.\n",
"* **online_order** - If it was an online order.\n",
"* **fraud** - If the transaction is fraudulent."
],
"metadata": {
"collapsed": false
}
]
},
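The `fraud` label in a dataset like this is usually heavily imbalanced, which is why the training cell later passes `class_weight` to `model.fit`. A toy sketch of inverse-frequency class weights — the tiny frame below is purely illustrative, not the real CSV:

```python
import pandas as pd

# A tiny stand-in for the real data; the actual CSV is not shown in this hunk.
df = pd.DataFrame({
    "distance_from_last_transaction": [0.3, 5.2, 0.1, 8.7],
    "ratio_to_median_price": [1.9, 0.4, 1.1, 3.2],
    "used_chip": [1.0, 0.0, 1.0, 0.0],
    "used_pin_number": [0.0, 1.0, 0.0, 0.0],
    "online_order": [0.0, 1.0, 1.0, 1.0],
    "fraud": [0, 0, 0, 1],
})

# Inverse-frequency weights: the rarer class (fraud) gets a larger weight,
# so the model can't minimize loss by always predicting "not fraud".
counts = df["fraud"].value_counts()
class_weights = {label: len(df) / (2 * n) for label, n in counts.items()}
print(class_weights)  # label 1 (fraud) gets the larger weight
```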
{
"cell_type": "code",
@@ -123,14 +121,12 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build the model\n",
"\n",
"The model we build here is a simple fully connected deep neural network, containing 3 hidden layers and one output layer."
],
"metadata": {
"collapsed": false
}
]
},
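The Keras code for the network is collapsed in this hunk; as an illustration of what "3 hidden layers and one output layer" computes, here is a plain numpy forward pass. The layer widths (32) and batch size are assumptions for the sketch, not the notebook's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 5 input features (the five transaction fields), three hidden layers,
# and a single sigmoid output for the fraud probability.
sizes = [5, 32, 32, 32, 1]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    for w, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ w + b)
    return sigmoid(x @ weights[-1] + biases[-1])

batch = rng.normal(size=(4, 5))  # 4 example transactions
probs = forward(batch)
print(probs.shape)               # (4, 1): one fraud probability per row
```

Keras handles the weight initialization and training loop; this sketch only shows the shape of the computation.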
{
"cell_type": "code",
@@ -174,14 +170,89 @@
"history = model.fit(X_train, y_train, epochs=epochs, \\\n",
" validation_data=(scaler.transform(X_val.values),y_val), \\\n",
" verbose = True, class_weight = class_weights)\n",
"\n",
"print(\"Training of model is complete\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save the model as ONNX for easy use of ModelMesh\n",
"\n",
"model_proto, _ = tf2onnx.convert.from_keras(model)\n",
"os.makedirs(\"models/fraud\", exist_ok=True)\n",
"onnx.save(model_proto, \"models/fraud/model.onnx\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save the model file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Confirm the model file has been created successfully\n",
"* This should display the model file, with its size and date. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! ls -alRh ./models/"
]
},
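`! ls -alRh ./models/` relies on a shell being available; a portable Python sketch that reports the same information (file sizes and modification dates) for everything under `models/`:

```python
import datetime
import os

def describe_tree(root):
    """Print size and mtime for every file under root, like ls -alRh."""
    # os.walk simply yields nothing if root does not exist yet.
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            stamp = datetime.datetime.fromtimestamp(info.st_mtime)
            print(f"{info.st_size:>10} {stamp:%Y-%m-%d %H:%M:%S} {path}")

describe_tree("./models/")
```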
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's also save a copy of the model in a date-stamped folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a date-stamped folder for fraud models\n",
"current_date = datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n",
"fraud_folder = os.path.join(\"models/\", current_date + \"-fraud\")\n",
"os.makedirs(fraud_folder, exist_ok=True)\n",
"\n",
"# Save the model to the date-stamped folder\n",
"model_path = os.path.join(fraud_folder, \"model.onnx\")\n",
"onnx.save(model_proto, model_path)\n",
"\n",
"print(f\"Saved the model to {model_path}\")"
]
},
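A convenient property of the `%Y%m%d-%H%M%S` prefix used above is that folder names sort lexicographically in chronological order. A sketch of locating the newest date-stamped model — the helper name is ours, not the notebook's:

```python
import os

def latest_fraud_model(models_dir="models"):
    """Return the model.onnx path in the newest '<stamp>-fraud' folder."""
    if not os.path.isdir(models_dir):
        return None
    stamped = [d for d in os.listdir(models_dir) if d.endswith("-fraud")]
    if not stamped:
        return None
    # YYYYmmdd-HHMMSS prefixes sort lexicographically == chronologically.
    newest = max(stamped)
    return os.path.join(models_dir, newest, "model.onnx")

print(latest_fraud_model())
```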
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"* Confirm the model file has been created successfully\n",
"* This should display the model file(s), with their sizes and dates."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! ls -alh ./models/"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -205,12 +276,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the test data and scaler"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
@@ -226,12 +295,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create an ONNX inference runtime session and predict values for all test inputs"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
@@ -250,12 +317,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show results"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
@@ -267,7 +332,7 @@
"print(\"Accuracy: \" + str(accuracy))\n",
"\n",
"c_matrix = confusion_matrix(y_test,y_pred)\n",
"ax = sns.heatmap(c_matrix, annot=True, cbar=False, cmap='Blues')\n",
"ax = sns.heatmap(c_matrix, annot=True, fmt='d', cbar=False, cmap='Blues')\n",
"ax.set_xlabel(\"Prediction\")\n",
"ax.set_ylabel(\"Actual\")\n",
"ax.set_title('Confusion Matrix')\n",
@@ -276,25 +341,39 @@
},
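The change to `fmt='d'` makes seaborn annotate each cell with the full integer count instead of an abbreviated float. The counts themselves are simple tallies, as this numpy sketch shows (the toy labels are illustrative):

```python
import numpy as np

y_test = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([0, 1, 1, 0, 0, 1, 0, 0])

# c_matrix[i, j] counts samples with actual class i and predicted class j,
# so the diagonal holds the correct predictions.
c_matrix = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(y_test, y_pred):
    c_matrix[actual, predicted] += 1

print(c_matrix)
# [[4 1]
#  [1 2]]
```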
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Trying with Sally's details\n",
"Fields are in order: distance_from_last_transaction, ratio_to_median_price, used_chip, used_pin_number, online_order "
],
"metadata": {
"collapsed": false
}
"\n",
"Fields are in order: \n",
"* distance_from_last_transaction\n",
"* ratio_to_median_price\n",
"* used_chip \n",
"* used_pin_number\n",
"* online_order "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sally_transaction_details = [[0.3111400080477545, 1.9459399775518593, 1.0, 0.0, 0.0]]\n",
"sally_transaction_details = [\n",
" [0.3111400080477545,\n",
" 1.9459399775518593, \n",
" 1.0, \n",
" 0.0, \n",
" 0.0]\n",
" ]\n",
"\n",
"prediction = sess.run([output_name], {input_name: scaler.transform(sally_transaction_details).astype(np.float32)})\n",
"\n",
"print(\"Was Sally's transaction predicted to be fraudulent? \")\n",
"print(np.squeeze(prediction) > threshold)"
"print(np.squeeze(prediction) > threshold)\n",
"\n",
"print(\"How likely to be fraudulent was it? \")\n",
"print(\"{:.5f}\".format(np.squeeze(prediction) * 100) + \"%\")"
]
}
],
74 changes: 54 additions & 20 deletions 2_save_model.ipynb
@@ -9,13 +9,21 @@
"To save this model to use from various locations, including other notebooks or serving the model, we need to upload it to s3-compatible storage."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install required packages and define a function for the upload"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install boto3 botocore"
"!pip install boto3==1.26.165 \\\n",
" botocore==1.29.165"
]
},
{
@@ -53,7 +61,31 @@
" relative_path = os.path.relpath(file_path, local_directory)\n",
" s3_key = os.path.join(s3_prefix, relative_path)\n",
" print(f\"{file_path} -> {s3_key}\")\n",
" bucket.upload_file(file_path, s3_key)"
" bucket.upload_file(file_path, s3_key)\n",
"\n",
"def list_objects(prefix):\n",
" filter = bucket.objects.filter(Prefix=prefix)\n",
" for obj in filter.all():\n",
" print(obj.key)"
]
},
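The `s3_key` construction inside `upload_directory_to_s3` can be exercised locally, with no S3 connection; a sketch (the helper name and paths are illustrative):

```python
import os

def s3_key_for(file_path, local_directory, s3_prefix):
    """Mirror the key mapping used by upload_directory_to_s3."""
    relative_path = os.path.relpath(file_path, local_directory)
    return os.path.join(s3_prefix, relative_path)

key = s3_key_for("models/fraud/model.onnx", "models", "models")
print(key)  # models/fraud/model.onnx (on POSIX path separators)
```

Because the prefix and local directory are both `models`, files keep the same relative layout in the bucket as on disk.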
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"## List files\n",
"\n",
"List the objects in your S3 bucket under the prefix `models` to make sure the upload was successful. As a best practice, we keep only one model in a given prefix or directory. Some models need multiple files, and keeping each model in its own directory lets us download and serve all of its files together without mixing up our models.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* If this is the first time running the code, this cell will have no output.\n",
"* But if you've already uploaded your model, you should see: `models/fraud/model.onnx`"
]
},
{
@@ -62,16 +94,21 @@
"metadata": {},
"outputs": [],
"source": [
"upload_directory_to_s3(\"models\", \"models\")"
"list_objects(\"models\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### List files\n",
"\n",
"List the objects in your S3 bucket under the prefix `models` to make sure the upload was successful. As a best practice, we keep only one model in a given prefix or directory. Some models need multiple files, and keeping each model in its own directory lets us download and serve all of its files together without mixing up our models.\n"
"## Upload and check again"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we use this function to recursively upload the `models` folder"
]
},
{
@@ -80,10 +117,16 @@
"metadata": {},
"outputs": [],
"source": [
"def list_objects(prefix):\n",
" filter = bucket.objects.filter(Prefix=prefix)\n",
" for obj in filter.all():\n",
" print(obj.key)"
"upload_directory_to_s3(\"models\", \"models\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"To confirm this worked, we run the `list_objects` function again:"
]
},
{
@@ -103,17 +146,8 @@
"\n",
"Hopefully, you saw the model `models/fraud/model.onnx` listed above. Now that you've saved the model to s3 storage, we can refer to it using the same data connection and serve it as an API.\n",
"\n",
"Return to the workshop to deploy the model as an API.\n"
"Return to the workshop instructions to deploy the model as an API.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
}
],
"metadata": {
