Completed assignment #1
Ryan Yeh committed Sep 30, 2024
1 parent 5476b71 commit 2731dbe
Showing 1 changed file with 52 additions and 14 deletions.
66 changes: 52 additions & 14 deletions 02_activities/assignments/assignment_1.ipynb
@@ -34,7 +34,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 23,
"id": "4a3485d6-ba58-4660-a983-5680821c5719",
"metadata": {},
"outputs": [],
@@ -96,7 +96,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Your answer here"
"# Your answer here\n",
"wine_df.shape[0]"
]
},
{
@@ -114,7 +115,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Your answer here"
"# Your answer here\n",
"wine_df.shape[1]"
]
},
{
@@ -132,7 +134,9 @@
"metadata": {},
"outputs": [],
"source": [
"# Your answer here"
"# Your answer here\n",
"print(f\"'class' type is: {wine_df['class'].dtypes}\")\n",
"print(f\"'levels' of 'class': {set(wine_df['class'])}\")"
]
},
{
@@ -151,7 +155,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Your answer here"
"# Your answer here\n",
"print(f\"number of predictor variables: {wine_df.shape[1]-1}\")"
]
},
{
@@ -204,7 +209,8 @@
"id": "403ef0bb",
"metadata": {},
"source": [
"> Your answer here..."
"> Your answer here...\n",
"Predictor variables need to be standardized because variables on different scales affect the model fit unequally: KNN relies on distances, so a variable with a larger numeric range dominates the distance calculation. Standardizing puts every predictor on the same scale, so each one contributes comparably to the model."
]
},
{
@@ -217,10 +223,11 @@
},
{
"cell_type": "markdown",
"id": "fdee5a15",
"id": "7628bd1b",
"metadata": {},
"source": [
"> Your answer here..."
"> Your answer here...\n",
"The 'class' variable is the outcome being predicted, not a predictor, so it does not need to be standardized."
]
},
{
@@ -233,10 +240,11 @@
},
{
"cell_type": "markdown",
"id": "f0676c21",
"id": "ae8447fe",
"metadata": {},
"source": [
"> Your answer here..."
"> Your answer here...\n",
"Setting a seed makes the random train/test split reproducible. The particular seed value is not important, because all we care about is getting the same split every time the code is run."
]
},
{
@@ -251,15 +259,17 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 42,
"id": "72c101f2",
"metadata": {},
"outputs": [],
"source": [
"# Do not touch\n",
"np.random.seed(123)\n",
"# Create a random vector of True and False values to split the data\n",
"split = np.random.choice([True, False], size=len(predictors_standardized), replace=True, p=[0.75, 0.25])"
"split = np.random.choice([True, False], size=len(predictors_standardized), replace=True, p=[0.75, 0.25])\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(predictors_standardized, wine_df['class'], test_size=0.25)"
]
},
{
@@ -287,7 +297,21 @@
"metadata": {},
"outputs": [],
"source": [
"# Your code here..."
"# Your code here...\n",
"knn = KNeighborsClassifier()\n",
"\n",
"parameter_grid = {\n",
" \"n_neighbors\": range(1, 51) # candidate n_neighbors values from 1 to 50 inclusive\n",
"}\n",
"tune_grid = GridSearchCV(\n",
" estimator=knn, # knn\n",
" param_grid=parameter_grid, # see above\n",
" cv=10 # 10-fold cross-validation\n",
")\n",
"\n",
"# Grid search using training data\n",
"tune_grid.fit(X_train, y_train)\n",
"print(f\"Best value for n_neighbors is {tune_grid.best_params_['n_neighbors']}\")"
]
},
{
@@ -308,7 +332,21 @@
"metadata": {},
"outputs": [],
"source": [
"# Your code here..."
"# Your code here...\n",
"\n",
"# Initialize KNN with best N-value from the previous grid search\n",
"knn = KNeighborsClassifier(n_neighbors=tune_grid.best_params_['n_neighbors'])\n",
"\n",
"# Train the model\n",
"knn.fit(X_train, y_train)\n",
"\n",
"# Predict with test data\n",
"prediction = knn.predict(X_test)\n",
"\n",
"# Get the accuracy score\n",
"accuracy = accuracy_score(y_test, prediction)\n",
"\n",
"print(f\"Accuracy of the model on test data set is: {accuracy}\")\n"
]
},
{

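The standardization answer in the diff can be illustrated with a small sketch. It uses only the Python standard library and made-up numbers (the `rows` data and the helper names `standardize` and `share_of_first_feature` are hypothetical, not taken from the notebook), and it z-scores each column with the population standard deviation. It is meant to show the idea, not to reproduce the notebook's own preprocessing.

```python
import statistics

# Hypothetical toy data: column 0 is in the thousands, column 1 is a fraction.
rows = [
    [1000.0, 0.10],
    [1100.0, 0.90],
    [2000.0, 0.15],
]


def standardize(data):
    """Return a z-scored copy: each column ends up with mean 0 and std 1."""
    columns = list(zip(*data))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in data
    ]


def share_of_first_feature(a, b):
    """Fraction of the squared Euclidean distance between a and b
    that comes from the first feature alone."""
    squared = [(x - y) ** 2 for x, y in zip(a, b)]
    return squared[0] / sum(squared)


scaled = standardize(rows)

# Before scaling, the large-valued column accounts for almost all of the distance.
raw_share = share_of_first_feature(rows[0], rows[1])
# After scaling, both columns are measured in standard deviations.
scaled_share = share_of_first_feature(scaled[0], scaled[1])

print(f"share of distance from column 0, raw:    {raw_share:.4f}")
print(f"share of distance from column 0, scaled: {scaled_share:.4f}")
```

For a distance-based model such as KNN, the raw share shows why an unscaled large-range column would effectively decide which neighbours are nearest.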
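The answer about the random seed can be sketched in the same way. The example below is an assumption-laden stand-in: it uses the standard library's `random.Random` rather than NumPy or scikit-learn, and `split_indices` is a hypothetical helper, not a function from the notebook. It only demonstrates that a fixed seed yields the same train/test partition on every run.

```python
import random


def split_indices(n_rows, test_fraction, seed):
    """Shuffle row indices with a seeded generator and cut them into
    (train, test) index lists. Hypothetical helper for illustration."""
    rng = random.Random(seed)          # private generator, seeded explicitly
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


# Two independent calls with the same seed give the same partition.
train_a, test_a = split_indices(20, 0.25, seed=123)
train_b, test_b = split_indices(20, 0.25, seed=123)

print("same split on both runs:", train_a == train_b and test_a == test_b)
print("test rows:", test_a)
```

Which seed is chosen does not matter for this property; any fixed value gives a split that can be regenerated later.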