From aad41cc4e252f7b35afc24af7d6734fdc4cd53e6 Mon Sep 17 00:00:00 2001
From: Sarah M Brown
Date: Fri, 23 Oct 2020 15:15:55 -0400
Subject: [PATCH] note on numpy array indexing

---
 notes/2020-10-23.md | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/notes/2020-10-23.md b/notes/2020-10-23.md
index 97517efd..acee74d7 100644
--- a/notes/2020-10-23.md
+++ b/notes/2020-10-23.md
@@ -20,14 +20,14 @@ kernelspec:
 1. Accept assignment 7
 ```
 
-
-## Assignment 7
+
+## Assignment 7
 
 Make a plan with a group:
 - what methods do you need to use in part 1?
 - try to outline with psuedocode what you'll do for part 2 & 3
 
-Share any questions you have.
+Share any questions you have.
 
 Followup:
 1. assignment clarified to require 3 values for the parameter in part 2
@@ -35,7 +35,7 @@ Followup:
 
 +++
 
-
+
 ## Complexity of Decision Trees
 
 ```{code-cell} ipython3
@@ -54,6 +54,13 @@ df6= pd.read_csv(d6_url,usecols=[1,2,3])
 df6.head()
 ```
 
+````{margin}
+```{note}
+`df6.values` is a numpy array, which is a good data structure for storing matrices of data. We can index into numpy arrays using `[rows, columns]`. In `df6.values[:,:2]`, we take all the rows (`:`) and the columns up to, but not including, index 2 (`:2`) for the features (X). In `df6.values[:,2]`, we take all the rows and only the column at index 2 for the target (y).
+```
+````
+
+
 ```{code-cell} ipython3
 X_train, X_test, y_train, y_test = train_test_split(df6.values[:,:2],df6.values[:,2],
                 train_size=.8)
@@ -89,7 +96,7 @@ dt.score(X_test,y_test)
 df6.shape
 ```
 
-
+
 ## Training, Test set size and Cross Validation
 
 ```{code-cell} ipython3
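
The `[rows, columns]` indexing described in the added note can be checked on a small array. This is a minimal sketch: the array `data` is a made-up stand-in for `df6.values` (two feature columns followed by one target column), not the course dataset.

```python
import numpy as np

# Hypothetical stand-in for df6.values: 4 rows, 3 columns
# (columns 0 and 1 are features, column 2 is the target)
data = np.array([
    [0.1, 1.0, 0],
    [0.2, 2.0, 1],
    [0.3, 3.0, 0],
    [0.4, 4.0, 1],
])

# All rows (:), columns up to but not including index 2 (:2)
X = data[:, :2]

# All rows (:), only the column at index 2
y = data[:, 2]

print(X.shape)  # 2-D: one row per sample, two feature columns
print(y.shape)  # 1-D: one target value per sample
```

Slicing with `:2` keeps the result two-dimensional, while indexing with the single integer `2` drops that axis and yields a one-dimensional array, which is the shape `train_test_split` is given for the target in the notes.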