note on numpy array indexing

rhodyprog4ds · Oct 23, 2020 · aad41cc
1 parent 6eba0a3
commit aad41cc
Showing 1 changed file with 12 additions and 5 deletions.
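The margin note added in this commit describes `[rows, columns]` indexing on `df6.values`. The short numpy sketch here shows the same feature/target split. The array `data` is a made-up stand-in for `df6.values` (the real data set is not reproduced), with columns 0 and 1 as the features and column 2 as the target.

```python
import numpy as np

# Hypothetical stand-in for df6.values: 4 rows, 3 columns.
# Columns 0 and 1 act as features (X); column 2 acts as the target (y).
data = np.array([
    [1.0, 2.0, 0],
    [3.0, 4.0, 1],
    [5.0, 6.0, 0],
    [7.0, 8.0, 1],
])

# [rows, columns]: ':' selects every row; ':2' selects columns 0 and 1
# because the stop index (2) is excluded from a slice.
X = data[:, :2]

# A single integer in the column position selects only that column and
# drops the column axis, so y is a 1-D array.
y = data[:, 2]

# For contrast, slicing with '2:' keeps the column axis: shape (4, 1).
y_col = data[:, 2:]

print(X.shape)      # (4, 2)
print(y.shape)      # (4,)
print(y_col.shape)  # (4, 1)
```

scikit-learn estimators generally expect the target as a 1-D array, which is one reason the integer index (`[:,2]`) is a natural choice for y here rather than the slice (`[:,2:]`).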
diff --git a/notes/2020-10-23.md b/notes/2020-10-23.md
@@ -20,22 +20,22 @@ kernelspec:
 1. Accept assignment 7
 ```
 
-<!-- annotate: Assignment 7  --> 
-## Assignment 7 
+<!-- annotate: Assignment 7  -->
+## Assignment 7
 
 Make a plan with a group:
 - what methods do you need to use in part 1?
 - try to outline with pseudocode what you'll do for part 2 & 3
 
-Share any questions you have. 
+Share any questions you have.
 
 Followup:
 1. assignment clarified to require 3 values for the parameter in part 2
 1. more tips on finding data sets added to assignment text
 
 +++
 
-<!-- annotate: Complexity of Decision Trees --> 
+<!-- annotate: Complexity of Decision Trees -->
 ## Complexity of Decision Trees
 
 ```{code-cell} ipython3
@@ -54,6 +54,13 @@ df6= pd.read_csv(d6_url,usecols=[1,2,3])
 df6.head()
 ```
 
+````{margin}
+```{note}
+`df6.values` is a numpy array, which is a good data structure for storing matrices of data. We can index into numpy arrays using `[rows, columns]`. Here, `df6.values[:,:2]` takes all of the rows (`:`) and the columns up to, but not including, index 2 (`:2`) as the features (X), and `df6.values[:,2]` takes the column at index 2 as the target (y).
+```
+````
+
+
 ```{code-cell} ipython3
 X_train, X_test, y_train,  y_test = train_test_split(df6.values[:,:2],df6.values[:,2],
                                                      train_size=.8)
@@ -89,7 +96,7 @@ dt.score(X_test,y_test)
 df6.shape
 ```
 
-<!-- annotate: Training, Test set size and Cross Validation --> 
+<!-- annotate: Training, Test set size and Cross Validation -->
 ## Training, Test set size and Cross Validation
 
 ```{code-cell} ipython3