Training, Test set size and Cross Validation --> #74

Open
github-actions bot opened this issue Oct 23, 2020 · 0 comments
Training, Test set size and Cross Validation

<!-- annotate: Training, Test set size and Cross Validation -->

    extension: .md
    format_name: myst
    format_version: 0.12
    jupytext_version: 1.6.0
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

```{raw-cell}
# Class 20: Decision Trees and Cross Validation
```


1. Share your favorite beverage (or say hi) in the Zoom chat
1. Log onto Prismia
1. Accept assignment 7

## Assignment 7

Make a plan with a group:

  • What methods do you need to use in part 1?
  • Try to outline in pseudocode what you'll do for parts 2 & 3

Share any questions you have.

Followup:

  1. Assignment clarified to require 3 values for the parameter in part 2
  2. More tips on finding data sets added to the assignment text

+++

## Complexity of Decision Trees

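# %load http://drsmb.co/310
import pandas as pd
import seaborn as sns
import numpy as np
from sklearn import tree
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split

# Load dataset 6, keeping columns 1-3 (two features followed by the label)
d6_url = 'https://raw.githubusercontent.com/rhodyprog4ds/06-naive-bayes/main/data/dataset6.csv'
df6 = pd.read_csv(d6_url, usecols=[1, 2, 3])
df6.head()

# Use 80% of the rows for training and hold out the rest for testing
X_train, X_test, y_train, y_test = train_test_split(df6.values[:, :2], df6.values[:, 2],
                                                    train_size=.8)

# Tree whose leaves must each contain at least 10 training samples
dt = tree.DecisionTreeClassifier(min_samples_leaf=10)
dt.fit(X_train, y_train)
print(tree.export_text(dt))

# Simpler tree: each leaf must contain at least 50 training samples
dt2 = tree.DecisionTreeClassifier(min_samples_leaf=50)
dt2.fit(X_train, y_train)
print(tree.export_text(dt2))

# Compare test-set accuracy of the two trees
dt2.score(X_test, y_test)
dt.score(X_test, y_test)

# Number of rows and columns in the full dataset
df6.shape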
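
To see why the `min_samples_leaf = 50` tree prints with fewer branches than the `min_samples_leaf = 10` tree: every leaf must hold at least that many training samples, so the number of leaves is capped at the training-set size divided by `min_samples_leaf`. The following is a minimal sketch of that bound in plain Python. The helper name `max_possible_leaves` and the training-set size of 160 are illustrative assumptions, not values taken from dataset 6.

```python
def max_possible_leaves(n_train, min_samples_leaf):
    """Upper bound on the leaves of a tree fit on n_train samples.

    Each leaf needs at least min_samples_leaf training samples, so at most
    n_train // min_samples_leaf leaves can exist. A tree always has at
    least one leaf (the root), even if n_train < min_samples_leaf.
    """
    if n_train < 1 or min_samples_leaf < 1:
        raise ValueError("n_train and min_samples_leaf must be positive")
    return max(1, n_train // min_samples_leaf)


# Hypothetical training-set size (assumption, not the real dataset size)
n_train = 160

for min_leaf in (10, 50):
    bound = max_possible_leaves(n_train, min_leaf)
    print(f"min_samples_leaf={min_leaf}: at most {bound} leaves")
```

This is only an upper limit on complexity. A fitted tree may have fewer leaves, for example when a node is already pure and no further split is made.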

## Training, Test set size and Cross Validation

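# Unrestricted tree fit on every row except the last one
dt3 = tree.DecisionTreeClassifier()
dt3.fit(df6.values[:-1, :2], df6.values[:-1, 2])
print(tree.export_text(dt3))

# 100-fold cross validation of a depth-2 tree on the full dataset
dt4 = tree.DecisionTreeClassifier(max_depth=2)
cv_scores = cross_val_score(dt4, df6.values[:, :2], df6.values[:, 2], cv=100)
cv_scores
# Average accuracy across the 100 folds
np.mean(cv_scores)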
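
To make the `cv=100` call more concrete, here is a rough sketch of how K-fold cross validation partitions row indices, written in plain Python. It is a simplification: the function `kfold_indices` and the sample count of 200 are hypothetical, the folds are contiguous and unshuffled, and it ignores the stratification that `cross_val_score` applies by default when given a classifier and an integer `cv`.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) index lists for unshuffled K-fold splits.

    Every sample lands in exactly one test fold. When n_samples is not a
    multiple of n_folds, the first n_samples % n_folds folds get one extra
    sample.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        # Training rows are everything outside the current test fold
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


# Hypothetical dataset size (assumption, not the real df6.shape)
n_samples = 200
folds = list(kfold_indices(n_samples, 100))

train_idx, test_idx = folds[0]
print("folds:", len(folds))
print("train size per fold:", len(train_idx))
print("test size per fold:", len(test_idx))
```

Under the assumed 200 rows, each test fold holds only 2 samples, so a single fold's accuracy can only be 0, 0.5, or 1. That is why the mean over all folds is the more informative number.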


3aff1d06465933a7e184d27ad6430b6f39a6eb22
@github-actions github-actions bot added the todo label Oct 23, 2020