Add video of HTML representation of Pipeline
#465
Comments
The concept is a video that runs the following lines of code live and dynamically shows the information displayed in the resulting diagram:

import pandas as pd

ames_housing = pd.read_csv("../datasets/house_prices.csv", na_values='?')
target_name = "SalePrice"
data, target = ames_housing.drop(columns=target_name), ames_housing[target_name]
target = (target > 200_000).astype(int)

data.head()

numeric_features = ['LotArea', 'FullBath', 'HalfBath']
categorical_features = ['Neighborhood', 'HouseStyle']
data = data[numeric_features + categorical_features]

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
])
categorical_transformer = OneHotEncoder(handle_unknown='ignore')

from sklearn.compose import ColumnTransformer

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features),
])

from sklearn.linear_model import LogisticRegression

model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression()),
])

from sklearn import set_config

set_config(display='diagram')
model

from sklearn.model_selection import cross_validate

cv_results = cross_validate(model, data, target, cv=5)
cv_results

The use of the SimpleImputer is optional and is introduced here only to illustrate a slightly more complex pipeline.
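For readers without the CSV at hand, here is a minimal self-contained sketch of the same idea. It assumes made-up toy data in place of house_prices.csv (the values, the `toy_*` names and the single-column feature lists are hypothetical) and shows the imputer step letting the pipeline cross-validate on a column that contains missing values:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for house_prices.csv (values are made up).
rng = np.random.default_rng(0)
n_samples = 60
toy_data = pd.DataFrame({
    "LotArea": rng.normal(10_000, 2_000, size=n_samples),
    "Neighborhood": rng.choice(["A", "B", "C"], size=n_samples),
})
toy_data.loc[::10, "LotArea"] = np.nan  # inject a few missing values
toy_target = pd.Series(np.arange(n_samples) % 2)  # balanced binary target

# Same structure as the pipeline above, reduced to one column per branch.
numeric = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
toy_preprocessor = ColumnTransformer(transformers=[
    ("num", numeric, ["LotArea"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Neighborhood"]),
])
toy_model = Pipeline(steps=[
    ("preprocessor", toy_preprocessor),
    ("classifier", LogisticRegression()),
])

# Cross-validation runs despite the NaNs because the imputer fills them
# inside each training fold.
toy_cv_results = cross_validate(toy_model, toy_data, toy_target, cv=5)
n_missing = int(toy_data["LotArea"].isna().sum())
n_scores = len(toy_cv_results["test_score"])
```

Dropping the imputer step from this sketch would make the fit fail on the NaN values, which is one way to motivate the step in a recording.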
This example could serve as a baseline for addressing #435 and eventually #414.

# list.remove() mutates in place and returns None, so build a new list instead
new_categorical_features = [f for f in categorical_features if f != 'Neighborhood']
new_numerical_features = ['Latitude', 'Longitude']
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, new_categorical_features),
])
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression()),
])

from sklearn.neighbors import KNeighborsClassifier
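new_preprocessor = ColumnTransformer(transformers=[
    ('lat_lon', numeric_transformer, new_numerical_features),
])
neighbors_model = Pipeline(steps=[
    ('new_preprocessor', new_preprocessor),
    # Assumption: the original pipeline had no final estimator; a KNN
    # classifier is added so it can serve as a stacking base estimator.
    ('classifier', KNeighborsClassifier()),
])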
from sklearn.ensemble import StackingClassifier

estimators = [
    ('logistic', model),
    ('KNN', neighbors_model),
]
clf = StackingClassifier(
    estimators=estimators, final_estimator=KNeighborsClassifier()
)
clf

It uses
Thanks @ArturoAmorQ. Some notes on the recording you sent us:
Some rephrasing suggestions:
or more precisely:
Miscellaneous remarks:
https://www.google.com/search?q=how+to+pronounce+column I think this is not a problem as most people will understand anyway.
Also: "Hello! Today we are gonna talk about pipelines." => "Hello! In this video I will introduce pipelines."
I am just wondering whether saying "In this video I will introduce pipelines" depends on where we place the video (i.e. before or after the first notebook mentioning them). Your first suggestion, "present pipelines", is probably more appropriate in this case.
But apart from that, thanks @GaelVaroquaux and @ogrisel for your time and feedback! I will take all of the above into account.
I thought the same, but I don't think it matters. I think it's fine even if we put the video after the notebook that introduces the pipeline using written text. As you wish. My main remark was more about avoiding "Today" and also trying to avoid "gonna", which I do not find pretty, even though I tend to use it a lot myself.
In the video I say "going to", as I agree that "gonna" does not sound very professional.
Indeed, I don't know why I misremembered you saying "gonna". I cannot trust my own ears...
I recently learned that the sklearn.set_config option display='diagram' is actually interactive, and apparently I was not the only one surprised by this. A short video could show how to manipulate these HTML diagrams while motivating the use of a pipeline for assembling complex models.
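Outside a notebook, the same interactive diagram can be inspected by exporting it to a standalone HTML file. The following is a minimal sketch, assuming a scikit-learn version that provides `sklearn.utils.estimator_html_repr`; the two-step `demo_model` and the output file name are hypothetical and only serve as an illustration:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import estimator_html_repr

# Hypothetical two-step pipeline used only to produce a diagram.
demo_model = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])

# Build the HTML diagram as a string; the estimator does not need to be fitted.
html = estimator_html_repr(demo_model)

# Hypothetical output path; open the file in a browser to expand each step.
with open("pipeline_diagram.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Opening the resulting file in a browser gives the same collapsible boxes as the notebook output, which may be convenient when preparing a recording.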