WIP: modelcard first demo #1268
base: main
This is a nice start, and the general API is certainly okay. However, for the implementation, there are things we need to consider. Two points to start with:
I don't mind adding methods so that we can get an HTML-rendered version of the card to show inside a notebook, but usually the markdown is the only source of truth, and platforms render it nicely (GH, HuggingFace, etc.), so there's not much of a need to render it inside a notebook. Getting a PDF version of it is more usual than getting the HTML version.
I agree with @glemaitre that it probably would make sense to have a layer of abstraction in
For reference, an example of dealing with model cards (from an example in the skops repo) looks like this:
# %%
# For now, let’s start by creating a new model card and adding a few bits
# of information. We pass our final model as an argument to the
# ``Card`` class, which is used to create a table of
# hyper-parameters and a diagram of the model.
# %%
model_card = card.Card(model=gb_final)
model_card
# %%
# Next let’s add some prose to the model card. We add a short description
# of the model, the intended use, the data, and the preprocessing steps.
# Those are just plain strings, which we add to the card using the
# ``model_card.add`` method. That method takes ``**kwargs`` as
# input, where the key corresponds to the name of the section and the
# value corresponds to the content, i.e. the aforementioned strings. This
# way, we can add multiple new sections with a single method call.
# %%
description = """Gradient boosting regressor trained on California Housing dataset
The model is a gradient boosting regressor from sklearn. On top of the standard
features, it contains predictions from a KNN model. These predictions are calculated
out of fold, then added on top of the existing features. These features are really
helpful for decision tree-based models, since those cannot easily learn from geospatial
data."""
intended_uses = "This model is meant for demonstration purposes"
dataset_description = data.DESCR.split("\n", 1)[1].strip()
preproc_description = (
    "Rows where the target was clipped are excluded. Train/test split is random."
)
model_card.add(
    **{
        "Model description": description,
        "Model description/Dataset description": dataset_description,
        "Model description/Intended uses & limitations": intended_uses,
        "Model Card Authors": "Benjamin Bossan",
        "Model Card Contact": "[email protected]",
    }
)
# %%
# Maybe someone might wonder why we call ``model_card.add(**{…})``
# like this. The reason is the following. Normally, Python
# ``**kwargs`` are passed like this: ``foo(key=val)``. But
# we cannot use that syntax here, because the ``key`` would have to
# be a valid variable name. That means it cannot contain any spaces, start
# with a number, etc. But what if our section name contains spaces, like
# ``"Model description"``? We can still pass it as
# ``kwargs``, but we need to put it into a dict first. This is why
# we use the shown notation.
# %%
# By the way, if we wanted to change the content of a section, we could
# just add the same section name again and the value would be overwritten
# by the new content.
# %%
# Another convenience method we should make use of is the
# ``model_card.add_metrics`` method. This will store the metrics
# inside a table for better readability. Again, we pass multiple inputs
# using ``**kwargs``, and the ``description`` is optional.
# %%
model_card.add_metrics(
    description="Metrics are calculated on the test set",
    **{
        "Root mean squared error": -get_scorer("neg_root_mean_squared_error")(
            gb, df_test, y_test
        ),
        "Mean absolute error": -get_scorer("neg_mean_absolute_error")(
            gb, df_test, y_test
        ),
        "R²": get_scorer("r2")(gb, df_test, y_test),
    },
)
# %%
# How about we also add a plot to our model card? For this, let’s use the plot
# that shows the target as a function of longitude and latitude that we created
# above. We will just re-use the code from there to generate the plot. We will
# store it for now inside the same temporary directory as the model, then call
# the ``model_card.add_plot`` method. Since the plot is quite large, let’s
# collapse it in the model card by passing ``folded=True``.
# %%
fig, ax = plt.subplots(figsize=(10, 8))
df.plot(
    kind="scatter",
    x="Longitude",
    y="Latitude",
    c=target_col,
    title="House value by location",
    cmap="coolwarm",
    s=1.5,
    ax=ax,
)
fig.savefig(temp_dir / "geographic.png")
model_card.add_plot(
    folded=True,
    **{
        "Model description/Dataset description/Data distribution": "geographic.png",
    },
)
# %%
# Similar to the getting started code, we make sure that the file name we
# use for adding is just the plain ``"geographic.png"``,
# excluding the temporary directory, or else the file cannot be found
# later on.
# %%
# The model card class also provides a convenient method to add a plot
# that visualizes permutation importances. Let’s use it:
# %%
pi = permutation_importance(
    gb_final, df_test, y_test, scoring="neg_root_mean_squared_error", random_state=0
)
model_card.add_permutation_importances(
    pi, columns=df_test.columns, plot_file="permutation-importances.png", overwrite=True
)
# %%
# For this particular model card, the predefined section
# ``"Citation"`` is not required. Therefore, we delete it
# using ``model_card.delete``. Be careful: If there were subsections
# inside this section, they would be deleted too.
# %%
model_card.delete("Citation")
# %%
# Finally, we save the model card in the temporary directory as
# ``README.md``.
# %%
model_card.save(temp_dir / "README.md")
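To make the `model_card.add(**{…})` idiom from the example above concrete, here is a minimal, self-contained sketch. `SketchCard` is a hypothetical stand-in written only for illustration, not the skops `Card` class or its implementation. It assumes that a slash-separated name such as `"Model description/Intended uses & limitations"` denotes a subsection, and that adding an existing name overwrites its content, as the comments in the example describe.

```python
class SketchCard:
    """Hypothetical stand-in for a model card (NOT the skops implementation)."""

    def __init__(self):
        # Maps a section title to {"content": str, "children": {title: ...}}.
        self._sections = {}

    def add(self, **kwargs):
        """Add or overwrite sections; "A/B" is treated as subsection B of A."""
        for name, content in kwargs.items():
            level = self._sections
            *parents, leaf = name.split("/")
            for title in parents:
                # Create missing parent sections with empty content.
                parent = level.setdefault(title, {"content": "", "children": {}})
                level = parent["children"]
            node = level.setdefault(leaf, {"content": "", "children": {}})
            node["content"] = content  # adding the same name again overwrites
        return self

    def render(self):
        """Render the sections as markdown, one heading level per nesting depth."""
        lines = []

        def walk(level, depth):
            for title, node in level.items():
                lines.append("#" * depth + " " + title)
                if node["content"]:
                    lines.append(node["content"])
                walk(node["children"], depth + 1)

        walk(self._sections, 1)
        return "\n".join(lines)


card = SketchCard()
# Names containing spaces or slashes are not valid identifiers, so they cannot
# be written as add(Model description=...); unpacking a dict works instead.
card.add(
    **{
        "Model description": "first draft",
        "Model description/Intended uses & limitations": "demonstration only",
    }
)
card.add(**{"Model description": "second draft"})  # overwrites the content only
markdown = card.render()
print(markdown)
```

In this sketch the markdown string is the single output, and any HTML or PDF rendering would be derived from it, which is in line with treating markdown as the source of truth.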
This is just me helping out @glemaitre by independently having a quick stab at a model card. First draft, but looks alright.
This code:
Can generate this:
There is an opportunity to also attach training logs, but we should think a bit about how to do that if we want to go there.