WIP: modelcard first demo #1268

Open
wants to merge 1 commit into
base: main

Conversation

koaning
Contributor

@koaning koaning commented Jan 30, 2025

This is just me helping out @glemaitre by independently taking a quick stab at a model card. It is a first draft, but it looks alright.

This code:

render_model_card(
    report,
    name="Demo classifier", 
    description="This is really just a demo of a model card. The model is really silly.",
    intended_use="The goal of this pipeline is to show a bunch of properties that can be displayed in a model card.",
    limitations="There is no reason to ever use this system. Ever."
)

Can generate this:

[Screenshot of the rendered model card: CleanShot 2025-01-30 at 16 48 30]

There is also an opportunity to attach training logs, but we should think a bit about how, if we want to go there.
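For readers who want a feel for what such a helper might do, here is a minimal sketch, not the code in this PR. The function name `render_model_card_sketch`, the optional `metrics` argument, and the HTML layout are all assumptions made for illustration; the real `render_model_card` also takes a `report` object, which is left out here so the example stays self-contained.

```python
from html import escape


def render_model_card_sketch(name, description, intended_use, limitations, metrics=None):
    """Return a small HTML fragment for a model card (illustrative only)."""
    sections = [
        ("Description", description),
        ("Intended use", intended_use),
        ("Limitations", limitations),
    ]
    # Escape user-provided text so it cannot inject markup into the card.
    parts = [f"<h1>{escape(name)}</h1>"]
    for title, text in sections:
        parts.append(f"<h2>{escape(title)}</h2>")
        parts.append(f"<p>{escape(text)}</p>")
    if metrics:
        # Hypothetical metrics table: one row per metric, three decimals.
        rows = "".join(
            f"<tr><td>{escape(str(key))}</td><td>{value:.3f}</td></tr>"
            for key, value in metrics.items()
        )
        parts.append(f"<h2>Metrics</h2><table>{rows}</table>")
    return "\n".join(parts)


card_html = render_model_card_sketch(
    name="Demo classifier",
    description="This is really just a demo of a model card.",
    intended_use="Show properties that can be displayed in a model card.",
    limitations="There is no reason to ever use this system.",
    metrics={"accuracy": 0.5},
)
print(card_html)
```

In a notebook, a string like this could be shown with `IPython.display.HTML(card_html)`.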

Contributor

Documentation preview @ c359eb4

@adrinjalali
Contributor

adrinjalali commented Jan 31, 2025

This is a nice start, and the general API is certainly okay. However, for the implementation, there are things we need to consider.

Two points to start with:

  • Model cards are artifacts which will be modified outside our "platform". A common use case is therefore to "load" an edited model card for further modifications.
  • Markdown is the format which people find easiest to edit, and therefore most model cards out there are in markdown format.
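To make the "load, modify, save" round trip concrete, here is a rough standard-library sketch. It is a simplified stand-in, not how skops does it: the helper names `parse_sections` and `render_sections` are hypothetical, and the parsing is a plain line scan for ATX headings rather than a real markdown parser.

```python
import re


def parse_sections(markdown_text):
    """Split markdown into an ordered {heading line: body} mapping.

    Only ATX headings ("# ...") are recognised. Lines starting with "# "
    inside fenced code blocks would be misread as headings, and repeated
    headings would collide, so this is far cruder than a real parser.
    """
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        if re.match(r"#{1,6} ", line):
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {heading: "\n".join(body).strip() for heading, body in sections.items()}


def render_sections(sections):
    """Serialise the mapping back to markdown."""
    return "\n\n".join(f"{heading}\n\n{body}" for heading, body in sections.items()) + "\n"


# A card that someone edited by hand outside the platform.
edited_card = """# Demo classifier

A demo card, edited by hand outside the platform.

## Limitations

None listed yet.
"""

sections = parse_sections(edited_card)
sections["## Limitations"] = "Not validated on production data."
sections["## Metrics"] = "| metric | value |\n|---|---|\n| accuracy | 0.5 |"
print(render_sections(sections))
```

The point of the sketch is only that markdown stays the source of truth: the card is read back in, changed programmatically, and written out in the same format.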

skops.card model cards allow you to do all of that, and the templates are somewhat like jinja templates. We use pandoc to load stored cards into a structured in-memory form so that they can be modified. I'm not 100% set on the API there, but the basics are pretty nice, and we've worked quite a bit (including with Meg Mitchell, the original model card co-author) to get it to that point.

I don't mind adding methods so that we can get an HTML-rendered version of the card to show inside a notebook, but usually the markdown is the only source of truth, platforms render it nicely (GH, HuggingFace, etc.), and there's not much of a need to render it inside a notebook. Getting a PDF version of it is more usual than getting the HTML version.

I agree with @glemaitre that it would probably make sense to have a layer of abstraction in skore to make things easier for the user, while adding the functionality we need to skops.card itself.

For reference, an example of dealing with model cards (from an example in skops repo), looks like this:

# %%
# For now, let’s start by creating a new model card and adding a few bits
# of information. We pass our final model as an argument to the
# ``Card`` class, which is used to create a table of
# hyper-parameters and a diagram of the model.


# %%
model_card = card.Card(model=gb_final)
model_card

# %%
# Next let’s add some prose to the model card. We add a short description
# of the model, the intended use, the data, and the preprocessing steps.
# Those are just plain strings, which we add to the card using the
# ``model_card.add`` method. That method takes ``**kwargs`` as
# input, where the key corresponds to the name of the section and the
# value corresponds to the content, i.e. the aforementioned strings. This
# way, we can add multiple new sections with a single method call.

# %%
description = """Gradient boosting regressor trained on California Housing dataset

The model is a gradient boosting regressor from sklearn. On top of the standard
features, it contains predictions from a KNN model. These predictions are calculated
out of fold, then added on top of the existing features. These features are really
helpful for decision tree-based models, since those cannot easily learn from geospatial
data."""
intended_uses = "This model is meant for demonstration purposes"
dataset_description = data.DESCR.split("\n", 1)[1].strip()
preproc_description = (
    "Rows where the target was clipped are excluded. Train/test split is random."
)

model_card.add(
    **{
        "Model description": description,
        "Model description/Dataset description": dataset_description,
        "Model description/Intended uses & limitations": intended_uses,
        "Model Card Authors": "Benjamin Bossan",
        "Model Card Contact": "[email protected]",
    }
)

# %%
# Maybe someone might wonder why we call ``model_card.add(**{…})``
# like this. The reason is the following. Normally, Python
# ``**kwargs`` are passed like this: ``foo(key=val)``. But
# we cannot use that syntax here, because the ``key`` would have to
# be a valid variable name. That means it cannot contain any spaces, start
# with a number, etc. But what if our section name contains spaces, like
# ``"Model description"``? We can still pass it as
# ``kwargs``, but we need to put it into a dict first. This is why
# we use the shown notation.

# %%
# By the way, if we wanted to change the content of a section, we could
# just add the same section name again and the value would be overwritten
# by the new content.

# %%
# Another convenience method we should make use of is the
# ``model_card.add_metrics`` method. This will store the metrics
# inside a table for better readability. Again, we pass multiple inputs
# using ``**kwargs``, and the ``description`` is optional.

# %%
model_card.add_metrics(
    description="Metrics are calculated on the test set",
    **{
        "Root mean squared error": -get_scorer("neg_root_mean_squared_error")(
            gb, df_test, y_test
        ),
        "Mean absolute error": -get_scorer("neg_mean_absolute_error")(
            gb, df_test, y_test
        ),
        "R²": get_scorer("r2")(gb, df_test, y_test),
    },
)

# %%
# How about we also add a plot to our model card? For this, let’s use the plot
# that shows the target as a function of longitude and latitude that we created
# above. We will just re-use the code from there to generate the plot. We will
# store it for now inside the same temporary directory as the model, then call
# the ``model_card.add_plot`` method. Since the plot is quite large, let’s
# collapse it in the model card by passing ``folded=True``.

# %%
fig, ax = plt.subplots(figsize=(10, 8))
df.plot(
    kind="scatter",
    x="Longitude",
    y="Latitude",
    c=target_col,
    title="House value by location",
    cmap="coolwarm",
    s=1.5,
    ax=ax,
)
fig.savefig(temp_dir / "geographic.png")
model_card.add_plot(
    folded=True,
    **{
        "Model description/Dataset description/Data distribution": "geographic.png",
    },
)

# %%
# Similar to the getting started code, we make sure that the file name we
# use for adding is just the plain ``"geographic.png"``,
# excluding the temporary directory, or else the file cannot be found
# later on.

# %%
# The model card class also provides a convenient method to add a plot
# that visualizes permutation importances. Let’s use it:

# %%
pi = permutation_importance(
    gb_final, df_test, y_test, scoring="neg_root_mean_squared_error", random_state=0
)
model_card.add_permutation_importances(
    pi, columns=df_test.columns, plot_file="permutation-importances.png", overwrite=True
)

# %%
# For this particular model card, the predefined section
# ``"Citation"`` is not required. Therefore, we delete it
# using ``model_card.delete``. Be careful: If there were subsections
# inside this section, they would be deleted too.


# %%
model_card.delete("Citation")

# %%
# Finally, we save the model card in the temporary directory as
# ``README.md``.

# %%
model_card.save(temp_dir / "README.md")
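The example above relies on two conventions: section names are passed through `**{...}` because they contain spaces, and a `/` in a name nests one section under another. The toy class below sketches how those conventions could fit together. `ToyCard` is purely hypothetical and is not the skops implementation; in particular, how it treats existing subsections when a parent is re-added is an assumption of this sketch.

```python
class ToyCard:
    """Toy card with "/"-separated section paths (not the skops implementation)."""

    def __init__(self):
        # Nested mapping: title -> {"content": str, "children": {...}}
        self._sections = {}

    def _walk(self, path, create):
        """Return the node at ``path``, optionally creating missing levels."""
        children = self._sections
        node = None
        for title in path.split("/"):
            if title not in children:
                if not create:
                    raise KeyError(path)
                children[title] = {"content": "", "children": {}}
            node = children[title]
            children = node["children"]
        return node

    def add(self, **sections):
        # Keys may contain spaces and slashes, hence the **{...} call style.
        for path, content in sections.items():
            self._walk(path, create=True)["content"] = content
        return self

    def delete(self, path):
        # Deleting a section also drops everything nested below it.
        *parents, leaf = path.split("/")
        if parents:
            children = self._walk("/".join(parents), create=False)["children"]
        else:
            children = self._sections
        del children[leaf]

    def render(self):
        lines = []

        def emit(children, level):
            for title, node in children.items():
                lines.append("#" * level + " " + title)
                if node["content"]:
                    lines.append(node["content"])
                emit(node["children"], level + 1)

        emit(self._sections, 1)
        return "\n\n".join(lines)


toy = ToyCard()
toy.add(
    **{
        "Model description": "First draft",
        "Model description/Intended uses & limitations": "Demo only",
        "Citation": "TBD",
    }
)
toy.add(**{"Model description": "Gradient boosting regressor"})  # same name: content replaced
toy.delete("Citation")
print(toy.render())
```

Writing `toy.add(Model description="...")` would be a syntax error, which is why the dict-unpacking form is used throughout.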


2 participants