Follow-up on the `run_metadata` changes (#3193)
* Initial commit, nuking all metadata responses and seeing what breaks * Removed last remnant of LazyLoader * Reintroducing the lazy loaders. * Add LazyRunMetadataResponse to EntrypointFunctionDefinition * Test for lazy loaders works now * Fixed tests, reformatted * Use updated template * Auto-update of Starter template * Updated more templates * Fixed failing test * Fixed step run schemas * Auto-update of E2E template * Auto-update of NLP template * Fixed tests, removed additional .value access * Further fixing * Fixed linting issues * Reformatted * Linted, formatted and tested again * Typing * Maybe fix everything * Apply some feedback * new operation * new log_metadata function * changes to the base filters * new filters * adding log_metadata to __all__ * checkpoint with float casting * adding tests * final touches and formatting * formatting * moved the utils * modified log metadata function * checkpoint * deprecating the old functions * linting and final fixes * better error message * fixing the client method * better error message * consistent creation\ * adjusting tests * linting * changes for step metadata * more test adjustments * testing unit tests * linting * fixing more tests * fixing more tests * more test fixes * fixing the test * fixing per comments * added validation, constant error message * linting * new changes * second checkpoint * fixing revisions * adding overlap to remove warnings * complete docs changes * adding a parameter to control the related entity behaviour * fixing the toc * fixed the description * docstring * spellcheck * metadata creation during artifact version creation * allowing artifact metadata with name for external artifact * update the template versions * Auto-update of LLM Finetuning template * Auto-update of Starter template * Auto-update of E2E template * Auto-update of NLP template * fixing the migration script * formatting * redirects * minor fixes * working pipelines again * small fix * working checkpoint * fixes, 
linting, docstrings * fixing unit tests * docs updates 1 * docs update 2 * fixing integration tests * spellcheck * formatting * Auto-update of E2E template * docs changes * review comments * added the batch rbac call * added a validator to check the name of the keys * small adjustments * base schema added * formatting * new functionalities * breaking circular imports * spellchecker * other minor fixes * covering the uncovered case * adjusting tests * fixing the quickstart again * minor change * going back to publisher step id * updating github refs * Auto-update of LLM Finetuning template * Auto-update of Starter template * fixing tests * updated docs * Auto-update of E2E template * Auto-update of NLP template * formatting * review comments * adding some tests in * review comments * Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py Co-authored-by: Michael Schuster <[email protected]> * Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py Co-authored-by: Michael Schuster <[email protected]> * Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py Co-authored-by: Michael Schuster <[email protected]> * Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py Co-authored-by: Michael Schuster <[email protected]> * Update src/zenml/zen_stores/migrations/versions/cc269488e5a9_separate_run_metadata.py Co-authored-by: Michael Schuster <[email protected]> * changed assert to value error * fixed the alembic head * changed the interaction with the models * trimmed down * small bugfix * naming recommendations * linting * fixing the test --------- Co-authored-by: AlexejPenner <[email protected]> Co-authored-by: Andrei Vishniakov <[email protected]> Co-authored-by: GitHub Actions <[email protected]> Co-authored-by: Michael Schuster <[email protected]> Co-authored-by: Michael Schuster <[email protected]>
1 parent 0ccb1fd, commit fbbfc29
Showing 57 changed files with 1,482 additions and 566 deletions.
71 changes: 51 additions & 20 deletions
...o/model-management-metrics/track-metrics-metadata/attach-metadata-to-a-model.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,62 +1,93 @@ | ||
--- | ||
description: >- | ||
Attach any metadata as key-value pairs to your models for future reference and | ||
auditability. | ||
description: Learn how to attach metadata to a model. | ||
--- | ||
|
||
# Attach metadata to a model | ||
|
||
ZenML allows you to log metadata for models, which provides additional context | ||
that goes beyond individual artifact details. Model metadata can represent | ||
high-level insights, such as evaluation results, deployment information, | ||
or customer-specific details, making it easier to manage and interpret | ||
the model's usage and performance across different versions. | ||
|
||
## Logging Metadata for Models | ||
|
||
While artifact metadata is specific to individual outputs of steps, model metadata encapsulates broader and more general information that spans across multiple artifacts. For example, evaluation results or the name of a customer for whom the model is intended could be logged with the model. | ||
To log metadata for a model, use the `log_metadata` function. This function | ||
lets you attach key-value metadata to a model, which can include metrics and | ||
other JSON-serializable values, such as custom ZenML types like `Uri`, | ||
`Path`, and `StorageSize`. | ||
|
||
Here's an example of logging metadata for a model: | ||
|
||
```python | ||
from zenml import step, log_model_metadata, ArtifactConfig, get_step_context | ||
from typing import Annotated | ||
|
||
import pandas as pd | ||
from sklearn.ensemble import RandomForestClassifier | ||
from sklearn.base import ClassifierMixin | ||
from sklearn.ensemble import RandomForestClassifier | ||
|
||
from zenml import step, log_metadata, ArtifactConfig, get_step_context | ||
|
||
|
||
@step | ||
def train_model(dataset: pd.DataFrame) -> Annotated[ClassifierMixin, ArtifactConfig(name="sklearn_classifier")]: | ||
"""Train a model""" | ||
# Fit the model and compute metrics | ||
def train_model(dataset: pd.DataFrame) -> Annotated[ | ||
ClassifierMixin, ArtifactConfig(name="sklearn_classifier") | ||
]: | ||
"""Train a model and log model metadata.""" | ||
classifier = RandomForestClassifier().fit(dataset) | ||
accuracy, precision, recall = ... | ||
|
||
# Log metadata for the model | ||
# This associates the metadata with the ZenML model, not the artifact | ||
log_model_metadata( | ||
|
||
log_metadata( | ||
metadata={ | ||
"evaluation_metrics": { | ||
"accuracy": accuracy, | ||
"precision": precision, | ||
"recall": recall | ||
} | ||
}, | ||
# Omitted model_name will use the model in the current context | ||
model_name="zenml_model_name", | ||
# Omitted model_version will default to 'latest' | ||
model_version="zenml_model_version", | ||
infer_model=True, | ||
) | ||
|
||
return classifier | ||
``` | ||
|
||
In this example, the metadata is associated with the model rather than the specific classifier artifact. This is particularly useful when the metadata reflects an aggregation or summary of various steps and artifacts in the pipeline. | ||
In this example, the metadata is associated with the model rather than the | ||
specific classifier artifact. This is particularly useful when the metadata | ||
reflects an aggregation or summary of various steps and artifacts in the | ||
pipeline. | ||
|
||
|
||
### Selecting Models with `log_metadata` | ||
|
||
When using `log_metadata`, ZenML provides several ways to attach | ||
metadata to a model version: | ||
|
||
1. **Using `infer_model`**: If used within a step, ZenML will use the step | ||
context to infer the model it is using and attach the metadata to it. | ||
2. **Model Name and Version Provided**: If both a model name and version are | ||
provided, ZenML will use these to identify and attach metadata to the | ||
specific model version. | ||
3. **Model Version ID Provided**: If a model version ID is directly provided, | ||
ZenML will use it to fetch and attach the metadata to that specific model | ||
version. | ||
|
||
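The three options above can be pictured as a small resolver. This is a minimal sketch for illustration only, not ZenML's implementation: `resolve_model_target` and `ModelTarget` are hypothetical names, the `model_version_id` parameter name is an assumption, and the order in which the options are checked simply follows the order of the list, which the text does not state as a precedence rule.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class ModelTarget:
    """Hypothetical record of how the target model version was selected."""

    mode: str
    detail: str


def resolve_model_target(
    infer_model: bool = False,
    model_name: Optional[str] = None,
    model_version: Optional[str] = None,
    model_version_id: Optional[UUID] = None,  # assumed parameter name
) -> ModelTarget:
    """Pick a model version selector (illustrative, not ZenML code)."""
    # Assumption: options are checked in the order the list presents them.
    if infer_model:
        return ModelTarget("inferred", "model from the current step context")
    if model_name is not None and model_version is not None:
        return ModelTarget("name_and_version", f"{model_name}:{model_version}")
    if model_version_id is not None:
        return ModelTarget("version_id", str(model_version_id))
    raise ValueError(
        "Provide infer_model=True, both model_name and model_version, "
        "or model_version_id."
    )


if __name__ == "__main__":
    print(resolve_model_target(infer_model=True))
    print(resolve_model_target(model_name="my_model", model_version="1"))
    print(resolve_model_target(model_version_id=uuid4()))
```

In this sketch a model name without a version is rejected, mirroring the wording "both a model name and version"; how ZenML itself handles a partial selector is not covered by the text above.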
## Fetching logged metadata | ||
|
||
Once metadata has been logged in an [artifact](attach-metadata-to-an-artifact.md), model, or [step](attach-metadata-to-steps.md), we can easily fetch the metadata with the ZenML Client: | ||
Once metadata has been attached to a model, it can be retrieved for inspection | ||
or analysis using the ZenML Client. | ||
|
||
```python | ||
from zenml.client import Client | ||
|
||
client = Client() | ||
model = client.get_model_version("my_model", "my_version") | ||
|
||
print(model.run_metadata["metadata_key"].value) | ||
print(model.run_metadata["metadata_key"]) | ||
``` | ||
|
||
{% hint style="info" %} | ||
When you are fetching metadata using a specific key, the returned value will | ||
always reflect the latest entry. | ||
{% endhint %} | ||
|
||
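The "latest entry" behaviour in the hint can be modelled with a toy in-memory log. This is a sketch under assumptions, not how ZenML stores metadata: `MetadataLog` and its methods are hypothetical, and it only illustrates that logging the same key twice leaves the most recent value visible under that key.

```python
from typing import Any, Dict, List


class MetadataLog:
    """Toy stand-in for a metadata store (hypothetical, not a ZenML class)."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Any]] = {}

    def log(self, metadata: Dict[str, Any]) -> None:
        # Append each value so earlier entries are kept in logging order.
        for key, value in metadata.items():
            self._entries.setdefault(key, []).append(value)

    def latest(self, key: str) -> Any:
        # Lookup by key returns the most recently logged value.
        return self._entries[key][-1]

    def history(self, key: str) -> List[Any]:
        # Copy of every value logged under the key, oldest first.
        return list(self._entries[key])


if __name__ == "__main__":
    log = MetadataLog()
    log.log({"accuracy": 0.81})
    log.log({"accuracy": 0.87})
    print(log.latest("accuracy"))
    print(log.history("accuracy"))
```

Whether and how older entries remain accessible in ZenML is not stated here; the `history` method exists only to make the contrast with `latest` visible.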
<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure> |
87 changes: 87 additions & 0 deletions
...-to/model-management-metrics/track-metrics-metadata/attach-metadata-to-a-run.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
--- | ||
description: Learn how to attach metadata to a run. | ||
--- | ||
|
||
# Attach Metadata to a Run | ||
|
||
In ZenML, you can log metadata directly to a pipeline run, either during or | ||
after execution, using the `log_metadata` function. This function allows you | ||
to attach a dictionary of key-value pairs as metadata to a pipeline run, | ||
with values that can be any JSON-serializable data type, including ZenML | ||
custom types like `Uri`, `Path`, `DType`, and `StorageSize`. | ||
|
||
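Because metadata values are expected to be JSON-serializable, it can help to check a dictionary before logging it. The helper below is a minimal sketch that uses only Python's standard `json` module; `non_serializable_keys` is a hypothetical name, and ZenML's own validation (including how it treats custom types such as `Uri` or `StorageSize`) may differ from this check.

```python
import json
from typing import Any, Dict, List


def non_serializable_keys(metadata: Dict[str, Any]) -> List[str]:
    """Return the keys whose values the standard json module cannot encode."""
    bad_keys: List[str] = []
    for key, value in metadata.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            # TypeError: unsupported type; ValueError: e.g. circular reference.
            bad_keys.append(key)
    return bad_keys


if __name__ == "__main__":
    candidate = {
        "run_metrics": {"accuracy": 0.9, "precision": 0.8},
        "labels": ["cat", "dog"],
        "seen_ids": {1, 2, 3},  # a set is not JSON-serializable
    }
    print(non_serializable_keys(candidate))
```

Converting offending values first (for example, a set to a sorted list) keeps the metadata dictionary within plain JSON types.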
## Logging Metadata Within a Run | ||
|
||
If you log metadata from within a step that’s part of a pipeline run, | ||
calling `log_metadata` attaches the specified metadata to the current | ||
pipeline run, and each metadata key follows the `step_name::metadata_key` | ||
pattern. This lets you use the same metadata key in different steps | ||
while the run is still executing. | ||
|
||
```python | ||
from typing import Annotated | ||
|
||
import pandas as pd | ||
from sklearn.base import ClassifierMixin | ||
from sklearn.ensemble import RandomForestClassifier | ||
|
||
from zenml import step, log_metadata, ArtifactConfig | ||
|
||
|
||
@step | ||
def train_model(dataset: pd.DataFrame) -> Annotated[ | ||
ClassifierMixin, | ||
ArtifactConfig(name="sklearn_classifier", is_model_artifact=True) | ||
]: | ||
"""Train a model and log run-level metadata.""" | ||
classifier = RandomForestClassifier().fit(dataset) | ||
accuracy, precision, recall = ... | ||
|
||
# Log metadata at the run level | ||
log_metadata( | ||
metadata={ | ||
"run_metrics": { | ||
"accuracy": accuracy, | ||
"precision": precision, | ||
"recall": recall | ||
} | ||
} | ||
) | ||
return classifier | ||
``` | ||
|
||
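The `step_name::metadata_key` pattern described in this section can be illustrated with plain dictionaries. This is a sketch of the naming scheme only, assuming the key is built by joining the step name and the metadata key with `::`; `namespaced_key` and `merge_step_metadata` are hypothetical helpers, not ZenML functions.

```python
from typing import Any, Dict


def namespaced_key(step_name: str, metadata_key: str) -> str:
    """Build a run-level key following the step_name::metadata_key pattern."""
    return f"{step_name}::{metadata_key}"


def merge_step_metadata(per_step: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten metadata logged by several steps into one run-level mapping."""
    run_metadata: Dict[str, Any] = {}
    for step_name, metadata in per_step.items():
        for key, value in metadata.items():
            # The step name prefix keeps identical keys from colliding.
            run_metadata[namespaced_key(step_name, key)] = value
    return run_metadata


if __name__ == "__main__":
    logged = {
        "train_model": {"run_metrics": {"accuracy": 0.9}},
        "evaluate_model": {"run_metrics": {"accuracy": 0.88}},
    }
    for key, value in merge_step_metadata(logged).items():
        print(key, value)
```

Both steps use the key `run_metrics`, yet the flattened mapping holds two distinct entries, which is the property the prefix is meant to provide.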
## Manually Logging Metadata to a Pipeline Run | ||
|
||
You can also attach metadata to a specific pipeline run without needing a step, | ||
using identifiers like the run ID. This is useful when logging information or | ||
metrics that were calculated post-execution. | ||
|
||
```python | ||
from zenml import log_metadata | ||
|
||
log_metadata( | ||
metadata={"post_run_info": {"some_metric": 5.0}}, | ||
run_id_name_or_prefix="run_id_name_or_prefix" | ||
) | ||
``` | ||
|
||
## Fetching Logged Metadata | ||
|
||
Once metadata has been logged in a pipeline run, you can retrieve it using | ||
the ZenML Client: | ||
|
||
```python | ||
from zenml.client import Client | ||
|
||
client = Client() | ||
run = client.get_pipeline_run("run_id_name_or_prefix") | ||
|
||
print(run.run_metadata["metadata_key"]) | ||
``` | ||
|
||
{% hint style="info" %} | ||
When you are fetching metadata using a specific key, the returned value will | ||
always reflect the latest entry. | ||
{% endhint %} | ||
|
||
<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure> |