AzureMLPipelineDataSet not compatible with pipeline_ml_factory method from kedro-mlflow #53
If adding
Added
It's already released in 0.4.0. @jpoullet2000, please let me know if it fixes the problem.
I'm currently out of office. I'll come back to you in 2 weeks.
Sorry for the late reply, I was on holidays too. Just to understand, what is this dataset intended to do? kedro-mlflow should only check the filepath for the datasets it needs to use as artifacts for mlflow. So either this is a bug (kedro-mlflow checks the filepath on a dataset it should not) or this solution won't work (kedro-mlflow won't complain, but if there is no data at the given filepath, it will not be able to log it in mlflow nor to fetch it at inference time). What does your pipeline look like? What are you trying to do?
Hi, sorry for the late reply. The goal is to store an MLflow pipeline while running an Azure ML pipeline that wraps a kedro pipeline. I'd like to use the
It's specific to
As for that - we indeed split the kedro nodes into Azure ML nodes, but I don't understand the "are not shared between training and inference" part. Data is shared via Kedro's Data Catalog: when any node needs to load something, it goes to the Data Catalog. While running on Azure ML, if an entry is missing from the catalog, our plugin automatically loads the data from the temporary storage set in `kedro_azureml/config.py` (line 95 at a040b3c).
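The fallback described above can be sketched in plain Python. This is a simplified model of the behavior, not kedro-azureml's actual internals; the function name and the temporary-storage root below are assumptions.

```python
# Simplified sketch of the "missing catalog entry falls back to
# temporary storage" behavior. All names here are illustrative.

TEMP_STORAGE_ROOT = "abfs://temp-container/runs"  # assumed default root

def resolve_dataset(catalog: dict, name: str) -> str:
    """Return the configured path for `name`, or a generated
    temporary-storage path when the entry is missing from the catalog."""
    if name in catalog:
        return catalog[name]
    # Missing entries are materialized implicitly in temporary storage,
    # so downstream Azure ML nodes can still load them.
    return f"{TEMP_STORAGE_ROOT}/{name}.pickle"

catalog = {"companies": "data/01_raw/companies.csv"}
print(resolve_dataset(catalog, "companies"))    # explicit catalog entry
print(resolve_dataset(catalog, "model_input"))  # implicit temp-storage path
```

The key point is that the implicit path only exists at run time, which is why a plugin inspecting the catalog statically may not see any filepath at all.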
If you've opted in to the preview feature `pipeline_data_passing`, then the data will be passed via Azure ML-mounted files.
Maybe it's a problem in kedro-mlflow, in that it cannot recognize that the data is passed implicitly. Have you tried explicitly defining your inputs/outputs (e.g.
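Explicitly declaring the datasets in the catalog, as suggested above, might look like this (dataset names and paths are illustrative, not from the original thread):

```yaml
# conf/base/catalog.yml -- illustrative entries only
model_input:
  type: pandas.ParquetDataSet
  filepath: data/05_model_input/model_input.parquet

regressor:
  type: pickle.PickleDataSet
  filepath: data/06_models/regressor.pickle
```

Whether kedro-mlflow can then resolve the artifacts depends on the chosen dataset type exposing a `_filepath`, which standard file-based Kedro datasets do.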
Hmm, I'll do a deep dive into the code in the coming days, but I already have some comments:
The Pipeline Inference Model contains both the scaler and the RF model and is generated by the
The `pipeline_ml_factory` method in kedro-mlflow is a useful way to store artifacts (transformers, models) automatically (via a kedro-mlflow hook). However, this method calls `extract_pipeline_artifacts`, which requires the `_filepath` attribute to be available (see here). The `AzureMLPipelineDataSet` class does not provide this attribute. Wouldn't it be possible to add it to the class attributes?
Do you have any other suggestion for storing the MLflow Pipeline?
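One conceivable workaround is a thin subclass that exposes `_filepath`. The sketch below uses stand-in classes to illustrate the idea; everything except the `_filepath` attribute name is hypothetical, and the real classes live in kedro-azureml and kedro-mlflow.

```python
from pathlib import PurePosixPath

class AzureMLPipelineDataSet:
    """Stand-in for the plugin's dataset, which exposes no _filepath."""
    def __init__(self, path: str):
        self.path = path

class FilepathAzureMLPipelineDataSet(AzureMLPipelineDataSet):
    """Hypothetical subclass exposing _filepath for kedro-mlflow."""
    def __init__(self, path: str):
        super().__init__(path)
        self._filepath = PurePosixPath(path)

def extract_artifact_path(dataset) -> str:
    # Mimics kedro-mlflow's reliance on _filepath: fail loudly when absent.
    filepath = getattr(dataset, "_filepath", None)
    if filepath is None:
        raise AttributeError("dataset has no _filepath attribute")
    return str(filepath)

ds = FilepathAzureMLPipelineDataSet("data/06_models/scaler.pickle")
print(extract_artifact_path(ds))  # data/06_models/scaler.pickle
```

Note this only satisfies the attribute check; whether actual data is present at that path during the Azure ML run is a separate question, as discussed above.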