This notebook pipeline downloads a free NOAA weather time series dataset archive from the Data Asset Exchange, then extracts, cleanses, and analyzes the data file. The cleansed data file is subsequently used to forecast the weather.
This pipeline illustrates the following concepts:
- Execute notebooks sequentially. Notebook `Part 1 - Data Cleaning` runs after notebook `load_data` has completed successfully.
- Execute notebooks in parallel. Notebooks `Part 2 - Data Analysis` and `Part 3 - Time Series Forecasting` run in parallel after notebook `Part 1 - Data Cleaning` has completed successfully.
- Pass input parameters to a notebook. The generic `load_data` notebook requires an environment variable to be defined that identifies the public dataset download URL.
- Share data between notebooks. Notebook `Part 1 - Data Cleaning` generates a data file `jfk_weather_cleaned.csv`, which is consumed in notebooks `Part 2 - Data Analysis` and `Part 3 - Time Series Forecasting`.
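The parameter-passing concept above can be sketched in a few lines of Python. This is an illustration only, not the actual code of the `load_data` notebook: the environment variable name `DATASET_URL`, the function name, and the archive format are assumptions.

```python
import os
import tarfile
import urllib.request


def load_data(dest_dir: str = ".") -> str:
    """Download and extract the archive named by the DATASET_URL
    environment variable. Returns the path of the downloaded archive.

    DATASET_URL is a hypothetical variable name chosen for this sketch;
    the real load_data notebook may use a different one.
    """
    url = os.environ.get("DATASET_URL")
    if not url:
        raise RuntimeError(
            "DATASET_URL must be set to the public dataset download URL"
        )

    # Download the archive into the working directory ...
    archive_path = os.path.join(dest_dir, "dataset.tar.gz")
    urllib.request.urlretrieve(url, archive_path)

    # ... and extract it so that downstream notebooks (Parts 1-3)
    # can consume the contained data files.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=dest_dir)
    return archive_path
```

Because the notebook reads its input from the environment, the same code runs unchanged locally in JupyterLab and on Kubeflow Pipelines, where Elyra injects the configured environment variables into the container.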
You can run this pipeline as-is locally in JupyterLab or on Kubeflow Pipelines.
This pipeline requires Elyra v1.2 or later.
- Launch JupyterLab, which has the Elyra extension installed.
- Clone the sample repository `https://github.com/elyra-ai/examples.git` using the Git extension ("Git" > "Clone repository").
- If you have access to a Kubeflow Pipelines deployment, [create a runtime environment configuration](https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html).
- From the File Browser, open `analyze_NOAA_weather_data.pipeline`, which is located in the `pipelines/dax_noaa_weather_data/` directory.
- Review the notebook properties (right click > "Properties").
- Review the notebooks (right click > "Open file").
- Run the pipeline. Two links are displayed.
- Open the Kubeflow Pipelines console link in a new browser window. You can monitor the pipeline execution progress by clicking on a node and opening the "Logs" tab.
- Open the object storage link in another browser window to download the completed notebooks.