This notebook pipeline downloads a free NOAA weather time series dataset archive from the Data Asset Exchange, then extracts, cleanses, and analyzes the data file. The cleansed data file is subsequently used to forecast the weather.
This pipeline illustrates the following concepts:
- Execute notebooks sequentially. Notebook `Part 1 - Data Cleaning` runs after notebook `load_data` has completed successfully.
- Execute notebooks in parallel. Notebooks `Part 2 - Data Analysis` and `Part 3 - Time Series Forecasting` run in parallel after notebook `Part 1 - Data Cleaning` has completed successfully.
- Pass input parameters to a notebook. The generic `load_data` notebook requires an environment variable to be defined that identifies the public dataset download URL.
- Share data between notebooks. Notebook `Part 1 - Data Cleaning` generates a data file `jfk_weather_cleaned.csv`, which is consumed in notebooks `Part 2 - Data Analysis` and `Part 3 - Time Series Forecasting`.
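The parameter-passing concept above can be sketched in a few lines of Python. This is an illustration only, not the actual code of the `load_data` notebook: the environment variable name `DATASET_URL`, the function name, and the archive format are assumptions.

```python
import os
import tarfile
import urllib.request


def load_data(dest_dir: str = ".") -> str:
    """Download and extract the archive named by the DATASET_URL
    environment variable. Returns the path of the downloaded archive.

    DATASET_URL is a hypothetical variable name chosen for this sketch;
    the real load_data notebook may use a different one.
    """
    url = os.environ.get("DATASET_URL")
    if not url:
        raise RuntimeError(
            "DATASET_URL must be set to the public dataset download URL"
        )

    # Download the archive into the working directory ...
    archive_path = os.path.join(dest_dir, "dataset.tar.gz")
    urllib.request.urlretrieve(url, archive_path)

    # ... and extract it so that downstream notebooks (Parts 1-3)
    # can consume the contained data files.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=dest_dir)
    return archive_path
```

Because the notebook reads its input from the environment, the same code runs unchanged locally in JupyterLab and on Kubeflow Pipelines, where Elyra injects the configured environment variables into the container.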
You can run this pipeline as-is locally in JupyterLab or on Kubeflow Pipelines.
This pipeline requires Elyra v1.2 or later.
- Launch JupyterLab, which has the Elyra extension installed.
- Clone the sample repository `https://github.com/elyra-ai/examples.git` using the Git extension ("Git" > "Clone repository").
- If you have access to a Kubeflow Pipelines deployment, [create a runtime environment configuration](https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html).
- From the File Browser, open `analyze_NOAA_weather_data.pipeline`, which is located in the `pipelines/dax_noaa_weather_data/` directory.
- Review the notebook properties (right click > "Properties").
- Review the notebooks (right click > "Open file").
- Run the pipeline. Two links are displayed.
- Open the Kubeflow Pipelines console link in a new browser window. You can monitor the pipeline execution progress by clicking on a node and opening the "Logs" tab.
- Open the object storage link in another browser window to download the completed notebooks.