
Overview

This notebook pipeline downloads a free NOAA weather time series data set archive from the Data Asset Exchange, then extracts, cleanses, and analyzes the data file. The cleaned data file is subsequently used to forecast the weather.
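The download-and-extract step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code of the example's `load_data` notebook: the function name `download_and_extract` is hypothetical, the archive is assumed to be a (gzipped) tar file, and the environment variable name `DATASET_URL` is an assumption that the real notebook may not share.

```python
import os
import tarfile
import urllib.parse
import urllib.request
from pathlib import Path
from typing import List


def download_and_extract(url: str, dest_dir: str) -> List[str]:
    """Download a tar archive from `url`, extract it into `dest_dir`,
    and return the names of the extracted archive members.

    Hypothetical helper for illustration only.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Derive a local file name from the URL path; fall back to a generic name.
    name = os.path.basename(urllib.parse.urlparse(url).path) or "dataset.tar.gz"
    archive_path = dest / name
    urllib.request.urlretrieve(url, archive_path)

    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject unsafe paths such as "../" members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return members


if __name__ == "__main__":
    # DATASET_URL is an assumed variable name, not confirmed by this README.
    dataset_url = os.environ.get("DATASET_URL")
    if dataset_url:
        print(download_and_extract(dataset_url, "data"))
    else:
        print("Set DATASET_URL to the public data set archive URL.")
```

Reading the URL from the environment rather than hard-coding it is what keeps such a notebook generic: the same code can fetch any archive the pipeline passes in.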

This pipeline illustrates the following concepts:

  • Execute notebooks sequentially. Notebook Part 1 - Data Cleaning runs after notebook load_data has completed successfully.
  • Execute notebooks in parallel. Notebooks Part 2 - Data Analysis and Part 3 - Time Series Forecasting run in parallel after notebook Part 1 - Data Cleaning has completed successfully.
  • Pass input parameters to a notebook. The generic load_data notebook requires an environment variable that identifies the download URL of the public data set.
  • Share data between notebooks. Notebook Part 1 - Data Cleaning generates the data file jfk_weather_cleaned.csv, which is consumed by notebooks Part 2 - Data Analysis and Part 3 - Time Series Forecasting.
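The ordering and data-sharing rules above can be illustrated with a small dependency graph. This is a conceptual sketch only: Elyra derives the execution order from the links drawn in the pipeline editor, and the dictionaries and helper functions below (`DEPENDS_ON`, `execution_stages`, `check_data_flow`) are hypothetical. The sketch groups nodes into stages (a level-by-level topological sort), so nodes in the same stage may run in parallel, and it checks that every consumed file is produced by an upstream node.

```python
from typing import Dict, List, Set, Tuple

# Node names and dependencies as described in this README.
DEPENDS_ON: Dict[str, List[str]] = {
    "load_data": [],
    "Part 1 - Data Cleaning": ["load_data"],
    "Part 2 - Data Analysis": ["Part 1 - Data Cleaning"],
    "Part 3 - Time Series Forecasting": ["Part 1 - Data Cleaning"],
}

# Files shared between notebooks (only the file named in this README).
PRODUCES: Dict[str, Set[str]] = {"Part 1 - Data Cleaning": {"jfk_weather_cleaned.csv"}}
CONSUMES: Dict[str, Set[str]] = {
    "Part 2 - Data Analysis": {"jfk_weather_cleaned.csv"},
    "Part 3 - Time Series Forecasting": {"jfk_weather_cleaned.csv"},
}


def execution_stages(depends_on: Dict[str, List[str]]) -> List[List[str]]:
    """Group nodes into stages; nodes within one stage can run in parallel."""
    remaining = {node: set(deps) for node, deps in depends_on.items()}
    done: Set[str] = set()
    stages: List[List[str]] = []
    while remaining:
        # A node is ready once all of its dependencies have completed.
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in pipeline")
        stages.append(ready)
        done.update(ready)
        for node in ready:
            del remaining[node]
    return stages


def ancestors(node: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every node that runs (directly or indirectly) before `node`."""
    seen: Set[str] = set()
    stack = list(depends_on[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on[current])
    return seen


def check_data_flow(depends_on, produces, consumes) -> List[Tuple[str, str]]:
    """List (node, file) pairs where a consumed file has no upstream producer."""
    missing: List[Tuple[str, str]] = []
    for node, files in consumes.items():
        upstream = ancestors(node, depends_on)
        available = set().union(*(produces.get(a, set()) for a in upstream))
        missing.extend((node, f) for f in sorted(files - available))
    return missing


if __name__ == "__main__":
    for i, stage in enumerate(execution_stages(DEPENDS_ON), start=1):
        print(f"stage {i}: {', '.join(stage)}")
    print("missing inputs:", check_data_flow(DEPENDS_ON, PRODUCES, CONSUMES))
```

The last stage contains both Part 2 and Part 3, which is why they can run in parallel once Part 1 has produced jfk_weather_cleaned.csv.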

[Figure: pipeline snapshot]

You can run this pipeline as-is, either locally in JupyterLab or on Kubeflow Pipelines.

Prerequisites

This pipeline requires Elyra v1.2 or later.
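To confirm that the installed Elyra meets this requirement, a short check like the following can be run in the same Python environment as JupyterLab. It is a minimal sketch that assumes the package is installed under the distribution name `elyra`. The helpers `parse_release` and `meets_minimum` are hypothetical and compare only the numeric release components, ignoring pre-release ordering. The third-party `packaging` library implements full PEP 440 comparison if that matters.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MINIMUM_ELYRA: Tuple[int, ...] = (1, 2)


def parse_release(v: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '3.15.0' -> (3, 15, 0)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            # Stop at a suffix such as '0rc1'; pre-release tags are ignored.
            break
    return tuple(parts)


def meets_minimum(v: str, minimum: Tuple[int, ...] = MINIMUM_ELYRA) -> bool:
    """True if version string `v` is at least `minimum` (tuple comparison)."""
    return parse_release(v) >= minimum


if __name__ == "__main__":
    try:
        installed = version("elyra")
    except PackageNotFoundError:
        print("Elyra is not installed in this environment.")
    else:
        status = "OK" if meets_minimum(installed) else "too old"
        print(f"Elyra {installed}: {status} (requires >= 1.2)")
```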

Exploring the pipeline

  1. Launch JupyterLab with the Elyra extension installed.
  2. Clone the sample repository https://github.com/elyra-ai/examples.git using the Git extension ("Git" > "Clone repository").
  3. If you have access to a Kubeflow Pipelines deployment, create a runtime environment configuration (https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html).
  4. From the File Browser open analyze_NOAA_weather_data.pipeline, which is located in the pipelines/dax_noaa_weather_data/ directory.
  5. Review the notebook properties (right-click > "Properties").
  6. Review the notebooks (right-click > "Open file").
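  7. Run the pipeline. When the pipeline is submitted to Kubeflow Pipelines, two links are displayed: one to the Kubeflow Pipelines console and one to the object storage location.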
  8. Open the Kubeflow Pipelines console link in a new browser window. You can monitor the pipeline's execution progress by clicking a node and opening the "Logs" tab. [Figure: pipeline graph]
  9. Open the object storage link in another browser window to download the completed notebooks. [Figure: object storage]