This template is licensed under Apache 2.0 and contains the following open source components:
In this project we train a predictive model on Supervisory Control and Data Acquisition (SCADA) data captured from a physical wind turbine. SCADA systems are used for controlling, monitoring, and analyzing industrial devices and processes. The SCADA concept was developed as a universal means of remote access to a variety of local control modules, which may come from different manufacturers, with access provided through standard automation protocols.
Here we demonstrate how to train a machine learning model using a freely available SCADA dataset from Kaggle.
The samples in this dataset are distributed as a .CSV file with the following attributes:
- Date/Time --- timestamp of the observation (10-minute intervals)
- LV ActivePower (kW) --- The amount of power generated by the turbine at that timestamp (in kW)
- Wind Speed (m/s) --- The wind speed as measured at the hub height of the turbine
- Theoretical_Power_Curve (KWh) --- The theoretical power the turbine should generate at that wind speed, as provided by the turbine manufacturer
- Wind Direction (degrees) --- The wind direction at the hub height of the turbine (the turbine turns in this direction automatically)
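As a minimal sketch of how a file with these attributes could be loaded, the snippet below reads the CSV with pandas and checks the 10-minute spacing. The helper name `load_scada`, the timestamp format (day, month, year, then hour and minute), and the sample rows are assumptions made for illustration; the rows are synthetic and not taken from the real dataset, and the column headers in the actual file may differ slightly from the list above.

```python
import io

import pandas as pd

# Assumption: timestamps look like "01 01 2018 00:10" (day month year hour:minute).
TIMESTAMP_FORMAT = "%d %m %Y %H:%M"


def load_scada(source, timestamp_format=TIMESTAMP_FORMAT):
    """Read a SCADA CSV (path or file-like object) and index it by timestamp."""
    df = pd.read_csv(source)
    df["Date/Time"] = pd.to_datetime(df["Date/Time"], format=timestamp_format)
    return df.set_index("Date/Time").sort_index()


# Synthetic rows for illustration only (not real measurements).
sample_csv = io.StringIO(
    "Date/Time,LV ActivePower (kW),Wind Speed (m/s),"
    "Theoretical_Power_Curve (KWh),Wind Direction (degrees)\n"
    "01 01 2018 00:00,380.0,5.3,416.3,259.9\n"
    "01 01 2018 00:10,453.7,5.6,519.9,268.6\n"
    "01 01 2018 00:20,306.3,5.2,390.9,272.5\n"
)

df = load_scada(sample_csv)

# Time difference between consecutive observations.
step = df.index.to_series().diff().dropna()
print(df.head())
print(step.unique())
```

With the real file, the same helper would be called as `load_scada("data/T1.csv")`, adjusting `timestamp_format` if the timestamps are stored differently.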
This project contains the following assets:
- WindTurbineScada.ipynb --- a notebook demonstrating data ingestion, exploratory data analysis, model building and evaluation
- train.py --- a model training script, which can be run as a Domino job to retrain the model (e.g. if new data is available)
- score.py --- a scoring function, which can be deployed as a Domino Model API
- model.bin --- a pickled version of a pre-trained ExtraTreesRegressor model
- data/T1.csv --- the original dataset
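To make the roles of the training script, the pickled model, and the scoring function more concrete, here is a rough sketch of the train, pickle, and score cycle using scikit-learn's `ExtraTreesRegressor`. It is not the content of train.py or score.py: the feature choice (wind speed and wind direction), the hyperparameters, the function name `predict`, the output key, and the synthetic training data are all assumptions for illustration.

```python
import pickle

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic stand-in for SCADA features: wind speed (m/s) and direction (degrees).
rng = np.random.default_rng(0)
wind_speed = rng.uniform(0.0, 25.0, size=200)
wind_direction = rng.uniform(0.0, 360.0, size=200)
X = np.column_stack([wind_speed, wind_direction])

# Toy target: cubic in wind speed and capped. This is NOT a real power curve.
y = np.minimum(wind_speed ** 3, 3600.0)

# Training step (hypothetical hyperparameters).
model = ExtraTreesRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Round trip through pickle, mirroring a pickled model file such as model.bin.
blob = pickle.dumps(model)
restored = pickle.loads(blob)


def predict(wind_speed, wind_direction, model=restored):
    """Hypothetical scoring function returning a JSON-serializable dict."""
    features = np.array([[wind_speed, wind_direction]], dtype=float)
    return {"predicted_power_kw": float(model.predict(features)[0])}


result = predict(10.0, 180.0)
print(result)
```

In an actual scoring script the model would more likely be loaded once from the pickled file at import time, so that each request only pays for the prediction itself.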
This project works with a standard small-sized hardware tier, such as the small-k8s tier on all Domino deployments.
This project can be run with a Domino Standard Compute Environment that has Python 3.9 or above.