Setup Instructions
In this tutorial we will be using Jupyter Lab, with conda environments providing the data science and machine learning libraries required. If you are using this material as part of a taught course, it is best to set up the necessary environments and download the data before the start of the tutorial, so that things run more smoothly.
If you are running this within the Met Office, please see the Met Office specific instructions.
Warning: use this quick summary only if you know what you are doing. It is NOT a comprehensive list of commands; see the detailed sections below if you get stuck.
- Set up / install conda
- Clone this repository
- Create conda environments
- Download sample data
- Start Jupyter Lab
- Try the notebooks
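As a rough sketch, the whole setup boils down to something like the following sequence of commands (the directory name after cloning is assumed to match the repository name; paths are placeholders):
git clone https://github.com/informatics-lab/ml_weather_tutorial.git
cd ml_weather_tutorial
conda env create --file environments/requirements_sklearn.yml
conda activate ml_weather_tutorial_skl
export ML_TUTORIAL_DIR=/path/to/sample/data
jupyter lab --port 1234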
The following sections expand on the summary list above and give more detail if you run into any issues.
If you are not familiar with Jupyter Lab and Jupyter Notebooks, you should read a short introduction to them and how they compare with other ways of creating, editing and running Python code (e.g. command line, IDE). The following introduction by _The Carpentries_ is a good starting point:
We will use Conda to set up the data science and machine learning tools and libraries used in this tutorial. For those unfamiliar with Conda, more information is available in the Conda docs.
To get started, you need to install conda on your platform. On some platforms (e.g. AWS Sagemaker, AzureML) it will likely already be installed. You can check by trying to run a conda command:
conda env list
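If conda is installed, this should print a list of environments, something like the output below (the exact environments and paths will differ on your system):
# conda environments:
#
base                  *  /home/username/miniconda3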
If conda is not installed, you can install it by following the instructions on the Conda Installation web page. You should follow the instructions to install **Miniconda** on your platform.
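For illustration, installing Miniconda on a 64-bit Linux machine typically looks something like the commands below; the installer filename here is an assumption, so check the Conda Installation page for the right installer for your platform:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh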
Once conda is available, the next step is to set up a conda environment. We have defined some environments in the repository for this tutorial, so first clone the repo, which is public:
git clone https://github.com/informatics-lab/ml_weather_tutorial.git
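Then change into the cloned directory (the directory name is assumed to match the repository name):
cd ml_weather_tutorial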
From inside the cloned directory, try to set up an environment using the following command:
conda env create --file environments/requirements_sklearn.yml
If the environment installs successfully, you can then activate the environment with the following command:
conda activate ml_weather_tutorial_skl
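As a quick sanity check, you can try importing one of the installed libraries; this assumes the environment includes scikit-learn, as its name suggests:
python -c "import sklearn; print(sklearn.__version__)"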
Conda can be quite slow at times to resolve dependencies, especially on the Met Office Linux systems. To get around this I have created a "lock file". This specifies the exact version of each library and so avoids the problems of dependency resolution, but it is a lot less portable, so it tends to break after a short time or not work for some people. Install from the lock file by running this command (on SPICE):
conda env create --file environments/req_sklearn_spice_lock.yml
The notebooks use sample data that is available at the links below. There are four datasets. They should be downloaded into a directory where the notebooks can load them. Create a new directory into which the data should be downloaded, and then specify the location of that directory in an environment variable called ${ML_TUTORIAL_DIR}, which the notebooks will look for. This is described in the section on running the notebooks.
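For example (the path here is only a placeholder; use whatever location suits you):
mkdir -p /path/to/sample/data
export ML_TUTORIAL_DIR=/path/to/sample/data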
- Rotors example - Zenodo link - Extract files into the root of ${ML_TUTORIAL_DIR}.
- Additional Rotors data - Google Drive link - Extract files into the root of ${ML_TUTORIAL_DIR}.
- XBT dataset - Zenodo link - Extract files into the root of ${ML_TUTORIAL_DIR}.
- Precipitation Rediagnosis - Zenodo link - Create a subdirectory of ${ML_TUTORIAL_DIR} called prd, and extract the contents of the file ml_weather_tutorial_prd.tgz into it.
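For the Precipitation Rediagnosis dataset, the extraction step might look something like this, assuming the archive has been downloaded to the current directory and ${ML_TUTORIAL_DIR} is set:
mkdir -p "${ML_TUTORIAL_DIR}/prd"
tar -xzf ml_weather_tutorial_prd.tgz -C "${ML_TUTORIAL_DIR}/prd"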
To run the notebooks, run the commands below. The environment variable tells the notebooks where to find the sample data from the previous section.
export ML_TUTORIAL_DIR=/path/to/sample/data
conda activate ml_weather_tutorial_skl
jupyter lab --port 1234
NB: /path/to/sample/data should be replaced with the actual data path.
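To confirm the variable is visible in the shell where Jupyter Lab is started, you can run:
echo "${ML_TUTORIAL_DIR}"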
Once in Jupyter Lab, you can run one of the notebooks, which are the files with the *.ipynb extension.