
Setup Instructions

Stephen Haddad edited this page May 23, 2024 · 7 revisions

This tutorial uses Jupyter Lab, with conda environments providing the data science and machine learning libraries it requires. If you are using this material as part of a taught course, try to set up the necessary environments and download the data before the tutorial starts, so that the session runs more smoothly.

If you are running this within the Met Office, please see the Met Office specific instructions.

Summary of Key Setup Steps

Warning: Use these steps only if you know what you’re doing. This is NOT a comprehensive list of commands. See the details below if you get stuck.

  • Set up / install conda
  • Clone this repository
  • Create conda environments
  • Download sample data
  • Start Jupyter Lab
  • Try the notebooks

Detailed Information

The following sections expand on the summary list above. If you have any issues

Jupyter Lab

If you are not familiar with Jupyter Lab and Jupyter Notebooks, you should read a short introduction to them and to how they compare with other ways of creating, editing and running Python code (e.g. command line, IDE). The following introduction by The Carpentries is a good starting point:

Overview of Jupyter Notebooks

Conda Environments

We will use Conda to set up the data science and machine learning tools and libraries for this tutorial. For those unfamiliar with Conda, more information is available in the Conda docs.

To get started, you need to install conda on your platform. On some platforms (e.g. AWS Sagemaker, AzureML) it will likely already be installed. You can check by trying to run a conda command: conda env list
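The check described above can be sketched as a small shell snippet. This is only an illustration: the variable name conda_status is made up for this example, and the snippet assumes a POSIX-style shell such as bash.

```shell
# Check whether the conda command is available on the PATH.
# conda_status is a hypothetical variable name used only in this sketch.
if command -v conda >/dev/null 2>&1; then
    conda_status="installed"
    # List the environments conda already knows about.
    conda env list
else
    conda_status="missing"
    echo "conda was not found on the PATH; install Miniconda first."
fi
```

If the snippet prints a list of environments, conda is ready to use and you can skip the installation step.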

If conda is not installed, you can install it by following the instructions on the Conda Installation web page. You should follow the instructions to install Miniconda on your platform.

Once conda is working, the next step is to set up a conda environment. The repository for this tutorial defines some environments, so first clone the repo, which is public:

git clone https://github.com/informatics-lab/ml_weather_tutorial.git

Navigate to that directory, and try to set up an environment using the following command:

conda env create --file environments/requirements_sklearn.yml

If the environment installs successfully, you can then activate the environment with the following command:

conda activate ml_weather_tutorial_skl

Instructions for Met Office

Conda can be quite slow at times to resolve dependencies, especially on the Met Office Linux systems. To get around this I have created a “lock file”. This specifies the exact version of each library and so avoids the problems of dependency resolution, but it is a lot less portable, so it tends to break after a short time or not work for some people. Create the environment from the lock file by running this command (on spice):

conda env create --file environments/req_sklearn_spice_lock.yml

Sample data

The notebooks use sample data that is available at the links below. There are 4 datasets, which should be downloaded into a directory where the notebooks can load them. Create a new directory for the data, then specify its location in an environment variable called ML_TUTORIAL_DIR, which the notebooks will look for. Setting the variable is described in the section on running the notebooks.

  • Rotors example - Zenodo link. Extract files into the root of ${ML_TUTORIAL_DIR}.
    • Additional Rotors data - Google Drive link. Extract files into the root of ${ML_TUTORIAL_DIR}.
  • XBT dataset - Zenodo link. Extract files into the root of ${ML_TUTORIAL_DIR}.
  • Precipitation Rediagnosis - Zenodo link. Create a subdirectory of ${ML_TUTORIAL_DIR} called prd, and extract the contents of the file ml_weather_tutorial_prd.tgz into it.
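As a rough sketch, the directory layout for the Precipitation Rediagnosis data could be prepared as follows. The default location $HOME/ml_tutorial_data is a hypothetical choice for illustration, and the snippet assumes the archive has already been downloaded into the current working directory; adjust both to match your setup.

```shell
# Hypothetical default location; replace with your own data directory.
ML_TUTORIAL_DIR="${ML_TUTORIAL_DIR:-$HOME/ml_tutorial_data}"
export ML_TUTORIAL_DIR

# Create the data directory and the prd subdirectory in one step.
mkdir -p "${ML_TUTORIAL_DIR}/prd"

# Assumption: the archive was downloaded into the current directory.
archive="ml_weather_tutorial_prd.tgz"
if [ -f "${archive}" ]; then
    # Extract the archive contents into the prd subdirectory.
    tar -xzf "${archive}" -C "${ML_TUTORIAL_DIR}/prd"
else
    echo "${archive} not found in $(pwd); download it first." >&2
fi
```

The other datasets are extracted into the root of ${ML_TUTORIAL_DIR}, so only this one needs its own subdirectory.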

Running the notebooks

To run the notebooks, run the commands below. The environment variable tells the notebooks where to find the sample data from the previous section.

export ML_TUTORIAL_DIR=/path/to/sample/data
conda activate ml_weather_tutorial_skl
jupyter lab --port 1234

NB: /path/to/sample/data should be replaced with the actual path to your data directory.
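Before starting Jupyter Lab it can help to confirm that the variable is set and points at a real directory. The function below is a minimal sketch of such a check; check_data_dir is a hypothetical helper name and is not part of the tutorial repository.

```shell
# Hypothetical helper: verify that ML_TUTORIAL_DIR is set and is a directory.
check_data_dir() {
    if [ -z "${ML_TUTORIAL_DIR:-}" ]; then
        echo "ML_TUTORIAL_DIR is not set." >&2
        return 1
    fi
    if [ ! -d "${ML_TUTORIAL_DIR}" ]; then
        echo "ML_TUTORIAL_DIR is not a directory: ${ML_TUTORIAL_DIR}" >&2
        return 1
    fi
    echo "Sample data directory: ${ML_TUTORIAL_DIR}"
}
```

One way to use it is to run check_data_dir && jupyter lab --port 1234, so that Jupyter Lab only starts when the data directory has been found.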

Once in Jupyter Lab, you can run any of the notebooks, which are the files with the .ipynb extension.