Modified from M. Pierini's tutorial for the INFN School of Statistics 2022.
The notebooks can easily be run in Colab by following the instructions in these slides. Alternatively, you can run them locally on your laptop by following the instructions below.
Start by downloading and running the Miniconda installer:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Note: the commands above are for Linux, but the Anaconda repository provides the appropriate installer for other operating systems. For example, on macOS:
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh
Answer yes to the first question to accept the license terms.
For the second question, about the installation path, make sure you choose a location with enough free disk space.
Answer yes to the last question to initialize conda every time you open a new shell.
This will modify the .bashrc or .bash_profile file in your home directory to point to this conda installation.
To check that this step worked, open a new shell and run which conda; it should print the path to the installed conda binary. You can now close the old shell.
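The check above can also be scripted. The following is a minimal sketch, assuming a POSIX-compatible shell; the variable name conda_status is introduced here only for illustration and is not part of conda.

```shell
# Report whether the conda command is visible in this shell.
# conda_status is a hypothetical helper variable, not part of conda.
if command -v conda >/dev/null 2>&1; then
    conda_status="found"
    echo "conda resolves to: $(command -v conda)"
    conda --version
else
    conda_status="missing"
    echo "conda is not on PATH; open a new shell or re-run the installer" >&2
fi
```

If conda is reported as missing in a fresh shell, the initialization step of the installer most likely did not update your shell startup file.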
Now you can install the ML libraries from the environment file prepared for you:
git clone https://github.com/jngadiub/MLtutorial.git
cd MLtutorial
conda env create -f mltutorial.yml
conda activate mltutorial
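Once the environment is active, it can be worth confirming that the main libraries import cleanly before opening the notebooks. The following is a minimal sketch, assuming the import names below (h5py, sklearn, tensorflow, pandas, matplotlib, torch) correspond to the packages named in this README; the actual contents of mltutorial.yml may differ.

```shell
# Try importing each library with the python from the active environment.
# The module list is an assumption based on the packages named in this README.
checked=0
for module in h5py sklearn tensorflow pandas matplotlib torch; do
    if python -c "import ${module}" >/dev/null 2>&1; then
        echo "ok       ${module}"
    else
        echo "MISSING  ${module}"
    fi
    checked=$((checked + 1))
done
echo "checked ${checked} modules"
```

A module reported as MISSING usually means the environment was not activated in this shell or its creation did not finish.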
If you have followed the instructions above, you do not need these steps. For completeness, here is how to build the environment from scratch after you have downloaded and installed conda as above:
conda create --name mltutorial python=3.9
conda activate mltutorial
conda install jupyter
conda install h5py
conda install scikit-learn
pip install tensorflow
conda install pandas
conda install matplotlib
conda install pytorch
conda env export > mltutorial.yml
conda activate mltutorial
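Note: if you have a GPU available and want to use it, you must install tensorflow-gpu instead of tensorflow. This applies to older TensorFlow releases; in recent TensorFlow 2 releases the standard tensorflow package already includes GPU support, and the separate tensorflow-gpu package has been retired.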
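To see whether TensorFlow actually detects a GPU, one option is to ask it for the list of visible devices. This is a sketch that assumes TensorFlow 2.1 or later is installed in the active environment; the variable names gpu_query and gpu_status are introduced here for illustration only.

```shell
# Print the GPUs TensorFlow can see; "[]" means none were detected.
# gpu_query holds a one-line Python program; gpu_status records the outcome.
gpu_query="import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
if python -c "$gpu_query" 2>/dev/null; then
    gpu_status="queried"
else
    gpu_status="unavailable"
    echo "could not import tensorflow; is the mltutorial environment active?" >&2
fi
```

An empty list means TensorFlow will run on the CPU only, which is still sufficient to follow the notebooks, just slower.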
Assuming you have already cloned the repo, let's download the datasets in advance so that we avoid a bandwidth bottleneck:
cd MLtutorial
curl https://cernbox.cern.ch/index.php/s/xmTytsMPvCEA6Ar/download -o Data-MLtutorial.tar.gz
tar -xvzf Data-MLtutorial.tar.gz
ls Data-MLtutorial/JetDataset/
rm Data-MLtutorial.tar.gz
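Before relying on the data, it may help to confirm that the extraction produced a non-empty dataset folder. The following is a minimal sketch, run from inside the MLtutorial folder; check_dataset_dir is a hypothetical helper defined here, and the path is the one listed by the ls command above.

```shell
# check_dataset_dir is a hypothetical helper: it succeeds only if the given
# directory exists and contains at least one entry.
check_dataset_dir() {
    dir="$1"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
        echo "found $(ls -A "$dir" | wc -l | tr -d ' ') entries in $dir"
        return 0
    fi
    echo "no data found in $dir" >&2
    return 1
}

if check_dataset_dir "Data-MLtutorial/JetDataset"; then
    echo "dataset ready"
else
    echo "re-run the download and extraction steps" >&2
fi
```

Running a check like this before removing the archive avoids downloading it again if the extraction failed.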
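The notebooks are in the same MLtutorial repository. If you are working in a location where you have not cloned it yet, clone it and move into the folder:
git clone https://github.com/jngadiub/MLtutorial.git
cd MLtutorial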
In a new shell, activate the mltutorial environment, move into the MLtutorial folder, and start Jupyter with the command jupyter notebook. The browser will automatically open a page where you can navigate the folders and files inside the MLtutorial folder.