The notebooks are a modular introduction to machine learning in Python using scikit-learn, with examples and tips.
The material is in Jupyter notebook format and was designed to be compatible with Python >= 2.6 or >= 3.3. To use these notebooks interactively (the intended use), you will need a Jupyter/IPython notebook install (see below).
A brief introductory guide to Jupyter notebooks is also included in the Notebook_anatomy notebook. If you are unfamiliar with Jupyter/IPython notebooks, please take some time to look at this file.
For a quick deployment, simply click the launch binder link at the bottom of this page. However, we recommend a local install for a more customizable and flexible setup.
Note: the requirements.txt file above is a snapshot of the latest pip-installed packages from a successful ML ecosystem. conda should install the best dependencies for the scikit-learn version used, so its package versions may differ from the snapshot.
It is generally best practice to have a distinct development environment for each Python project. There are multiple options for this, such as virtualenv and Conda. For this project, we will use a Conda environment.
To get started, you can install Miniconda3 to get Python 3 as well as Python 2.
If you already have Python installed, you can install Conda via pip:
pip install auxlib conda
- To set up a Python 2.7 development environment in addition to your Python 3 conda install for this project (done after installing Miniconda3), you can run:
  conda create --name sklearn python=2
  - This installs into C:\Miniconda3\envs\python2\ so I added this to the system path (on Windows).
  - On Linux and OS X, this depends on where the Python Framework is installed. On OS X using Homebrew, this installs into /usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/envs/python2/bin
  - See here for more detailed instructions.
- To activate the development environment, from the bin folder of your conda environment, run:
  - Windows: activate sklearn
  - Linux/OSX: source activate sklearn
- Ensure ipython/ipython2 is installed in the Python environment:
  - Windows: c:\Miniconda3\envs\python2\Scripts\ipython2.exe kernel install --name python2 --display-name "Python 2"
  - Linux/OSX: ipython2 kernel install --name python2 --display-name "Python 2" (may need sudo)
- If, at any point, you want to exit the development environment, simply type the following:
  - Windows: deactivate
  - Linux/OSX: source deactivate
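While the environment is still active, it can help to confirm from inside Python which interpreter is running and which notebook kernels are registered. The following is only a rough sketch, not part of the original setup: the function names are made up for illustration, it assumes conda sets the CONDA_DEFAULT_ENV variable on activation, and it assumes a Jupyter 4 or later install that provides the jupyter_client package (it reports its absence otherwise). The supported-version test mirrors the Python >= 2.6 or >= 3.3 range stated at the top of this page.

```python
import os
import sys


def is_supported_python(version_info):
    """Check a version tuple against the range this README states (>= 2.6 or >= 3.3)."""
    major_minor = tuple(version_info[:2])
    return (2, 6) <= major_minor < (3, 0) or major_minor >= (3, 3)


def describe_environment():
    """Collect a few facts that identify the running Python environment."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "python_supported": is_supported_python(sys.version_info),
        # Assumption: conda sets this variable when an environment is activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def list_kernels():
    """Return {kernel name: resource directory}, or None if jupyter_client is missing."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return None
    return KernelSpecManager().find_kernel_specs()


for key, value in sorted(describe_environment().items()):
    print("%-17s %s" % (key, value))

kernels = list_kernels()
if kernels is None:
    print("jupyter_client is not installed in this environment")
else:
    for name, path in sorted(kernels.items()):
        print("kernel %-10s %s" % (name, path))
```

If the environment was activated as described, conda_env should show its name, and the kernel list should include the entry registered with the kernel install command. The same kernel list is available from a terminal with jupyter kernelspec list.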
The easiest way to install Jupyter notebook is via conda:
- Run conda install jupyter from your terminal. Linux/OSX may require sudo permissions.
- Navigate to the directory containing this repository, and execute jupyter notebook. This will start a notebook service locally for accessing notebooks in your browser. Drill down on the home page to your notebook of interest.
For a notebook primer, go to Notebook_anatomy.ipynb in this repo. The very short story is: to execute a cell, just hit Shift-Enter. There are many more shortcuts in the primer.
This tutorial requires the following packages:
- numpy version 1.5 or later: http://www.numpy.org/
- scipy version 0.10 or later: http://www.scipy.org/
- pandas http://pandas.pydata.org/
- matplotlib version 1.3 or later: http://matplotlib.org/
- scikit-learn version 0.14 or later: http://scikit-learn.org
- jupyter http://jupyter.readthedocs.org/en/latest/install.html
You can use your development environment of choice, but if you used conda as described above, simply run:
$ conda install numpy scipy pandas matplotlib scikit-learn jupyter
We have also provided a requirements.txt file above for use with pip.
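After installing, a quick way to see whether the packages meet the minimum versions listed above is to import each one and read its __version__ attribute. The snippet below is a minimal sketch under a few assumptions: the helper names are hypothetical, the version parsing is deliberately simple (leading numeric components only), and jupyter is left out because it is launched from the command line rather than imported. Note that scikit-learn is imported under the name sklearn.

```python
import importlib
import re

# Minimum versions as listed above; None means no minimum was stated.
# Keys are import names, which can differ from the package name.
REQUIRED = {
    "numpy": (1, 5),
    "scipy": (0, 10),
    "pandas": None,
    "matplotlib": (1, 3),
    "sklearn": (0, 14),
}


def parse_version(text):
    """Turn a version string such as '1.5.0rc1' into a tuple of ints, here (1, 5, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_package(name, minimum):
    """Return (status, version); status is 'ok', 'too old', 'unknown' or 'missing'."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "missing", None
    version = getattr(module, "__version__", "")
    if minimum is None:
        return "ok", version
    parsed = parse_version(version)
    if not parsed:
        # The module exposes no usable version string.
        return "unknown", version
    if parsed < minimum:
        return "too old", version
    return "ok", version


for name, minimum in sorted(REQUIRED.items()):
    status, version = check_package(name, minimum)
    print("%-12s %-8s %s" % (name, status, version or ""))
```

Any line reporting missing or too old points at a package to install or upgrade with conda or pip before opening the notebooks.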
There are many different ways to install python and the package ecosystem for machine learning. They are not all going to be covered here, but essentially you have the following choices:
- anaconda/miniconda aka conda (shown above)
- download python and pip install packages
- use a docker image (this is one for jupyter+sklearn+skflow+tensorflow)
- Google Cloud Platform has a Jupyter notebook service called Datalab (quickstart here). It has tensorflow pre-installed (needed for the next tutorial).
- Click the Binder link at the bottom of this page to deploy a notebook setup.
Or a combination of the above.
A quick tip: if you are installing in a non-conda way with pip and are on Windows, many of the data analysis packages are tricky to install because of their compiled dependencies. A nice "unofficial" repository of binaries for packages like numpy and a myriad of others was created and is maintained by Christoph Gohlke. This site is here.
The next tutorial in this workshop is on tensorflow, and the installation instructions are in this README.