Home
- Install cookiecutter in your system/user Python profile (not inside a virtual environment).

  ```bash
  $ pip install --user cookiecutter
  ```
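  To confirm the executable ended up on your `PATH` (with `--user` installs it sometimes does not), a quick check:

  ```bash
  $ cookiecutter --version
  ```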
- Navigate the file system to your code folder (e.g. `path/to/repos_folder`). This is the parent folder of your code. NB: cookiecutter will create a new folder `project_name` with everything inside; your actual code will live in a subfolder, i.e. `path/to/repos_folder/project_name/src/` (see the layout sketch below). Then run `cookiecutter` with the link to the data-science template and answer the questions it asks:

  ```bash
  $ cd Documents/xxx/xxx/Code/
  $ cookiecutter https://github.com/drivendata/cookiecutter-data-science
  # answer the prompts, using project name: kickstart_python
  # once it is finished, cd into the folder kickstart_python
  $ cd kickstart_python
  ```

  From now on, our current directory will be `path/to/kickstart_python/` unless specified otherwise.
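  After generation, the project layout should look roughly like the tree below (the exact contents depend on the template version, so treat this as an approximate sketch):

  ```text
  kickstart_python/
  ├── data/               # raw, interim and processed data
  ├── docs/
  ├── models/             # trained and serialized models
  ├── notebooks/
  ├── references/
  ├── reports/
  ├── requirements.txt
  ├── setup.py
  └── src/                # the source code of the project
      ├── data/
      ├── features/
      ├── models/         # train_model.py, predict_model.py
      └── visualization/
  ```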
- Set up a virtual environment, named `venv`, specifying the Python version. `venv` is a typical convention; you could call it `remi` if you want. This command creates a folder named `venv` containing, among other things, a local copy of every package you will pip-install from now on.

  ```bash
  $ virtualenv venv -p python3
  ```
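  If `virtualenv` is not installed, the standard-library `venv` module (Python 3.3+) gives an equivalent result:

  ```bash
  $ python3 -m venv venv
  ```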
- Edit the `.gitignore` by adding the virtualenv's folder with your favorite text editor, or just run the following command:

  ```bash
  $ echo venv >> .gitignore
  ```
- Set up git and link it to a new GitHub/GitLab repository:
  - On github.com create an empty repository online (i.e. no README and no license; that way the site will display useful commands).
  - Start git locally and sync it with the following commands:

    ```bash
    $ git init
    # check that we are not 'saving' weird files
    $ git status
    # if everything looks fine, commit
    $ git add .
    $ git commit -m "first commit"
    # if GitHub
    $ git remote add origin https://github.com/USER/python_kickstart.git
    $ git push -u origin master
    # avoid typing login and password every time from now on
    $ git config credential.helper store
    ```
- Activate the virtualenv:

  ```bash
  [user@localhost] project_name/ $ source venv/bin/activate
  # check that it is activated: you should have (venv) at the beginning of your command line
  (venv) [user@localhost] project_name/ $
  ```
- Install the basic dependencies of cookiecutter (if you want). Notice that doing so will also install the `src` package by default. Then install your everyday favorite packages: numpy, matplotlib, jupyter.

  ```bash
  (venv) $ pip install -r requirements.txt
  (venv) $ pip install numpy matplotlib jupyter
  ```
- Freeze the requirements (`>` overwrites the file, `>>` appends to it):

  ```bash
  (venv) $ pip freeze >> requirements.txt
  ```
- Install scikit-learn for a toy example (the PyPI package is `scikit-learn`, not the deprecated `sklearn` alias):

  ```bash
  (venv) $ pip install scikit-learn
  ```
- In `src/` create the `main.py` file and paste the following code:

  ```python
  from numpy.random import permutation
  from sklearn import svm, datasets

  C = 1.0
  gamma = 0.7

  # load and shuffle the iris dataset
  iris = datasets.load_iris()
  perm = permutation(iris.target.size)
  iris.data = iris.data[perm]
  iris.target = iris.target[perm]

  # train an RBF-kernel SVM on the first 90 samples
  model = svm.SVC(C=C, kernel='rbf', gamma=gamma)
  model.fit(iris.data[:90], iris.target[:90])

  # score on the remaining samples
  print(model.score(iris.data[90:], iris.target[90:]))
  ```
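  You can already run it from the project root; the printed value is the accuracy on the held-out samples (it varies with the random permutation):

  ```bash
  (venv) $ python src/main.py
  ```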
- Commit the changes:

  ```bash
  $ git add .
  $ git commit -m 'toy svm'
  ```
- Edit the `models/train_model.py` and `models/predict_model.py` files. In both files (actually Python modules) create a new function, respectively:

  In `./src/models/train_model.py`:

  ```python
  from sklearn import svm


  def train(data, target, C, gamma):
      # fit an RBF-kernel SVM; the caller is responsible for the train/test split
      clf = svm.SVC(C=C, kernel='rbf', gamma=gamma)
      clf.fit(data, target)
      return clf
  ```

  In `./src/models/predict_model.py`:

  ```python
  def predict(clf, data, target):
      # return the mean accuracy of the fitted classifier on the given data
      return clf.score(data, target)
  ```
- Update the main file with the following imports. In `src/main.py` add:

  ```python
  from models.predict_model import predict
  from models.train_model import train
  ```

  Now the main code should look like this:

  ```python
  # std imports
  from numpy.random import permutation
  from sklearn import datasets

  # my imports
  from models.predict_model import predict
  from models.train_model import train

  C = 1.0
  gamma = 0.7

  # load and shuffle the iris dataset
  iris = datasets.load_iris()
  perm = permutation(iris.target.size)
  iris.data = iris.data[perm]
  iris.target = iris.target[perm]

  # train on the first 90 samples, evaluate on the rest
  model = train(iris.data[:90], iris.target[:90], C, gamma)
  score = predict(model, iris.data[90:], iris.target[90:])
  print(score)
  ```
- Run and debug:

  ```bash
  (venv) $ python src/main.py
  ```
- Pip-install Sacred (and pymongo) for tracking experiments:

  ```bash
  (venv) $ pip install sacred pymongo
  ```
- Create a config function for the parameters C and gamma, wrap the main code in a `run()` function, and add the Sacred decorators (a complete sketch of the resulting file follows below):

  ```python
  # ...add here the new imports, then the following
  from sacred import Experiment

  ex = Experiment('iris_svm')  # id of the experiment


  @ex.config
  def cfg():
      C = 1.0
      gamma = 0.7


  @ex.automain
  def run(C, gamma):
      # ...
      # ... paste here the main code ...
      # ...
      return score
  ```
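  For reference, here is one way the assembled `src/main.py` could look after this step (a sketch; only the overall structure matters):

  ```python
  # std imports
  from numpy.random import permutation
  from sklearn import datasets

  # sacred imports
  from sacred import Experiment

  # my imports
  from models.predict_model import predict
  from models.train_model import train

  ex = Experiment('iris_svm')  # id of the experiment


  @ex.config
  def cfg():
      # hyper-parameters tracked by Sacred
      C = 1.0
      gamma = 0.7


  @ex.automain
  def run(C, gamma):
      # load and shuffle the iris dataset
      iris = datasets.load_iris()
      perm = permutation(iris.target.size)
      iris.data = iris.data[perm]
      iris.target = iris.target[perm]

      # train on the first 90 samples, evaluate on the rest
      model = train(iris.data[:90], iris.target[:90], C, gamma)
      score = predict(model, iris.data[90:], iris.target[90:])
      return score
  ```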
- Run it from the project's root directory:

  ```bash
  (venv) $ python src/main.py
  ```
- Install MongoDB on your system. In a new terminal:

  ```bash
  $ sudo dnf install mongodb mongodb-server
  # start the service
  $ sudo service mongod start
  # verify it is working: this will start the mongo shell
  $ mongo
  ```
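  Sacred and Omniboard will later assume the server is listening on the default port 27017; a quick way to check from the command line (assuming the `mongo` shell installed above):

  ```bash
  $ mongo --eval 'db.runCommand({ ping: 1 })'
  # a reply containing "ok" : 1 means the server is reachable on localhost:27017
  ```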
- Run and re-run the code as many times as you want with the database flag:

  ```bash
  (venv) $ python src/main.py -m MY_IRIS_EXP
  ```

  Notice how the ID value increases at each run.
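  If you prefer not to pass `-m` on every run, with a recent version of Sacred the same observer can also be attached in code (a sketch; `MY_IRIS_EXP` is the database name used above):

  ```python
  from sacred import Experiment
  from sacred.observers import MongoObserver

  ex = Experiment('iris_svm')
  # equivalent to the -m MY_IRIS_EXP command-line flag
  ex.observers.append(MongoObserver(url='localhost:27017', db_name='MY_IRIS_EXP'))
  ```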
- In a mongo shell (just run `mongo` in the command line) check that the MY_IRIS_EXP database exists:

  ```bash
  $ mongo
  # then, in the mongo shell
  > show dbs
  # look for a MY_IRIS_EXP entry
  ```
- Download and install Omniboard, the Sacred+MongoDB frontend:

  ```bash
  # in a new terminal
  $ sudo npm install -g omniboard
  ```
- In the same shell run the server listener:

  ```bash
  $ omniboard -m localhost:27017:MY_IRIS_EXP
  ```
- Go to http://localhost:9000 to access the Omniboard frontend.
- Play with it.
- Add a metric: in the `main.py` file add:

  ```python
  @ex.automain
  def run(C, gamma):
      ...  # the code from before
      ex.log_scalar("val.score", score)
      return score
  ```
- And what about a typical loss function in a for loop? For instance, add the following lines. We need to pass the `_run` object to `run()`:

  ```python
  import numpy as np


  @ex.automain
  def run(_run, C, gamma):
      ...  # the code from before
      my_loss = 0
      for i in range(20):
          # explicit step counter (0, 1, 2, 3, ...), incremented with each call
          _run.log_scalar("training.loss", my_loss, i)
          my_loss += 1.5 * i + np.random.random()
      return score
  ```
- Run some experiments.
- Play in Omniboard.