# project-template - A template for scikit-learn extensions
project-template is a template project for scikit-learn compatible extensions.
It aids development of estimators that can be used in scikit-learn pipelines and (hyper)parameter search, while facilitating testing (including some API compliance), documentation, open source development, packaging, and continuous integration.
HTML Documentation - http://contrib.scikit-learn.org/project-template/
The package by itself comes with a single module and an estimator. Before
installing the module you will need numpy and scipy.
To install the module, execute:
$ python setup.py install
or
$ pip install sklearn-template
If the installation is successful, and scikit-learn is correctly installed,
you should be able to execute the following in Python:
>>> import numpy as np
>>> from skltemplate import TemplateEstimator
>>> estimator = TemplateEstimator()
>>> estimator.fit(np.arange(10).reshape(10, 1), np.arange(10))
TemplateEstimator by itself does nothing useful, but it serves as an example
of how other estimators should be written. It also comes with its own unit
tests under template/tests, which can be run using nosetests.
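To make those conventions concrete, here is a minimal sketch of a separate, hypothetical estimator (CentroidClassifier is not part of the template) written in the same style: hyperparameters are only stored in `__init__`, `fit` validates its input and returns `self`, and attributes learned from data end with an underscore. It assumes a recent scikit-learn and NumPy. It has not been run through check_estimator here, and recent scikit-learn releases may require further changes (for example around feature names or sparse input) before it passes.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class CentroidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical nearest-centroid classifier, for illustration only."""

    def __init__(self, metric="euclidean"):
        # Only store hyperparameters here; validation happens in fit.
        self.metric = metric

    def fit(self, X, y):
        # Converts to arrays, rejects NaN/inf and inconsistent lengths.
        X, y = check_X_y(X, y)
        if self.metric not in ("euclidean", "manhattan"):
            raise ValueError(f"unknown metric: {self.metric!r}")
        # Attributes learned from data end with a trailing underscore.
        self.classes_ = unique_labels(y)
        self.n_features_in_ = X.shape[1]
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self  # returning self allows clf.fit(X, y).predict(X)

    def predict(self, X):
        check_is_fitted(self)  # raises NotFittedError if fit was not called
        X = check_array(X)
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"expected {self.n_features_in_} features, got {X.shape[1]}"
            )
        # Differences to every centroid: shape (n_samples, n_classes, n_features).
        diff = X[:, np.newaxis, :] - self.centroids_[np.newaxis, :, :]
        if self.metric == "euclidean":
            dist = np.sqrt((diff ** 2).sum(axis=2))
        else:
            dist = np.abs(diff).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]


# Toy data: two well-separated Gaussian blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(5.0, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# The estimator plugs into a pipeline and a hyperparameter search.
pipe = make_pipeline(StandardScaler(), CentroidClassifier())
search = GridSearchCV(
    pipe, {"centroidclassifier__metric": ["euclidean", "manhattan"]}, cv=3
)
search.fit(X, y)
print(search.best_params_, search.score(X, y))
```

Because the class only stores its hyperparameters in `__init__`, GridSearchCV can clone it and set `metric` through the usual `step__param` naming, just as for a built-in estimator.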
Clone the project onto your computer by executing:
$ git clone https://github.com/scikit-learn-contrib/project-template.git
You should rename the project-template folder to the name of your project.
To host the project on GitHub, visit https://github.com/new and create a new
repository. To upload your project to GitHub, execute:
$ git remote set-url origin https://github.com/username/project-name.git
$ git push origin master
You are free to modify the source as you want, but at the very least, all your
estimators should pass the check_estimator test to be scikit-learn compatible.
(If there are valid reasons your estimator cannot pass check_estimator, please
raise an issue at scikit-learn so we can make check_estimator more flexible.)
This template is particularly useful for publishing open-source versions of algorithms that do not meet the criteria for inclusion in the core scikit-learn package (see FAQ), such as recent and unpopular developments in machine learning. However, developing using this template may also be a stepping stone to eventual inclusion in the core package.
In any case, developers should endeavor to adhere to scikit-learn's Contributor's Guide which promotes the use of:
- algorithm-specific unit tests, in addition to check_estimator's common tests
- PEP8-compliant code
- a clearly documented API using NumpyDoc and PEP257-compliant docstrings
- references to relevant scientific literature in standard citation formats
- doctests to provide succinct usage examples
- standalone examples to illustrate the usage, model visualisation, and benefits/benchmarks of particular algorithms
- efficient code when the need for optimization is supported by benchmarks
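For the NumpyDoc and doctest items above, a docstring typically combines Parameters, Returns, Raises and Examples sections, and the Examples section doubles as a doctest. A minimal sketch on a hypothetical helper, moving_average (not part of the template), checked with the standard-library doctest module:

```python
import doctest


def moving_average(values, window=2):
    """Compute the simple moving average of a sequence.

    Parameters
    ----------
    values : list of float
        Input values, in order.
    window : int, default=2
        Number of consecutive values averaged at each position.

    Returns
    -------
    list of float
        One average per full window, i.e. ``len(values) - window + 1`` values.

    Raises
    ------
    ValueError
        If ``window`` is not between 1 and ``len(values)``.

    Examples
    --------
    >>> moving_average([1, 2, 3, 4], window=2)
    [1.5, 2.5, 3.5]
    >>> moving_average([2, 4, 6], window=3)
    [4.0]
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    n_windows = len(values) - window + 1
    return [sum(values[i:i + window]) / window for i in range(n_windows)]


if __name__ == "__main__":
    # Runs the examples in the docstrings above; silent when they all pass.
    doctest.testmod()
```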
The documentation is built using sphinx. It incorporates narrative
documentation from the doc/ directory, standalone examples from the examples/
directory, and an API reference compiled from estimator docstrings.
To build the documentation locally, first install sphinx, sphinx-gallery and
matplotlib by executing:
$ pip install sphinx matplotlib sphinx-gallery
The documentation contains a home page (doc/index.rst), an API documentation
page (doc/api.rst) and a page documenting the template module
(doc/template.rst). Sphinx allows you to automatically document your modules
and classes by using the autodoc directives (see template.rst). To change the
aesthetics of the docs and other parameters, edit the doc/conf.py file. For
more information, visit the Sphinx Documentation.
You can also add code examples in the examples folder. All files in that
folder whose names match plot_*.py will be executed, and their generated plots
will be available for viewing at the /auto_examples URL.
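As an illustration of the kind of settings doc/conf.py holds for autodoc and the example gallery, here is a minimal sketch. It is not the template's actual file: it assumes sphinx.ext.napoleon for parsing NumpyDoc-style docstrings (the template may use the numpydoc extension instead), and the paths and theme are placeholders to adapt.

```python
# doc/conf.py (excerpt). Only the settings discussed above are shown.

extensions = [
    "sphinx.ext.autodoc",          # provides the autoclass/automodule directives
    "sphinx.ext.napoleon",         # lets autodoc understand NumpyDoc-style sections
    "sphinx_gallery.gen_gallery",  # executes example scripts and builds a gallery
]

# sphinx-gallery: where the example scripts live and where the rendered
# gallery is written (both relative to doc/). With the default
# filename_pattern, only scripts whose names start with "plot" are executed.
sphinx_gallery_conf = {
    "examples_dirs": "../examples",
    "gallery_dirs": "auto_examples",
}

# Look and feel of the HTML output.
html_theme = "alabaster"
```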
To build the documentation locally, execute:
$ cd doc
$ make html
TravisCI allows you to continuously build and test your code from GitHub to
ensure that no code-breaking changes are pushed. After you sign up and
authorize TravisCI, add your new repository to TravisCI so that it can start
building it. The .travis.yml file contains the configuration required for
Travis to build the project. You will have to update the variable MODULE with
the name of your module for Travis to test it. Once you add the project on
TravisCI, all subsequent pushes on the master branch will trigger a Travis
build. By default, the project is tested on Python 2.7 and Python 3.5.
Coveralls reports code coverage statistics of your tests on each push. Sign up on Coveralls and add your repository so that Coveralls can start monitoring it. The project already contains the required configuration for Coveralls to work. All subsequent builds after adding your project will generate a coverage report.
The project uses CircleCI to build its documentation from the master branch
and host it using GitHub Pages. Again, you will need to sign up and authorize
CircleCI. The configuration of CircleCI is governed by the circle.yml file,
which needs to be modified if you want to set up the docs on your own website.
The values to be changed are:
Variable | Value |
---|---|
USERNAME | The name of the user or organization of the repository where the project and documentation are hosted |
DOC_REPO | The repository where the documentation will be hosted. This can be the same as the project repository |
DOC_URL | The relative URL where the documentation will be hosted |
EMAIL | The email address to use while pushing the documentation; this can be any valid email address |
In addition to this, you will need to grant the CircleCI machines access to
push to your documentation repository. To do this, visit the Project Settings
page of your project in CircleCI. Select the Checkout SSH keys option and then
choose the Create and add user key option. This should grant CircleCI
privileges to push to the repository https://github.com/USERNAME/DOC_REPO/.
If all goes well, you should be able to visit the documentation of your project at
https://github.com/USERNAME/DOC_REPO/DOC_URL
Follow the instructions to add a Travis Badge, Coveralls Badge and CircleCI
Badge to your repository's README.
Once your work is mature enough for the general public to use it, you should
submit a Pull Request to modify scikit-learn's related projects listing.
Please insert a brief description of your project and a link to its code
repository or PyPI page.
You may also wish to announce your work on the scikit-learn-general mailing
list.
Uploading your package to PyPI allows users to install your package through
pip. There are two repositories to which you can upload your packages: the
PyPI Test repository, which is to be used for testing packages before their
release, and the PyPI repository, where you make your releases. You need to
register a username and password with both of these sites; they need not be
the same on both. To upload your package through the command line, you need to
store your usernames and passwords in a file called .pypirc in your $HOME
directory with the following format:
[distutils]
index-servers =
    pypi
    pypitest
[pypi]
repository=https://pypi.python.org/pypi
username=<your-pypi-username>
password=<your-pypi-password>
[pypitest]
repository=https://testpypi.python.org/pypi
username=<your-pypitest-username>
password=<your-pypitest-password>
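Because a malformed .pypirc usually only shows up at upload time, you may want to sanity-check it first. The file is INI-style, so the standard-library configparser can read it. A minimal sketch, assuming the format above; pypirc_servers is a hypothetical helper name, not part of any packaging tool:

```python
import configparser
from pathlib import Path


def pypirc_servers(text):
    """Return {server name: repository URL} for each index listed in a .pypirc."""
    # interpolation=None so that reading a password containing '%' does not raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # index-servers holds one server name per indented continuation line.
    names = parser["distutils"]["index-servers"].split()
    # Raises KeyError if a listed server has no section or no repository key.
    return {name: parser[name]["repository"] for name in names}


SAMPLE = """\
[distutils]
index-servers =
    pypi
    pypitest

[pypi]
repository=https://pypi.python.org/pypi
username=<your-pypi-username>
password=<your-pypi-password>

[pypitest]
repository=https://testpypi.python.org/pypi
username=<your-pypitest-username>
password=<your-pypitest-password>
"""

if __name__ == "__main__":
    print(pypirc_servers(SAMPLE))
    real = Path.home() / ".pypirc"
    if real.exists():
        print(pypirc_servers(real.read_text()))
```

Note that the continuation lines under index-servers must be indented; without indentation, configparser treats them as separate keys rather than part of the list.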
Make sure that all details in setup.py are up to date. To upload your package
to the Test server, execute:
python setup.py register -r pypitest
python setup.py sdist upload -r pypitest
Your package should now be visible on: https://testpypi.python.org/pypi
To install a package from the test server, execute:
pip install -i https://testpypi.python.org/pypi <package-name>
Similarly, to upload your package to the PyPI server, execute:
python setup.py register -r pypi
python setup.py sdist upload -r pypi
To install your package, execute:
pip install <package-name>
Thank you for cleanly contributing to the scikit-learn ecosystem!
Virtual environments are not virtual machines. They are used to avoid clashes between the libraries of a project and those of the system. Find more information in this virtual environment post, which describes how to use a virtual environment for Mozilla Marketplace testing.
Use the following to create a simblefaron environment based on the
./requirements.txt associated with the source directory ./src:
mkvirtualenv simblefaron -a . -r ./requirements.txt
Notice that mkvirtualenv also activates the environment.
The command deactivate is used to exit the virtual environment.
Once the virtual environment exists on the system, the command workon simblefaron
is rather convenient, since it jumps into the working directory and activates the virtual environment.
Remember to keep requirements.txt up to date.
For more details regarding the usage of the virtual environment, please look at the command reference.
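The mkvirtualenv and workon commands above come from virtualenvwrapper. As a swapped-in alternative that needs nothing beyond the standard library, the venv module can create an equivalent isolated environment. The sketch below (make_env and runs_isolated are hypothetical helper names) creates one and confirms that its interpreter reports sys.prefix != sys.base_prefix, which is how Python signals that it runs inside a virtual environment.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def make_env(path):
    """Create a virtual environment at `path` and return its interpreter path."""
    # EnvBuilder defaults to with_pip=False (unlike `python -m venv`); keep it
    # off here for speed, or pass with_pip=True to run
    # `python -m pip install -r requirements.txt` inside the environment.
    builder = venv.EnvBuilder(with_pip=False, clear=True, symlinks=(os.name != "nt"))
    builder.create(path)
    if sys.platform == "win32":
        return Path(path) / "Scripts" / "python.exe"
    return Path(path) / "bin" / "python"


def runs_isolated(python):
    """Return True if `python` reports that it runs inside a virtual environment."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    # Throwaway demo location; use a persistent path for a real project.
    with tempfile.TemporaryDirectory() as tmp:
        python = make_env(Path(tmp) / "simblefaron")
        print(python, runs_isolated(python))
```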
Initial data encryption (encrypting a tar.gz archive, taken from)
Tarring and compression is a job for tar and gzip or bzip2, while crypto is a
job for either gpg or openssl:
Encrypt
% tar cz folder_to_encrypt | \
openssl enc -aes-256-cbc -e > out.tar.gz.enc
Decrypt
% openssl aes-256-cbc -d -in out.tar.gz.enc | tar xz
Or using gpg
% gpg --encrypt out.tar.gz
The openssl variant uses symmetric encryption, so you have to tell the receiving party the password used (that is, the key). The gpg variant uses a combination of symmetric and asymmetric encryption: the content is encrypted with a session key, and the session key is encrypted with the receiving party's public key, which means that you do not have to share any password with anyone.