Merge pull request #969 from openml/develop
Prepare 11.0 release
mfeurer authored Oct 25, 2020
2 parents 55b3343 + 79a6705 commit bc87333
Showing 106 changed files with 8,341 additions and 6,180 deletions.
10 changes: 10 additions & 0 deletions .flake8
@@ -0,0 +1,10 @@
[flake8]
max-line-length = 100
show-source = True
select = C,E,F,W,B,T
ignore = E203, E402, W503
per-file-ignores =
*__init__.py:F401
exclude =
venv
examples
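The settings in this `.flake8` file use INI syntax, so they can be inspected with Python's standard `configparser`. The following is only a sketch of how the values read back: the `split_list` helper is hypothetical, the continuation lines are indented here because INI multi-line values require it, and flake8's own option parsing may differ in detail.

```python
import configparser

# Same settings as the .flake8 file above; continuation lines are indented
# because INI multi-line values require it.
FLAKE8_CONFIG = """\
[flake8]
max-line-length = 100
show-source = True
select = C,E,F,W,B,T
ignore = E203, E402, W503
per-file-ignores =
    *__init__.py:F401
exclude =
    venv
    examples
"""


def split_list(value):
    """Hypothetical helper: split a comma- or newline-separated option."""
    parts = value.replace("\n", ",").split(",")
    return [part.strip() for part in parts if part.strip()]


parser = configparser.ConfigParser()
parser.read_string(FLAKE8_CONFIG)
section = parser["flake8"]

max_line_length = section.getint("max-line-length")
selected = split_list(section["select"])
ignored = split_list(section["ignore"])
excluded = split_list(section["exclude"])
per_file_ignores = split_list(section["per-file-ignores"])

print(max_line_length, selected, ignored, excluded, per_file_ignores)
```

Reading the file this way makes it easy to confirm that the line length matches the `--line-length=100` passed to black in the pre-commit configuration.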
28 changes: 28 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,28 @@
repos:
- repo: https://github.com/psf/black
rev: 19.10b0
hooks:
- id: black
args: [--line-length=100]
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.761
hooks:
- id: mypy
name: mypy openml
files: openml/*
- id: mypy
name: mypy tests
files: tests/*
- repo: https://gitlab.com/pycqa/flake8
rev: 3.8.3
hooks:
- id: flake8
name: flake8 openml
files: openml/*
additional_dependencies:
- flake8-print==3.1.4
- id: flake8
name: flake8 tests
files: tests/*
additional_dependencies:
- flake8-print==3.1.4
16 changes: 11 additions & 5 deletions .travis.yml
@@ -15,11 +15,17 @@ env:
- TEST_DIR=/tmp/test_dir/
- MODULE=openml
matrix:
- DISTRIB="conda" PYTHON_VERSION="3.5" SKLEARN_VERSION="0.21.2"
- DISTRIB="conda" PYTHON_VERSION="3.6" SKLEARN_VERSION="0.21.2"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.21.2" RUN_FLAKE8="true" SKIP_TESTS="true"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.21.2" COVERAGE="true" DOCPUSH="true"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.20.2"
- DISTRIB="conda" PYTHON_VERSION="3.6" SKLEARN_VERSION="0.23.1" COVERAGE="true" DOCPUSH="true" SKIP_TESTS="true"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.23.1" RUN_FLAKE8="true" SKIP_TESTS="true"
- DISTRIB="conda" PYTHON_VERSION="3.8" SKLEARN_VERSION="0.23.1" TEST_DIST="true"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.23.1" TEST_DIST="true"
- DISTRIB="conda" PYTHON_VERSION="3.6" SKLEARN_VERSION="0.23.1" TEST_DIST="true"
- DISTRIB="conda" PYTHON_VERSION="3.8" SKLEARN_VERSION="0.22.2" TEST_DIST="true"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.22.2" TEST_DIST="true"
- DISTRIB="conda" PYTHON_VERSION="3.6" SKLEARN_VERSION="0.22.2" TEST_DIST="true"
- DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.21.2" TEST_DIST="true"
- DISTRIB="conda" PYTHON_VERSION="3.6" SKLEARN_VERSION="0.21.2" TEST_DIST="true"
- DISTRIB="conda" PYTHON_VERSION="3.6" SKLEARN_VERSION="0.20.2"
# Checks for older scikit-learn versions (which also don't nicely work with
# Python3.7)
- DISTRIB="conda" PYTHON_VERSION="3.6" SKLEARN_VERSION="0.19.2"
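Each matrix entry is a shell-style list of `NAME="value"` assignments that the CI job sees as environment variables. As a rough illustration of what one entry carries, the sketch below splits two entries copied from the matrix into dictionaries. The `parse_entry` helper is hypothetical and is not part of the repository or of Travis CI.

```python
import shlex

# Two entries copied from the matrix above.
MATRIX = [
    'DISTRIB="conda" PYTHON_VERSION="3.7" SKLEARN_VERSION="0.23.1" '
    'RUN_FLAKE8="true" SKIP_TESTS="true"',
    'DISTRIB="conda" PYTHON_VERSION="3.8" SKLEARN_VERSION="0.23.1" TEST_DIST="true"',
]


def parse_entry(entry):
    """Hypothetical helper: turn one matrix line into a name -> value dict."""
    # shlex.split handles the shell quoting, so 'A="b"' becomes 'A=b'.
    return dict(token.split("=", 1) for token in shlex.split(entry))


jobs = [parse_entry(entry) for entry in MATRIX]
for job in jobs:
    print(job["PYTHON_VERSION"], job["SKLEARN_VERSION"], sorted(job))
```

Variables that are absent from an entry, such as `TEST_DIST` in the first one, are simply unset for that job.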
185 changes: 129 additions & 56 deletions CONTRIBUTING.md
@@ -1,7 +1,76 @@
How to contribute
-----------------
This document describes the workflow on how to contribute to the openml-python package.
If you are interested in connecting a machine learning package with OpenML (i.e.
write an openml-python extension) or want to find other ways to contribute, see [this page](https://openml.github.io/openml-python/master/contributing.html#contributing).

The preferred workflow for contributing to the OpenML python connector is to
Scope of the package
--------------------

The scope of the OpenML Python package is to provide a Python interface to
the OpenML platform which integrates well with Python's scientific stack, most
notably [numpy](http://www.numpy.org/), [scipy](https://www.scipy.org/) and
[pandas](https://pandas.pydata.org/).
To reduce opportunity costs and demonstrate the usage of the package, it also
implements an interface to the most popular machine learning package written
in Python, [scikit-learn](http://scikit-learn.org/stable/index.html).
Thereby it will automatically be compatible with many machine learning
libraries written in Python.

We aim to keep the package as light-weight as possible and we will try to
keep the number of potential installation dependencies as low as possible.
Therefore, the connection to other machine learning libraries such as
*pytorch*, *keras* or *tensorflow* should not be done directly inside this
package, but in a separate package using the OpenML Python connector.
More information on OpenML Python connectors can be found [here](https://openml.github.io/openml-python/master/contributing.html#contributing).

Reporting bugs
--------------
We use GitHub issues to track all bugs and feature requests; feel free to
open an issue if you have found a bug or wish to see a feature implemented.

It is recommended to check that your issue complies with the
following rules before submitting:

- Verify that your issue is not being currently addressed by other
[issues](https://github.com/openml/openml-python/issues)
or [pull requests](https://github.com/openml/openml-python/pulls).

- Please ensure all code snippets and error messages are formatted in
appropriate code blocks.
See [Creating and highlighting code blocks](https://help.github.com/articles/creating-and-highlighting-code-blocks).

- Please include your operating system type and version number, as well
as your Python, openml, scikit-learn, numpy, and scipy versions. This information
can be found by running the following code snippet:
```python
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import sklearn; print("Scikit-Learn", sklearn.__version__)
import openml; print("OpenML", openml.__version__)
```

Determine what contribution to make
-----------------------------------
Great! You've decided you want to help out. Now what?
All contributions should be linked to issues on the [GitHub issue tracker](https://github.com/openml/openml-python/issues).
In particular for new contributors, the *good first issue* label should help you find
issues which are suitable for beginners. Resolving these issues allows you to start
contributing to the project without much prior knowledge. Your assistance in this area
will be greatly appreciated by the more experienced developers as it helps free up
their time to concentrate on other issues.

If you encountered a particular part of the documentation or code that you want to improve,
but there is no related open issue yet, open one first.
This is important since you can first get feedback or pointers from experienced contributors.

To let everyone know you are working on an issue, please leave a comment that states you will work on the issue
(or, if you have the permission, *assign* yourself to the issue). This avoids double work!

General git workflow
--------------------

The preferred workflow for contributing to openml-python is to
fork the [main repository](https://github.com/openml/openml-python) on
GitHub, clone, check out the branch `develop`, and develop on a new
branch. Steps:
@@ -109,75 +178,79 @@ following rules before you submit a pull request:
- If any source file is being added to the repository, please add the BSD 3-Clause license to it.


You can also check for common programming errors with the following
tools:

- Code with good unittest **coverage** (at least 80%), check with:

First install openml with its test dependencies by running
```bash
$ pip install pytest pytest-cov
$ pytest --cov=. path/to/tests_for_package
$ pip install -e .[test]
```

- No style warnings, check with:

from the repository folder.
Then configure pre-commit through
```bash
$ pre-commit install
```
This will install dependencies to run unit tests, as well as [pre-commit](https://pre-commit.com/).
To run the unit tests, and check their code coverage, run:
```bash
$ pip install flake8
$ flake8 --ignore E402,W503 --show-source --max-line-length 100
$ pytest --cov=. path/to/tests_for_package
```

- No mypy (typing) issues, check with:

Make sure your code has good unittest **coverage** (at least 80%).

Pre-commit is used for various style checking and code formatting.
Before each commit, it will automatically run:
- [black](https://black.readthedocs.io/en/stable/) a code formatter.
This will automatically format your code.
Make sure to take a second look after any formatting takes place;
if the resulting code is very bloated, consider a (small) refactor.
*note*: If Black reformats your code, the commit will automatically be aborted.
Make sure to add the formatted files (back) to your commit after checking them.
- [mypy](https://mypy.readthedocs.io/en/stable/) a static type checker.
In particular, make sure each function you work on has type hints.
- [flake8](https://flake8.pycqa.org/en/latest/index.html) style guide enforcement.
Almost all of the black-formatted code should automatically pass this check,
but make sure to make adjustments if it does fail.

If you want to run the pre-commit tests without doing a commit, run:
```bash
$ pip install mypy
$ mypy openml --ignore-missing-imports --follow-imports skip
$ pre-commit run --all-files
```
Make sure to do this at least once before your first commit to check your setup works.

Filing bugs
-----------
We use GitHub issues to track all bugs and feature requests; feel free to
open an issue if you have found a bug or wish to see a feature implemented.

It is recommended to check that your issue complies with the
following rules before submitting:

- Verify that your issue is not being currently addressed by other
[issues](https://github.com/openml/openml-python/issues)
or [pull requests](https://github.com/openml/openml-python/pulls).

- Please ensure all code snippets and error messages are formatted in
appropriate code blocks.
See [Creating and highlighting code blocks](https://help.github.com/articles/creating-and-highlighting-code-blocks).
Executing a specific unit test can be done by specifying the module, test case, and test.
To obtain a hierarchical list of all tests, run

- Please include your operating system type and version number, as well
as your Python, openml, scikit-learn, numpy, and scipy versions. This information
can be found by running the following code snippet:
```bash
$ pytest --collect-only
<Module 'tests/test_datasets/test_dataset.py'>
<UnitTestCase 'OpenMLDatasetTest'>
<TestCaseFunction 'test_dataset_format_constructor'>
<TestCaseFunction 'test_get_data'>
<TestCaseFunction 'test_get_data_rowid_and_ignore_and_target'>
<TestCaseFunction 'test_get_data_with_ignore_attributes'>
<TestCaseFunction 'test_get_data_with_rowid'>
<TestCaseFunction 'test_get_data_with_target'>
<UnitTestCase 'OpenMLDatasetTestOnTestServer'>
<TestCaseFunction 'test_tagging'>
```

```python
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import sklearn; print("Scikit-Learn", sklearn.__version__)
import openml; print("OpenML", openml.__version__)
```
You may then run a specific module, test case, or unit test respectively:
```bash
$ pytest tests/test_datasets/test_dataset.py
$ pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest
$ pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest::test_get_data
```
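The three commands above differ only in how much of the pytest node ID is given: the module path, then optionally the test case and the test, joined by `::`. A minimal sketch of that naming scheme follows. The `build_node_id` helper is hypothetical and exists only for illustration; it is not part of pytest or openml-python.

```python
def build_node_id(module_path, test_case=None, test=None):
    """Hypothetical helper: join the given parts into a pytest node ID."""
    parts = [module_path, test_case, test]
    # Skip the parts that were not given, then join the rest with "::".
    return "::".join(part for part in parts if part is not None)


module = "tests/test_datasets/test_dataset.py"
whole_module = build_node_id(module)
one_case = build_node_id(module, "OpenMLDatasetTest")
one_test = build_node_id(module, "OpenMLDatasetTest", "test_get_data")

for node_id in (whole_module, one_case, one_test):
    print("pytest", node_id)
```

The names passed in here are the ones shown in the `pytest --collect-only` listing.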

New contributor tips
--------------------
*NOTE*: In the case the examples build fails during the Continuous Integration test online, please
fix the first failing example. If the first failing example switched the server from live to test
or vice-versa, and the subsequent examples expect the other server, the ensuing examples will fail
to be built as well.

A great way to start contributing to openml-python is to pick an item
from the list of [Good First Issues](https://github.com/openml/openml-python/labels/Good%20first%20issue)
in the issue tracker. Resolving these issues allows you to start
contributing to the project without much prior knowledge. Your
assistance in this area will be greatly appreciated by the more
experienced developers as it helps free up their time to concentrate on
other issues.
Happy testing!

Documentation
-------------

We are glad to accept any sort of documentation: function docstrings,
reStructuredText documents (like this one), tutorials, etc.
reStructuredText documents, tutorials, etc.
reStructuredText documents live in the source code repository under the
doc/ directory.

2 changes: 1 addition & 1 deletion Makefile
@@ -9,7 +9,7 @@ all: clean inplace test

clean:
$(PYTHON) setup.py clean
rm -rf dist
rm -rf dist openml.egg-info

in: inplace # just a shortcut
inplace:
30 changes: 27 additions & 3 deletions README.md
@@ -1,9 +1,14 @@
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
# OpenML-Python

A Python interface for [OpenML](http://openml.org), an online platform for open science collaboration in machine learning.
It can be used to download or upload OpenML data such as datasets and machine learning experiment results.
You can find the documentation on the [openml-python website](https://openml.github.io/openml-python).
If you wish to contribute to the package, please see our [contribution guidelines](https://github.com/openml/openml-python/blob/develop/CONTRIBUTING.md).

## General

* [Documentation](https://openml.github.io/openml-python).
* [Contribution guidelines](https://github.com/openml/openml-python/blob/develop/CONTRIBUTING.md).

[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

Master branch:

@@ -16,3 +21,22 @@ Development branch:
[![Build Status](https://travis-ci.org/openml/openml-python.svg?branch=develop)](https://travis-ci.org/openml/openml-python)
[![Build status](https://ci.appveyor.com/api/projects/status/blna1eip00kdyr25/branch/develop?svg=true)](https://ci.appveyor.com/project/OpenML/openml-python/branch/develop)
[![Coverage Status](https://coveralls.io/repos/github/openml/openml-python/badge.svg?branch=develop)](https://coveralls.io/github/openml/openml-python?branch=develop)

## Citing OpenML-Python

If you use OpenML-Python in a scientific publication, we would appreciate a reference to the
following paper:

[Matthias Feurer, Jan N. van Rijn, Arlind Kadra, Pieter Gijsbers, Neeratyoy Mallik, Sahithya Ravi, Andreas Müller, Joaquin Vanschoren, Frank Hutter<br/>
**OpenML-Python: an extensible Python API for OpenML**<br/>
*arXiv:1911.02490 [cs.LG]*](https://arxiv.org/abs/1911.02490)

Bibtex entry:
```bibtex
@article{feurer-arxiv19a,
author = {Matthias Feurer and Jan N. van Rijn and Arlind Kadra and Pieter Gijsbers and Neeratyoy Mallik and Sahithya Ravi and Andreas Müller and Joaquin Vanschoren and Frank Hutter},
title = {OpenML-Python: an extensible Python API for OpenML},
journal = {arXiv:1911.02490},
year = {2019},
}
```
10 changes: 6 additions & 4 deletions appveyor.yml
@@ -5,10 +5,10 @@ environment:
# CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\appveyor\\scikit-learn-contrib\\run_with_env.cmd"

matrix:
- PYTHON: "C:\\Python35-x64"
PYTHON_VERSION: "3.5"
- PYTHON: "C:\\Python3-x64"
PYTHON_VERSION: "3.6"
PYTHON_ARCH: "64"
MINICONDA: "C:\\Miniconda35-x64"
MINICONDA: "C:\\Miniconda36-x64"

matrix:
fast_finish: true
@@ -35,7 +35,9 @@ install:
# Install the build and runtime dependencies of the project.
- "cd C:\\projects\\openml-python"
- "pip install .[examples,test]"
- conda install --quiet --yes scikit-learn=0.20.0
- "pip install scikit-learn==0.21"
# Uninstall coverage, as it leads to an error on appveyor
- "pip uninstall -y pytest-cov"


# Not a .NET project, we build scikit-learn in the install step instead
9 changes: 0 additions & 9 deletions ci_scripts/flake8_diff.sh

This file was deleted.
