Skip to content

Commit

Permalink
Updating contributing doc to add more detail on how to start (#1185)
Browse files Browse the repository at this point in the history
  • Loading branch information
rcap107 authored Dec 11, 2024
1 parent 3dde7f2 commit 32be3ba
Show file tree
Hide file tree
Showing 2 changed files with 104 additions and 26 deletions.
114 changes: 91 additions & 23 deletions CONTRIBUTING.rst
Original file line number Diff line number Diff line change
Expand Up @@ -124,15 +124,52 @@ See the relevant sections above on how to do this.
Setting up the environment
^^^^^^^^^^^^^^^^^^^^^^^^^^

Follow the steps in the :ref:`installation_instructions` > "From Source" section
to set up your environment, install the required development dependencies, and
run the tests.
To contribute, you will first have to run through some steps:

- Set up your environment by forking the repository (`Github doc on
forking and
cloning <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo>`__).
- Create and activate a new virtual environment:

- With `venv <https://docs.python.org/3/library/venv.html>`__, create
the env with ``python -m venv env_skrub`` and then activate it with
``source env_skrub/bin/activate``.
- With
`conda <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>`__,
create the env with ``conda new -n env_skrub`` and activate it with
``conda activate env_skrub``.
- While at the root of your local copy of skrub and within the new
env, install the required development dependencies by running
``pip install --editable ".[dev, lint, test, doc]"``.

- Run ``pre-commit install`` to activate some checks that will run every
time you do a ``git commit`` (mostly, formatting checks).

If you want to make sure that everything runs properly, you can run all
the tests with the command ``pytest -s skrub/tests``; note that this may
take a long time. Some tests may raise warnings such as:

When starting to work on a new issue, it's recommended to create a new branch:
.. code:: sh
UserWarning: Only pandas and polars DataFrames are supported, but input is a Numpy array. Please convert Numpy arrays to DataFrames before passing them to skrub transformers. Converting to pandas DataFrame with columns ['0', '1', …].
warnings.warn(
This is expected, and you may proceed with the next steps without worrying about them. However, no tests should fail at this point: if they do fail, then let us know.
.. code:: console
Now that the development environment is ready, you may start working on
the new issue by creating a new branch:
.. code:: sh
git switch -c branch_name
git checkout -b my-branch-name-eg-fix-issue-123
# make some changes
git add ./the/file-i-changed
git commit -m "my message"
git push --set-upstream origin my-branch-name-eg-fix-issue-123
At this point, if you visit again the `pull requests
page <https://github.com/skrub-data/skrub/pulls>`__ github should show a
banner asking if you want to open a pull request from your new branch.
.. _implementation guidelines:
Expand Down Expand Up @@ -183,7 +220,8 @@ Additionally, you might have updated the internal dataframe API in
``skrub/_dataframe/tests/test_common.py`` to add a test for the
``amazing_function``.
Run each updated test file using ``pytest``:
Run each updated test file using ``pytest``
([pytest docs](https://docs.pytest.org/en/stable/)):
.. code:: sh
Expand All @@ -193,10 +231,20 @@ Run each updated test file using ``pytest``:
The ``-vsl`` flag provides more information when running the tests.
It is also possible to run a specific test, or set of tests using the
commands ``pytest the_file.py::the_test``, or
``pytest the_file.py -k 'test_name_pattern'``. This is helpful to avoid
having to run all the tests.
If you work on Windows, you might have some issues with the working
directory if you use ``pytest``, while ``python -m pytest ...`` should
be more robust.
Once you are satisfied with your changes, you can run all the tests to make sure
that your change did not break code elsewhere:
.. code:: sh
pytest -s skrub/tests
Finally, sync your changes with the remote repository and wait for CI to run.
Expand Down Expand Up @@ -229,6 +277,21 @@ the docstrings. Check for possible problems by running
pytest skrub/path/to/file
Formatting and pre-commit checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Formatting the code well helps with code development and maintenance,
which why is skrub requires that all commits follow a specific set of
formatting rules to ensure code quality.
Luckily, these checks are performed automatically by the ``pre-commit``
tool (`pre-commit docs <https://pre-commit.com>`__) before any commit
can be pushed. Something worth noting is that if the ``pre-commit``
hooks format some files, the commit will be canceled: you will have to
stage the changes made by ``pre-commit`` and commit again.
Submitting your code
^^^^^^^^^^^^^^^^^^^^
Expand All @@ -237,17 +300,10 @@ a PR by clicking the "Compare & pull request" button on GitHub,
targeting the skrub repository.
Integration
^^^^^^^^^^^

Community consensus is key in the integration process. Expect a minimum
of 1 to 3 reviews depending on the size of the change before we consider
merging the PR.

Please be mindful that maintainers are volunteers, so review times may vary.

Continuous Integration (CI)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
After creating your PR, CI tools will run proceed to run all the tests on all
configurations supported by skrub.
- **Github Actions**:
Used for testing skrub across various platforms (Linux, macOS, Windows)
Expand All @@ -273,18 +329,30 @@ actions are taken.
Note that by default the documentation is built, but only the examples that are
directly modified by the pull request are executed.
- If the remote repository was changed, you might need to run
``pre-commit run --all-files`` to make sure that the formatting is
correct.
- If a specific test environment fails, it is possible to run the tests
in the environment that is failing by using pixi. For example if the
env is ``ci-py309-min-optional-deps``, it is possible to replicate it
using the following command:
CI is testing all possible configurations supported by skrub, so tests may fail
with configurations different from what you are developing with. If this is the
case, it is possible to run the tests in the environment that is failing by
using pixi. For example if the env is ``ci-py309-min-optional-deps``, it is
possible to replicate it using the following command:
.. code:: sh
pixi run -e ci-py309-min-optional-deps pytest skrub/tests/path/to/test
This command downloads the specific environment on the machine, so you can test
it locally and apply fixes, or have a clearer idea of where the code is failing
to discuss with the maintainers.
Finally, if the remote repository was changed, you might need to run
``pre-commit run --all-files`` to make sure that the formatting is
correct.
Integration
^^^^^^^^^^^
Community consensus is key in the integration process. Expect a minimum
of 1 to 3 reviews depending on the size of the change before we consider
merging the PR.
Building the documentation
Expand Down
16 changes: 13 additions & 3 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,14 @@ The goal of skrub is to bridge the gap between tabular data sources and machine-

skrub provides high-level tools for joining dataframes (``Joiner``, ``AggJoiner``, ...),
encoding columns (``MinHashEncoder``, ``ToCategorical``, ...), building a pipeline
(``TableVectorizer``, ``tabular_learner``, ...), and more.
(``TableVectorizer``, ``tabular_learner``, ...), and explore interactively your data (``TableReport``).

.. figure::
https://github.com/rcap107/skrub-datasets/blob/master/data/output.gif?raw=true
:alt: An animation showing how TableReport works

An animation showing how TableReport works


>>> from skrub.datasets import fetch_employee_salaries
>>> dataset = fetch_employee_salaries()
Expand Down Expand Up @@ -69,5 +76,8 @@ The best way to support the development of skrub is to spread the word!
Also, if you already are a skrub user, we would love to hear about your use cases and challenges in the `Discussions <https://github.com/skrub-data/skrub/discussions>`_ section.

To report a bug or suggest enhancements, please
`open an issue <https://docs.github.com/en/issues/tracking-your-work-with-issues/creating-an-issue>`_ and/or
`submit a pull request <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request>`_.
`open an issue <https://docs.github.com/en/issues/tracking-your-work-with-issues/creating-an-issue>`_.

If you want to contribute directly to the library, then check the
`how to contribute <https://skrub-data.org/stable/CONTRIBUTING.html>`_ page on
the website for more information.

0 comments on commit 32be3ba

Please sign in to comment.