Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating contributing doc to add more detail on how to start #1185

Merged
merged 15 commits into from
Dec 11, 2024
Merged
115 changes: 92 additions & 23 deletions CONTRIBUTING.rst
Original file line number Diff line number Diff line change
Expand Up @@ -124,15 +124,51 @@ See the relevant sections above on how to do this.
Setting up the environment
^^^^^^^^^^^^^^^^^^^^^^^^^^

Follow the steps in the :ref:`installation_instructions` > "From Source" section
to set up your environment, install the required development dependencies, and
run the tests.
To contribute, you will first have to run through some steps:

- Set up your environment by forking the repository (`Github doc on
forking and
cloning <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo>`__).
- Create and activate a new virtual environment:

- With `venv <https://docs.python.org/3/library/venv.html>`__, create
the env with ``python -m venv env_skrub`` and then activate it with
``source env_skrub/bin/activate``.
- With
`conda <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>`__,
create the env with ``conda new -n env_skrub`` and activate it with
``conda activate env_skrub``.
- While at the root of your local copy of skrub and within the new
env, install the required development dependencies by running
``pip install --editable ".[dev, lint, test, doc]"``.

- Run ``pre-commit install`` to activate some checks that will run every
time you do a ``git commit`` (mostly, formatting checks).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one thing that can be confusing with pre-commit is it can modify some files and cancel the commit, and then you need to stage its modification and run commit again. I'm not sure what would be the best place to add a little explanation about that without making this section too long 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a section on that a bit further down


If you want to make sure that everything runs properly, you can run all
the tests with the command ``pytest -s skrub/tests``; note that this may
take a long time. Some tests may raise warnings such as:

When starting to work on a new issue, it's recommended to create a new branch:
.. code:: sh
Vincent-Maladiere marked this conversation as resolved.
Show resolved Hide resolved
UserWarning: Only pandas and polars DataFrames are supported, but input is a Numpy array. Please convert Numpy arrays to DataFrames before passing them to skrub transformers. Converting to pandas DataFrame with columns ['0', '1', …].
warnings.warn(

This is expected, and you may proceed with the next steps without worrying about them. However, no tests should fail at this point: if they do fail, then let us know.

Now that the development environment is ready, you may start working on
the new issue by creating a new branch:

.. code:: sh

.. code:: console
git checkout -b my-branch-name-eg-fix-issue-123
# make some changes
git add ./the/file-i-changed
git commit -m "my message"
git push --set-upstream origin my-branch-name-eg-fix-issue-123

git switch -c branch_name
At this point, if you visit again the `pull requests
page <https://github.com/skrub-data/skrub/pulls>`__ github should show a
banner asking if you want to open a pull request from your new branch.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I don't think the "Implementation guidelines" below is very helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be removed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeromedockes Do you think it's too generic? Otherwise, I believe it summarizes our first principles quite well for a new contributor.

Avoid using binary extensions, Cython, or other compiled languages.

This one for example is not obvious and it's good to have it highlighted there.



.. _implementation guidelines:
Expand Down Expand Up @@ -183,7 +219,8 @@ Additionally, you might have updated the internal dataframe API in
``skrub/_dataframe/tests/test_common.py`` to add a test for the
``amazing_function``.

Run each updated test file using ``pytest``:
Run each updated test file using ``pytest``
([pytest docs](https://docs.pytest.org/en/stable/)):

.. code:: sh

Expand All @@ -193,10 +230,20 @@ Run each updated test file using ``pytest``:

The ``-vsl`` flag provides more information when running the tests.

It is also possible to run a specific test, or set of tests using the
commands ``pytest the_file.py::the_test``, or
``pytest the_file.py -k 'test_name_pattern'``. This is helpful to avoid
having to run all the tests.

If you work on Windows, you might have some issues with the working
directory if you use ``pytest``, while ``python -m pytest ...`` should
be more robust.

Once you are satisfied with your changes, you can run all the tests to make sure
that your change did not break code elsewhere:

.. code:: sh

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe another important pytest feature we could mention is running a specific test either with pytest the_file.py::the_test or pytest the_file.py -k 'test_name_pattern' to avoid waiting for all the other tests to run.

Also, I've noticed that in some cases in windows pytest can fail (I guess that depends on how it was installed) while python -m pytest seems a bit more robust. in the CI it is good to use pytest to avoid adding the current working directory to the python path but for local development I think python -m pytest is a good option; WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we could add a link to the pytest documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added notes

pytest -s skrub/tests

Finally, sync your changes with the remote repository and wait for CI to run.
Expand Down Expand Up @@ -229,6 +276,21 @@ the docstrings. Check for possible problems by running

pytest skrub/path/to/file


Formatting and pre-commit checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Formatting the code well helps with code development and maintenance,
Vincent-Maladiere marked this conversation as resolved.
Show resolved Hide resolved
which is skrub requires that all commits follow a specific set of
Vincent-Maladiere marked this conversation as resolved.
Show resolved Hide resolved
formatting rules to ensure code quality.

Luckily, these checks are performed automatically by the ``pre-commit``
tool (`pre-commit docs <https://pre-commit.com>`__) before any commit
can be pushed. Something worth noting is that if the ``pre-commit``
hooks format some files, the commit will be canceled: you will have to
stage the changes made by ``pre-commit`` and commit again.


Submitting your code
^^^^^^^^^^^^^^^^^^^^

Expand All @@ -237,17 +299,10 @@ a PR by clicking the "Compare & pull request" button on GitHub,
targeting the skrub repository.


Integration
^^^^^^^^^^^

Community consensus is key in the integration process. Expect a minimum
of 1 to 3 reviews depending on the size of the change before we consider
merging the PR.

Please be mindful that maintainers are volunteers, so review times may vary.

Continuous Integration (CI)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
After creating your PR, CI tools will run proceed to run all the tests on all
configurations supported by skrub.

- **Github Actions**:
Used for testing skrub across various platforms (Linux, macOS, Windows)
Expand All @@ -273,18 +328,32 @@ actions are taken.
Note that by default the documentation is built, but only the examples that are
directly modified by the pull request are executed.

- If the remote repository was changed, you might need to run
``pre-commit run --all-files`` to make sure that the formatting is
correct.
- If a specific test environment fails, it is possible to run the tests
in the environment that is failing by using pixi. For example if the
env is ``ci-py309-min-optional-deps``, it is possible to replicate it
using the following command:
CI is testing all possible configurations supported by skrub, so tests may fail
with configurations different from what you are developing with. If this is the
case, it is possible to run the tests in the environment that is failing by
using pixi. For example if the env is ``ci-py309-min-optional-deps``, it is
possible to replicate it using the following command:

.. code:: sh

pixi run -e ci-py309-min-optional-deps pytest skrub/tests/path/to/test

This command downloads the specific environment on the machine, so you can test
it locally and apply fixes, or have a clearer idea of where the code is failing
to discuss with the maintainers.

Finally, if the remote repository was changed, you might need to run
``pre-commit run --all-files`` to make sure that the formatting is
correct.

Integration
^^^^^^^^^^^

Community consensus is key in the integration process. Expect a minimum
of 1 to 3 reviews depending on the size of the change before we consider
merging the PR.

Please be mindful that maintainers are volunteers, so review times may vary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be mindful that maintainers are volunteers

I don't think this is technically correct nowadays 😁



Building the documentation
Expand Down
16 changes: 13 additions & 3 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,14 @@ The goal of skrub is to bridge the gap between tabular data sources and machine-

skrub provides high-level tools for joining dataframes (``Joiner``, ``AggJoiner``, ...),
encoding columns (``MinHashEncoder``, ``ToCategorical``, ...), building a pipeline
(``TableVectorizer``, ``tabular_learner``, ...), and more.
(``TableVectorizer``, ``tabular_learner``, ...), and explore interactively your data (``TableReport``).

.. figure::
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessarily the final position, I can move it around if needed

https://github.com/rcap107/skrub-datasets/blob/master/data/output.gif?raw=true
:alt: An animation showing how TableReport works

An animation showing how TableReport works


>>> from skrub.datasets import fetch_employee_salaries
>>> dataset = fetch_employee_salaries()
Expand Down Expand Up @@ -69,5 +76,8 @@ The best way to support the development of skrub is to spread the word!
Also, if you already are a skrub user, we would love to hear about your use cases and challenges in the `Discussions <https://github.com/skrub-data/skrub/discussions>`_ section.

To report a bug or suggest enhancements, please
`open an issue <https://docs.github.com/en/issues/tracking-your-work-with-issues/creating-an-issue>`_ and/or
`submit a pull request <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request>`_.
`open an issue <https://docs.github.com/en/issues/tracking-your-work-with-issues/creating-an-issue>`_.

If you want to contribute directly to the library, then check the
`how to contribute <https://skrub-data.org/stable/CONTRIBUTING.html>`_ page on
the website for more information.
Loading