diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index cc74f86a9..e47cd89ea 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -124,15 +124,52 @@ See the relevant sections above on how to do this. Setting up the environment ^^^^^^^^^^^^^^^^^^^^^^^^^^ -Follow the steps in the :ref:`installation_instructions` > "From Source" section -to set up your environment, install the required development dependencies, and -run the tests. +To contribute, you will first have to run through some steps: + +- Set up your environment by forking the repository (`Github doc on + forking and + cloning `__). +- Create and activate a new virtual environment: + + - With `venv `__, create + the env with ``python -m venv env_skrub`` and then activate it with + ``source env_skrub/bin/activate``. + - With + `conda `__, + create the env with ``conda new -n env_skrub`` and activate it with + ``conda activate env_skrub``. + - While at the root of your local copy of skrub and within the new + env, install the required development dependencies by running + ``pip install --editable ".[dev, lint, test, doc]"``. + +- Run ``pre-commit install`` to activate some checks that will run every + time you do a ``git commit`` (mostly, formatting checks). + +If you want to make sure that everything runs properly, you can run all +the tests with the command ``pytest -s skrub/tests``; note that this may +take a long time. Some tests may raise warnings such as: -When starting to work on a new issue, it's recommended to create a new branch: +.. code:: sh + + UserWarning: Only pandas and polars DataFrames are supported, but input is a Numpy array. Please convert Numpy arrays to DataFrames before passing them to skrub transformers. Converting to pandas DataFrame with columns ['0', '1', …]. + warnings.warn( + +This is expected, and you may proceed with the next steps without worrying about them. However, no tests should fail at this point: if they do fail, then let us know. -.. code:: console +Now that the development environment is ready, you may start working on +the new issue by creating a new branch: + +.. code:: sh - git switch -c branch_name + git checkout -b my-branch-name-eg-fix-issue-123 + # make some changes + git add ./the/file-i-changed + git commit -m "my message" + git push --set-upstream origin my-branch-name-eg-fix-issue-123 + +At this point, if you visit again the `pull requests +page `__ github should show a +banner asking if you want to open a pull request from your new branch. .. _implementation guidelines: @@ -183,7 +220,8 @@ Additionally, you might have updated the internal dataframe API in ``skrub/_dataframe/tests/test_common.py`` to add a test for the ``amazing_function``. -Run each updated test file using ``pytest``: +Run each updated test file using ``pytest`` +([pytest docs](https://docs.pytest.org/en/stable/)): .. code:: sh @@ -193,10 +231,20 @@ Run each updated test file using ``pytest``: The ``-vsl`` flag provides more information when running the tests. +It is also possible to run a specific test, or set of tests using the +commands ``pytest the_file.py::the_test``, or +``pytest the_file.py -k 'test_name_pattern'``. This is helpful to avoid +having to run all the tests. + +If you work on Windows, you might have some issues with the working +directory if you use ``pytest``, while ``python -m pytest ...`` should +be more robust. + Once you are satisfied with your changes, you can run all the tests to make sure that your change did not break code elsewhere: .. code:: sh + pytest -s skrub/tests Finally, sync your changes with the remote repository and wait for CI to run. @@ -229,6 +277,21 @@ the docstrings. Check for possible problems by running pytest skrub/path/to/file + +Formatting and pre-commit checks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Formatting the code well helps with code development and maintenance, +which why is skrub requires that all commits follow a specific set of +formatting rules to ensure code quality. + +Luckily, these checks are performed automatically by the ``pre-commit`` +tool (`pre-commit docs `__) before any commit +can be pushed. Something worth noting is that if the ``pre-commit`` +hooks format some files, the commit will be canceled: you will have to +stage the changes made by ``pre-commit`` and commit again. + + Submitting your code ^^^^^^^^^^^^^^^^^^^^ @@ -237,17 +300,10 @@ a PR by clicking the "Compare & pull request" button on GitHub, targeting the skrub repository. -Integration -^^^^^^^^^^^ - -Community consensus is key in the integration process. Expect a minimum -of 1 to 3 reviews depending on the size of the change before we consider -merging the PR. - -Please be mindful that maintainers are volunteers, so review times may vary. - Continuous Integration (CI) ~~~~~~~~~~~~~~~~~~~~~~~~~~~ +After creating your PR, CI tools will run proceed to run all the tests on all +configurations supported by skrub. - **Github Actions**: Used for testing skrub across various platforms (Linux, macOS, Windows) @@ -273,18 +329,30 @@ actions are taken. Note that by default the documentation is built, but only the examples that are directly modified by the pull request are executed. -- If the remote repository was changed, you might need to run - ``pre-commit run --all-files`` to make sure that the formatting is - correct. -- If a specific test environment fails, it is possible to run the tests - in the environment that is failing by using pixi. For example if the - env is ``ci-py309-min-optional-deps``, it is possible to replicate it - using the following command: +CI is testing all possible configurations supported by skrub, so tests may fail +with configurations different from what you are developing with. If this is the +case, it is possible to run the tests in the environment that is failing by +using pixi. For example if the env is ``ci-py309-min-optional-deps``, it is +possible to replicate it using the following command: .. code:: sh pixi run -e ci-py309-min-optional-deps pytest skrub/tests/path/to/test +This command downloads the specific environment on the machine, so you can test +it locally and apply fixes, or have a clearer idea of where the code is failing +to discuss with the maintainers. + +Finally, if the remote repository was changed, you might need to run + ``pre-commit run --all-files`` to make sure that the formatting is + correct. + +Integration +^^^^^^^^^^^ + +Community consensus is key in the integration process. Expect a minimum +of 1 to 3 reviews depending on the size of the change before we consider +merging the PR. Building the documentation diff --git a/README.rst b/README.rst index c4a4b6fb1..255742343 100644 --- a/README.rst +++ b/README.rst @@ -32,7 +32,14 @@ The goal of skrub is to bridge the gap between tabular data sources and machine- skrub provides high-level tools for joining dataframes (``Joiner``, ``AggJoiner``, ...), encoding columns (``MinHashEncoder``, ``ToCategorical``, ...), building a pipeline -(``TableVectorizer``, ``tabular_learner``, ...), and more. +(``TableVectorizer``, ``tabular_learner``, ...), and explore interactively your data (``TableReport``). + +.. figure:: + https://github.com/rcap107/skrub-datasets/blob/master/data/output.gif?raw=true + :alt: An animation showing how TableReport works + + An animation showing how TableReport works + >>> from skrub.datasets import fetch_employee_salaries >>> dataset = fetch_employee_salaries() @@ -69,5 +76,8 @@ The best way to support the development of skrub is to spread the word! Also, if you already are a skrub user, we would love to hear about your use cases and challenges in the `Discussions `_ section. To report a bug or suggest enhancements, please -`open an issue `_ and/or -`submit a pull request `_. +`open an issue `_. + +If you want to contribute directly to the library, then check the +`how to contribute `_ page on +the website for more information.