From ec37e13f7f8a272a26d6875fbdf30867df120204 Mon Sep 17 00:00:00 2001 From: Riccardo Cappuzzo Date: Thu, 21 Nov 2024 10:56:29 +0100 Subject: [PATCH 1/9] Fixing changelog with correct account --- CHANGES.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CHANGES.rst b/CHANGES.rst index 84ac28256..b96fb767e 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -77,7 +77,7 @@ Minor changes Dockès `. * Added a `DropColumnIfNull` transformer that drops columns that contain only null - values. :pr:`1115` by :user: `Riccardo Cappuzzo ` + values. :pr:`1115` by :user: `Riccardo Cappuzzo ` Bug fixes --------- From 0628fc3b81fb5c7cf6842aa4a72b893119ea803c Mon Sep 17 00:00:00 2001 From: Riccardo Cappuzzo Date: Sat, 7 Dec 2024 12:53:24 +0100 Subject: [PATCH 2/9] Updating contrib --- CONTRIBUTING.rst | 69 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 47 insertions(+), 22 deletions(-) diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index cc74f86a9..133a1d5c3 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -124,15 +124,32 @@ See the relevant sections above on how to do this. Setting up the environment ^^^^^^^^^^^^^^^^^^^^^^^^^^ -Follow the steps in the :ref:`installation_instructions` > "From Source" section -to set up your environment, install the required development dependencies, and -run the tests. +To contribute, you will first have to run through some steps: -When starting to work on a new issue, it's recommended to create a new branch: +- Set up your environment by forking the repository ((`Github doc on + forking and + cloning `__)). +- While at the root of your local copy of skrub, install the required + development dependencies by running + ``pip install --editable ".[dev, lint, test, doc]"``. +- Run all the tests by running ``pytest -s skrub/tests``. +- Run ``pre-commit install`` to activate some checks that will run every + time you do a ``git commit`` (mostly, formatting checks). -.. code:: console +Now that the development environment is ready, you may start working on +the new issue by creating a new branch: - git switch -c branch_name +.. code:: sh + + git checkout -b my-branch-name-eg-fix-issue-123 + # make some changes + git add ./the/file-i-changed + git commit -m "my message" + git push --set-upstream origin my-branch-name-eg-fix-issue-123 + +At this point, if you visit again the `pull requests +page `__ github should show a +banner asking if you want to open a pull request from your new branch. .. _implementation guidelines: @@ -197,6 +214,7 @@ Once you are satisfied with your changes, you can run all the tests to make sure that your change did not break code elsewhere: .. code:: sh + pytest -s skrub/tests Finally, sync your changes with the remote repository and wait for CI to run. @@ -237,17 +255,10 @@ a PR by clicking the "Compare & pull request" button on GitHub, targeting the skrub repository. -Integration -^^^^^^^^^^^ - -Community consensus is key in the integration process. Expect a minimum -of 1 to 3 reviews depending on the size of the change before we consider -merging the PR. - -Please be mindful that maintainers are volunteers, so review times may vary. - Continuous Integration (CI) ~~~~~~~~~~~~~~~~~~~~~~~~~~~ +After creating your PR, CI tools will run proceed to run all the tests on all +configurations supported by skrub. - **Github Actions**: Used for testing skrub across various platforms (Linux, macOS, Windows) @@ -273,18 +284,32 @@ actions are taken. Note that by default the documentation is built, but only the examples that are directly modified by the pull request are executed. -- If the remote repository was changed, you might need to run - ``pre-commit run --all-files`` to make sure that the formatting is - correct. -- If a specific test environment fails, it is possible to run the tests - in the environment that is failing by using pixi. For example if the - env is ``ci-py309-min-optional-deps``, it is possible to replicate it - using the following command: +CI is testing all possible configurations supported by skrub, so tests may fail +with configurations different from what you are developing with. If this is the +case, it is possible to run the tests in the environment that is failing by +using pixi. For example if the env is ``ci-py309-min-optional-deps``, it is +possible to replicate it using the following command: .. code:: sh pixi run -e ci-py309-min-optional-deps pytest skrub/tests/path/to/test +This command downloads the specific environment on the machine, so you can test +it locally and apply fixes, or have a clearer idea of where the code is failing +to discuss with the maintainers. + +Finally, if the remote repository was changed, you might need to run + ``pre-commit run --all-files`` to make sure that the formatting is + correct. + +Integration +^^^^^^^^^^^ + +Community consensus is key in the integration process. Expect a minimum +of 1 to 3 reviews depending on the size of the change before we consider +merging the PR. + +Please be mindful that maintainers are volunteers, so review times may vary. Building the documentation From 53d143c50657d896a900b6ef258ac6780636de86 Mon Sep 17 00:00:00 2001 From: Riccardo Cappuzzo Date: Sat, 7 Dec 2024 13:30:28 +0100 Subject: [PATCH 3/9] Added info on creating a new venv --- CONTRIBUTING.rst | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index 133a1d5c3..649725b05 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -129,8 +129,12 @@ To contribute, you will first have to run through some steps: - Set up your environment by forking the repository ((`Github doc on forking and cloning `__)). -- While at the root of your local copy of skrub, install the required - development dependencies by running +- Create a new virtual environment either with + `venv `__ or + `conda `__, + then activate it. +- While at the root of your local copy of skrub and within the new env, + install the required development dependencies by running ``pip install --editable ".[dev, lint, test, doc]"``. - Run all the tests by running ``pytest -s skrub/tests``. - Run ``pre-commit install`` to activate some checks that will run every From 5b106df9db67b5ef17d25bc98444f919f9188569 Mon Sep 17 00:00:00 2001 From: Riccardo Cappuzzo Date: Sat, 7 Dec 2024 15:40:53 +0100 Subject: [PATCH 4/9] Adding a few more improvements --- CONTRIBUTING.rst | 56 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 45 insertions(+), 11 deletions(-) diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index 649725b05..c5d8e7061 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -126,20 +126,29 @@ Setting up the environment To contribute, you will first have to run through some steps: -- Set up your environment by forking the repository ((`Github doc on +- Set up your environment by forking the repository (`Github doc on forking and - cloning `__)). -- Create a new virtual environment either with - `venv `__ or - `conda `__, - then activate it. -- While at the root of your local copy of skrub and within the new env, - install the required development dependencies by running - ``pip install --editable ".[dev, lint, test, doc]"``. -- Run all the tests by running ``pytest -s skrub/tests``. + cloning `__). +- Create and activate a new virtual environment: + + - With `venv `__, create + the env with ``python -m venv env_skrub`` and then activate it with + ``source env_skrub/bin/activate``. + - With + `conda `__, + create the env with ``conda new -n env_skrub`` and activate it with + ``conda activate env_skrub``. + - While at the root of your local copy of skrub and within the new + env, install the required development dependencies by running + ``pip install --editable ".[dev, lint, test, doc]"``. + - Run ``pre-commit install`` to activate some checks that will run every time you do a ``git commit`` (mostly, formatting checks). +If you want to make sure that everything runs properly, you can run all +the tests with the command ``pytest -s skrub/tests``; note that this may +take a long time. + Now that the development environment is ready, you may start working on the new issue by creating a new branch: @@ -204,7 +213,8 @@ Additionally, you might have updated the internal dataframe API in ``skrub/_dataframe/tests/test_common.py`` to add a test for the ``amazing_function``. -Run each updated test file using ``pytest``: +Run each updated test file using ``pytest`` +([pytest docs](https://docs.pytest.org/en/stable/)): .. code:: sh @@ -214,6 +224,15 @@ Run each updated test file using ``pytest``: The ``-vsl`` flag provides more information when running the tests. +It is also possible to run a specific test, or set of tests using the +commands ``pytest the_file.py::the_test``, or +``pytest the_file.py -k 'test_name_pattern'``. This is helpful to avoid +having to run all the tests. + +If you work on Windows, you might have some issues with the working +directory if you use ``pytest``, while ``python -m pytest ...`` should +be more robust. + Once you are satisfied with your changes, you can run all the tests to make sure that your change did not break code elsewhere: @@ -251,6 +270,21 @@ the docstrings. Check for possible problems by running pytest skrub/path/to/file + +Formatting and pre-commit checks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Formatting the code well helps with code development and maintenance, +which is skrub requires that all commits follow a specific set of +formatting rules to ensure code quality. + +Luckily, these checks are performed automatically by the ``pre-commit`` +tool (`pre-commit docs `__) before any commit +can be pushed. Something worth noting is that if the ``pre-commit`` +hooks format some files, the commit will be canceled: you will have to +stage the changes made by ``pre-commit`` and commit again. + + Submitting your code ^^^^^^^^^^^^^^^^^^^^ From fafd78dec3a1ca6b99898fdc26ed7b62d964ffea Mon Sep 17 00:00:00 2001 From: Riccardo Cappuzzo Date: Sat, 7 Dec 2024 15:57:16 +0100 Subject: [PATCH 5/9] Slightly reworded readme --- README.rst | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index c4a4b6fb1..6e404573f 100644 --- a/README.rst +++ b/README.rst @@ -69,5 +69,8 @@ The best way to support the development of skrub is to spread the word! Also, if you already are a skrub user, we would love to hear about your use cases and challenges in the `Discussions `_ section. To report a bug or suggest enhancements, please -`open an issue `_ and/or -`submit a pull request `_. +`open an issue `_. + +If you want to contribute directly to the library, then check the +`how to contribute `_ page on +the website for more information. From 57f6a36d7965fc998a06ad68437a20dc72911b22 Mon Sep 17 00:00:00 2001 From: Riccardo Cappuzzo Date: Sat, 7 Dec 2024 16:05:45 +0100 Subject: [PATCH 6/9] Updating readme --- README.rst | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 6e404573f..255742343 100644 --- a/README.rst +++ b/README.rst @@ -32,7 +32,14 @@ The goal of skrub is to bridge the gap between tabular data sources and machine- skrub provides high-level tools for joining dataframes (``Joiner``, ``AggJoiner``, ...), encoding columns (``MinHashEncoder``, ``ToCategorical``, ...), building a pipeline -(``TableVectorizer``, ``tabular_learner``, ...), and more. +(``TableVectorizer``, ``tabular_learner``, ...), and explore interactively your data (``TableReport``). + +.. figure:: + https://github.com/rcap107/skrub-datasets/blob/master/data/output.gif?raw=true + :alt: An animation showing how TableReport works + + An animation showing how TableReport works + >>> from skrub.datasets import fetch_employee_salaries >>> dataset = fetch_employee_salaries() From 1bd1482c65414351bbcf4351ecaf291ee7e51eba Mon Sep 17 00:00:00 2001 From: Riccardo Cappuzzo Date: Sat, 7 Dec 2024 16:59:22 +0100 Subject: [PATCH 7/9] Adding a note on warnings in the tests --- CONTRIBUTING.rst | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index c5d8e7061..f738f4dca 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -147,7 +147,13 @@ To contribute, you will first have to run through some steps: If you want to make sure that everything runs properly, you can run all the tests with the command ``pytest -s skrub/tests``; note that this may -take a long time. +take a long time. Some tests may raise warnings such as: + +.. code:: sh + UserWarning: Only pandas and polars DataFrames are supported, but input is a Numpy array. Please convert Numpy arrays to DataFrames before passing them to skrub transformers. Converting to pandas DataFrame with columns ['0', '1', …]. + warnings.warn( + +This is expected, and you may proceed with the next steps without worrying about them. However, no tests should fail at this point: if they do fail, then let us know. Now that the development environment is ready, you may start working on the new issue by creating a new branch: From de1b7d99426726f030560927435062df33f84552 Mon Sep 17 00:00:00 2001 From: Riccardo Cappuzzo Date: Mon, 9 Dec 2024 17:12:21 +0100 Subject: [PATCH 8/9] A couple of small fixes --- CONTRIBUTING.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index f738f4dca..0a7389a97 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -150,6 +150,7 @@ the tests with the command ``pytest -s skrub/tests``; note that this may take a long time. Some tests may raise warnings such as: .. code:: sh + UserWarning: Only pandas and polars DataFrames are supported, but input is a Numpy array. Please convert Numpy arrays to DataFrames before passing them to skrub transformers. Converting to pandas DataFrame with columns ['0', '1', …]. warnings.warn( @@ -281,7 +282,7 @@ Formatting and pre-commit checks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Formatting the code well helps with code development and maintenance, -which is skrub requires that all commits follow a specific set of +which why is skrub requires that all commits follow a specific set of formatting rules to ensure code quality. Luckily, these checks are performed automatically by the ``pre-commit`` From 09d0090cd457a2213bf42b61dfd01b6ea9052b28 Mon Sep 17 00:00:00 2001 From: Riccardo Cappuzzo Date: Tue, 10 Dec 2024 09:31:06 +0100 Subject: [PATCH 9/9] Removed a line --- CONTRIBUTING.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index 0a7389a97..e47cd89ea 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -354,8 +354,6 @@ Community consensus is key in the integration process. Expect a minimum of 1 to 3 reviews depending on the size of the change before we consider merging the PR. -Please be mindful that maintainers are volunteers, so review times may vary. - Building the documentation --------------------------