Tabmat v4 alpha #286
It seems that our CI is set up to publish to PyPI on any GH release (including pre-releases), so creating a release with a corresponding alpha/beta/rc tag should be enough to get the v4 branch to PyPI.
* Add column name getters
* Matrix names are also combined
* Add names to constructors
* Add indexing support for column names
* Remove unnecessary code
* Better default column names
* Reduce code duplication
* Saner defaults
* Add convenient getters and setters
* Fix indexing
* Smarter setter for categorical matrices
* Add tests
* Fix subsetting with np.newaxis
* Remove the walrus :(
* Fix test
* Fix indexing with np.ix_
* Propagate column names where it makes sense
* Fix merge mistake
* Add changelog entry
* Add an experimental tabmat materializer class
* Nicer way of handling interactions
* Have proper column names [skip ci]
* Make dummy ordering consistent with pandas [skip ci]
* Fix mistake in categorical interactions [skip ci]
* Add formulaic to environment files
  Have not added to the conda recipe yet. Should probably be optional.
* Add from_formula constructor
* Add some tests
* Add more tests
* Major refactoring
  - simplify categorical interactions
  - NaNs in categoricals should be handled correctly
  - parity with formulaic in categorical names
* Make name formatting customizable
  - interaction_separator
  - categorical_format
  - intercept_name
* Add formulaic to conda recipe
* Implement `C()` function to convert to categoricals
* Auto-convert strings to categories
* Fix C() not working from materializer interface
* Add the pandasmaterializer tests from formulaic
* Add formulaic to setup.py deps
* Implement suggestions from code review
* Clean up code
  - Add docstrings
  - Add type hints
  - Rename some classes
* Pin formulaic minimum version
* Add support for architectures not supported by xsimd (#262)
* Release 3.1.9 (#263)
* Pre-commit autoupdate (#264)
  Co-authored-by: quant-ranger[bot] <132915763+quant-ranger[bot]@users.noreply.github.com>
* Add params for density and cardinality thresholds
* Skip python 3.6 build
* Refactor to avoid circular imports
* Interaction of dropped and NA is dropped
* Add type hint for context
* Add unit tests for interactable vectors
* Add more checks
* Change argument name
* Make C() stateful (remember levels)
* Add test for categorizer state
* More correct handling of encoding categoricals
* Make adding an intercept implicitly parametrizable
  Default is False
* Add na_action parameter to constructor
* Add test for sparse numerical columns
* Add option to not add the constant column
* Pre-commit autoupdate (#274)
* Pre-commit autoupdate (#276)
  Co-authored-by: quant-ranger[bot] <132915763+quant-ranger[bot]@users.noreply.github.com>
* Bump pypa/gh-action-pypi-publish from 1.8.6 to 1.8.7 (#277)
  Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.8.6 to 1.8.7.
  - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
  - [Commits](pypa/gh-action-pypi-publish@v1.8.6...v1.8.7)
  ---
  updated-dependencies:
  - dependency-name: pypa/gh-action-pypi-publish
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump pypa/gh-action-pypi-publish from 1.8.7 to 1.8.8 (#279)
  Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.8.7 to 1.8.8.
  - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
  - [Commits](pypa/gh-action-pypi-publish@v1.8.7...v1.8.8)
  ---
  updated-dependencies:
  - dependency-name: pypa/gh-action-pypi-publish
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump pypa/cibuildwheel from 2.13.1 to 2.14.1 (#280)
  Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.13.1 to 2.14.1.
  - [Release notes](https://github.com/pypa/cibuildwheel/releases)
  - [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
  - [Commits](pypa/cibuildwheel@v2.13.1...v2.14.1)
  ---
  updated-dependencies:
  - dependency-name: pypa/cibuildwheel
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Minimal implementation (tests green)
* Remove sum method and rely on np.sum
* Force DenseMatrix to always be 2-dimensional
* Add __repr__ and __str__ methods
* Fix as_mx
* Fix ufunc return value
* Wrap SparseMatrix, too
* Demo of how the ufunc interface can be implemented
* Do not subclass csc_matrix
* Improve the performance of `from_pandas` in the case of low-cardinality categoricals (#275)
* Improve the performance of `from_pandas`
* Update changelog according to review
* Add benchmark data to .gitignore (#282)
* Demonstrate binary ufuncs for sparse
* Add tocsc method
* Fix type checks
* Minor improvements
* ufunc support for categoricals
* Remove __array_ufunc__ interface
* Remove numpy operator mixin
* Add hstack function
* Add method for unpacking underlying array
* Add __matmul__ methods to SparseMatrix
* Stricter and more consistent indexing
* Be consistent when instantiating from 1d arrays
* Adjust tests to work with v4
* Fix type hints
* Add changelog entry
* term and column names for formula-based matrices
* Fix handling of formula-based names
* Add tests for formula-based names

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Martin Stancsics <[email protected]>
Co-authored-by: Uwe L. Korn <[email protected]>
Co-authored-by: quant-ranger[bot] <132915763+quant-ranger[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Looks good!
We can merge this very soon. Let's check that the documentation is all there and then let's move to glum.
Checklist
- `CHANGELOG.rst` entry

The core changes for the Tabmat 4.0 release. The main goal is making the API of `MatrixBase` subclasses more consistent. All other PRs planned for the 4.0 release are based on this (`tabmat-v4`) branch.

Major changes
- `DenseMatrix` and `SparseMatrix` are no longer subclasses of numpy and scipy arrays. This means that they do not inherit their (sometimes conflicting) behavior, and only expose a minimal interface (i.e. mostly the methods that `SplitMatrix` exposes).
- `DenseMatrix` and `SparseMatrix` now store their data in an `_array` attribute, which holds a `numpy.ndarray` or a `scipy.sparse.csc_matrix`, respectively. The underlying data structures can be accessed using the `.unpack()` method.
- `MatrixBase` objects are always 2-dimensional (this was not the case with `DenseMatrix` before). One-dimensional inputs to their constructors are interpreted as column matrices.
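
To make the composition-over-inheritance design described above concrete, here is a minimal sketch of the pattern. It is not tabmat's actual code: `DenseWrapper` is a hypothetical stand-in class, and only the `_array` attribute name, the `.unpack()` method name, and the column-matrix rule are taken from the description.

```python
import numpy as np


class DenseWrapper:
    """Hypothetical stand-in for a v4-style dense matrix (not tabmat's code)."""

    def __init__(self, data):
        arr = np.asarray(data, dtype=float)
        if arr.ndim == 1:
            # One-dimensional input is interpreted as a single column.
            arr = arr[:, np.newaxis]
        if arr.ndim != 2:
            raise ValueError("expected 1- or 2-dimensional input")
        # The data is held as an attribute instead of being inherited from.
        self._array = arr

    @property
    def shape(self):
        return self._array.shape

    def unpack(self):
        """Return the underlying numpy.ndarray."""
        return self._array


column = DenseWrapper([1.0, 2.0, 3.0])
print(column.shape)                    # (3, 1)
print(isinstance(column, np.ndarray))  # False
print(type(column.unpack()).__name__)  # ndarray
```

Because the wrapper is not an `ndarray`, none of numpy's methods leak into its public interface; callers that need the raw array have to ask for it explicitly through `unpack()`.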