Desbordante 2.1.0

@chernishev chernishev released this 28 Jun 19:46

Release Notes

This minor release is a necessary step toward isolating the console interface code and moving it into a separate repository. Our end goal is a dedicated Python package called desbordante-cli, implemented purely in Python. It will depend on the core desbordante package, which contains the C++ code for pattern mining and validation.

Going forward, we plan to make minor releases of the core package, each followed by a corresponding console release. These releases will contain fewer features but will come out much more frequently: the idea is to release each algorithm as soon as it is ready rather than accumulating several of them, as we did previously. Once enough features have accumulated, we will publish a major release, primarily for promotion purposes. It will not provide any new functionality beyond the preceding minor releases, but will bundle all changes made since the last major release.

Changes:

  • We have added support for a novel class of algorithms: dynamic ones. A dynamic algorithm tracks changes in the dataset and updates its result on the fly rather than processing the whole table again. As a result, dynamic algorithms can be up to several orders of magnitude faster than classic (static) ones in some situations. Along with building the dynamic infrastructure, we have implemented the first dynamic algorithm, a dynamic functional dependency validator. A Python interface and an example are provided.
  • We have added support for the discovery of differential dependencies. A differential dependency is a relatively novel type of pattern that is well suited to detecting a particular relationship between two column sets. It can be seen as an extension of the functional dependency that works well on dirty data. See the article about the pattern for more information. The implemented discovery algorithm, Split, comes with a Python interface and an example.
  • Discovery of association rules is now available via the Python and console interfaces. An example is also available.
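
To make the idea behind dynamic validation more concrete, here is a minimal plain-Python sketch of incremental functional dependency checking. It is an illustration only: the class `IncrementalFDValidator` and its methods are hypothetical names invented for this example, and the code reflects neither the Desbordante Python interface nor the algorithm implemented in the core. Each insert or delete touches a single group of rows, so the table is never rescanned.

```python
from collections import defaultdict


class IncrementalFDValidator:
    """Hypothetical helper: tracks whether lhs -> rhs holds as rows change."""

    def __init__(self, lhs, rhs):
        self.lhs = lhs  # indices of the left-hand-side columns
        self.rhs = rhs  # indices of the right-hand-side columns
        # lhs value -> {rhs value -> number of rows carrying that pair}
        self.groups = defaultdict(lambda: defaultdict(int))
        # lhs values that currently map to more than one rhs value
        self.violating = set()

    def _key(self, row, cols):
        return tuple(row[c] for c in cols)

    def _refresh(self, left):
        # Re-examine only the group that was touched by the update.
        if len(self.groups[left]) > 1:
            self.violating.add(left)
        else:
            self.violating.discard(left)

    def insert(self, row):
        left, right = self._key(row, self.lhs), self._key(row, self.rhs)
        self.groups[left][right] += 1
        self._refresh(left)

    def delete(self, row):
        left, right = self._key(row, self.lhs), self._key(row, self.rhs)
        counts = self.groups.get(left)
        if not counts or right not in counts:
            raise KeyError("row is not in the tracked table")
        counts[right] -= 1
        if counts[right] == 0:
            del counts[right]
        self._refresh(left)
        if not counts:
            del self.groups[left]  # drop groups that became empty

    def holds(self):
        return not self.violating


# Column 0 is a name, column 1 is a city; we track the FD name -> city.
validator = IncrementalFDValidator(lhs=[0], rhs=[1])
for row in [("Alice", "Paris"), ("Bob", "Rome")]:
    validator.insert(row)
print(validator.holds())

validator.insert(("Alice", "Oslo"))  # Alice now maps to two cities
print(validator.holds())

validator.delete(("Alice", "Oslo"))  # removing the row restores the FD
print(validator.holds())
```

The per-update cost depends only on the size of the affected group, which is where the speedup over re-validating the whole table comes from in this simplified setting.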
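
A differential dependency can be read as "rows that are close on the left-hand-side columns must also be close on the right-hand-side columns". The sketch below checks one such dependency by brute force over all row pairs. It rests on several assumptions: the function `dd_violations`, the column names, and the thresholds are made up for illustration, only numeric columns with an absolute-difference distance are handled, and the code is unrelated to both the Split algorithm and the Desbordante interface.

```python
from itertools import combinations


def dd_violations(rows, lhs, rhs):
    """Return index pairs that meet every LHS constraint but break an RHS one.

    lhs and rhs map a column name to the maximum allowed absolute difference.
    """

    def within(a, b, constraints):
        return all(
            abs(a[col] - b[col]) <= limit for col, limit in constraints.items()
        )

    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rows), 2)
        if within(a, b, lhs) and not within(a, b, rhs)
    ]


# Hypothetical data: departure time in minutes after midnight and ticket price.
flights = [
    {"departure_min": 600, "price": 120},
    {"departure_min": 610, "price": 125},
    {"departure_min": 615, "price": 240},
    {"departure_min": 900, "price": 300},
]

# "Flights departing within 20 minutes of each other differ by at most 30 in price."
violations = dd_violations(
    flights, lhs={"departure_min": 20}, rhs={"price": 30}
)
print(violations)
```

Unlike an exact functional dependency, the check tolerates small differences in the values, which is why this kind of pattern copes better with dirty data. The pairwise scan is quadratic in the number of rows and is meant only to show the definition at work.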

Miscellaneous:

  • Greatly improved the metric functional dependency verification example.
  • Added approximate inclusion dependency discovery algorithms to the C++ core. The Python interface, the console interface, and an example are still in development.
  • Fixed the Python bindings for association rules: AR objects can now be copied properly.
  • Extended the simple statistics module with ten string-related statistics; they are available via the Python interface.
  • Fixed a CLI-breaking bug related to the CFD discovery algorithm.
  • Improved column type deduction in the C++ core.