diff --git a/docs/index-all.rst b/docs/index-all.rst
index b2b35a1..9df21d5 100644
--- a/docs/index-all.rst
+++ b/docs/index-all.rst
@@ -18,3 +18,4 @@ CrateDB SQLAlchemy dialect -- all pages
     advanced-querying
     inspection-reflection
     dataframe
+    support
diff --git a/docs/index.rst b/docs/index.rst
index f4c3677..d2db78a 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -79,8 +79,10 @@ kinds of `GeoJSON geometry objects`_.
 
 .. toctree::
     :maxdepth: 2
+    :titlesonly:
 
     overview
+    support
 
 
 .. _synopsis:
diff --git a/docs/overview.rst b/docs/overview.rst
index 070898b..89421e7 100644
--- a/docs/overview.rst
+++ b/docs/overview.rst
@@ -1,9 +1,9 @@
 .. _overview:
 .. _using-sqlalchemy:
 
-========
-Overview
-========
+================
+Feature Overview
+================
 
 .. rubric:: Table of contents
 
diff --git a/docs/support.md b/docs/support.md
new file mode 100644
index 0000000..4b24c3f
--- /dev/null
+++ b/docs/support.md
@@ -0,0 +1,150 @@
+(support-features)=
+# Support Features
+
+The package bundles a few support and utility functions that fill some of the
+gaps you will observe when working with CrateDB, a distributed OLAP database,
+which lacks certain features usually found in traditional OLTP databases.
+
+Some of the features outlined below are referred to as [polyfills]. They
+emulate missing functionality, for example to resolve compatibility issues with
+downstream frameworks or test suites. They are at your disposal, but you
+should know what you are doing, as some of them can seriously impact
+performance.
+
+
+(support-automatic-refresh)=
+## Automatic Table REFRESH after DML
+
+:::{rubric} Introduction
+:::
+CrateDB is [eventually consistent]. Data written by one statement is not
+guaranteed to be visible to the next `SELECT` statement that reads the
+affected rows.
+
+Data written to CrateDB is flushed periodically. The refresh interval is
+1000 milliseconds by default, and can be changed. More details can be found in
+the reference documentation about [table refreshing](inv:crate-reference#refresh_data).
+
+There are situations where stronger consistency is required, for example when
+satisfying test suites of third-party frameworks, which usually do not
+take this behavior of CrateDB into consideration.
+
+:::{rubric} Utility
+:::
+The `refresh_after_dml` utility configures an SQLAlchemy engine or session
+to automatically invoke the relevant `REFRESH TABLE` statements after each DML
+operation (INSERT, UPDATE, DELETE) on the corresponding entities / tables.
+
+```python
+import sqlalchemy as sa
+from sqlalchemy_cratedb.support import refresh_after_dml
+
+engine = sa.create_engine("crate://")
+refresh_after_dml(engine)
+```
+
+```python
+import sqlalchemy as sa
+from sqlalchemy.orm import sessionmaker
+from sqlalchemy_cratedb.support import refresh_after_dml
+
+engine = sa.create_engine("crate://")
+session = sessionmaker(bind=engine)()
+refresh_after_dml(session)
+```
+
+:::{warning}
+Refreshing the table after each DML operation can cause serious
+performance degradation, and should only be used on low-volume, low-traffic
+data, when applicable, and if you know what you are doing.
+:::
+
+
+## Support for pandas and Dask
+Todo.
+
+
+## Synthetic Autoincrement using Timestamps
+
+:::{rubric} Introduction
+:::
+Todo.
+
+:::{rubric} Utility
+:::
+To emulate a kind of autoincrement behavior, this polyfill patch
+assigns `sa.func.now()` as a column `default`. You can use it if
+adjusting ORM models for your database adapter is not an option.
+
+The patch makes it possible to use `autoincrement=True` on column definitions.
+It works on the SQLAlchemy column types `sa.BigInteger`, `sa.DateTime`, and
+`sa.String`.
+
+```python
+import sqlalchemy as sa
+from sqlalchemy.orm import declarative_base
+from sqlalchemy_cratedb.support import patch_autoincrement_timestamp
+
+# Enable patch.
+patch_autoincrement_timestamp()
+
+# Define database schema.
+Base = declarative_base()
+
+class FooBar(Base):
+    __tablename__ = "foobar"
+    id = sa.Column(sa.DateTime, primary_key=True, autoincrement=True)
+```
+
+
+## Synthetic UNIQUE Constraints
+
+:::{rubric} Introduction
+:::
+CrateDB does not provide `UNIQUE` constraints. Because of its distributed
+nature, enforcing them would be an expensive operation.
+
+:::{rubric} Utility
+:::
+This feature emulates "unique constraints" functionality by querying the
+table for conflicting values before invoking the SQL `INSERT` operation.
+When the uniqueness constraint is violated, the adapter will raise a
+corresponding exception.
+```python
+IntegrityError: DuplicateKeyException in table 'foobar' on constraint 'name'
+```
+
+```python
+import sqlalchemy as sa
+from sqlalchemy.orm import declarative_base
+from sqlalchemy.event import listen
+from sqlalchemy_cratedb.support import check_uniqueness_factory
+
+# Define database schema.
+Base = declarative_base()
+
+class FooBar(Base):
+    __tablename__ = "foobar"
+    id = sa.Column(sa.String, primary_key=True)
+    name = sa.Column(sa.String)
+
+# Add synthetic UNIQUE constraint on `name` column.
+listen(FooBar, "before_insert", check_uniqueness_factory(FooBar, "name"))
+```
+
+:::{note}
+This feature will only work well if table data is consistent, which can be
+ensured by invoking a `REFRESH TABLE` statement after any DML operation.
+To conveniently enable "always refresh", please refer to the documentation
+section about [](#support-automatic-refresh).
+:::
+
+:::{warning}
+Querying the table before each insert operation can cause serious
+performance degradation, and should only be used on low-volume, low-traffic
+data, when applicable, and if you know what you are doing.
+:::
+
+
+[eventually consistent]: https://en.wikipedia.org/wiki/Eventual_consistency
+[polyfills]: https://en.wikipedia.org/wiki/Polyfill_(programming)