Commit

Support: Add dedicated documentation page about poly-fills and utilities
amotl committed Jun 20, 2024
1 parent 2bbed79 commit d7882ae
Showing 4 changed files with 154 additions and 3 deletions.
1 change: 1 addition & 0 deletions docs/index-all.rst
@@ -18,3 +18,4 @@ CrateDB SQLAlchemy dialect -- all pages
advanced-querying
inspection-reflection
dataframe
+support
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -79,8 +79,10 @@ kinds of `GeoJSON geometry objects`_.

.. toctree::
    :maxdepth: 2
+    :titlesonly:

    overview
+    support


.. _synopsis:
6 changes: 3 additions & 3 deletions docs/overview.rst
@@ -1,9 +1,9 @@
.. _overview:
.. _using-sqlalchemy:

-========
-Overview
-========
+================
+Feature Overview
+================

.. rubric:: Table of contents

148 changes: 148 additions & 0 deletions docs/support.md
@@ -0,0 +1,148 @@
(support-features)=
# Support Features

The package bundles a few support and utility functions that fill gaps you
may encounter when working with CrateDB, a distributed OLAP database that
lacks certain features usually found in traditional OLTP databases.

Some of the features outlined below are [polyfills]: they emulate missing
functionality, for example to resolve compatibility issues with downstream
frameworks or test suites. Use them at your discretion and with care, as
some of them can seriously impact performance.


(support-automatic-refresh)=
## Automatic Table REFRESH after DML

:::{rubric} Introduction
:::
CrateDB is [eventually consistent]. Data written by one statement is not
guaranteed to be visible to a subsequent `SELECT` statement on the affected
rows.

Tables are refreshed periodically, which makes newly written data visible to
queries. The refresh interval is 1000 milliseconds by default, and can be
changed. More details can be found in the reference documentation about
[table refreshing](inv:crate-reference#refresh_data).

There are situations where stronger consistency is required, for example to
satisfy the test suites of third-party frameworks, which usually do not take
this behavior of CrateDB into account.
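
The visibility behavior described above can be pictured with a toy model in
plain Python. This is only a sketch for illustration: the `ToyTable` class and
its methods are hypothetical and do not reflect how CrateDB is implemented. In
this simplified model, a row is never visible before a refresh, whereas CrateDB
merely gives no guarantee that it is.

```python
class ToyTable:
    """Toy model of refresh-based visibility. Not CrateDB's implementation."""

    def __init__(self):
        self._visible = []  # rows that readers can see
        self._pending = []  # rows written since the last refresh

    def insert(self, row):
        # The write is accepted, but not yet visible to readers.
        self._pending.append(row)

    def refresh(self):
        # Stands in for the periodic refresh, or an explicit `REFRESH TABLE`.
        self._visible.extend(self._pending)
        self._pending.clear()

    def select(self):
        # Readers only see rows that a refresh has made visible.
        return list(self._visible)


table = ToyTable()
table.insert({"id": 1})
rows_before = table.select()  # the new row is not visible yet
table.refresh()
rows_after = table.select()   # the new row is visible now
```

A test suite that writes a row and immediately reads it back corresponds to the
`rows_before` case, which is why an explicit refresh between the two steps can
be needed.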

:::{rubric} Utility
:::
The `refresh_after_dml` utility will configure an SQLAlchemy engine or session
to automatically invoke relevant `REFRESH TABLE` statements after each DML
operation (INSERT, UPDATE, DELETE), for the corresponding entities / tables.

```python
import sqlalchemy as sa
from sqlalchemy_cratedb.support import refresh_after_dml

engine = sa.create_engine("crate://")
refresh_after_dml(engine)
```

```python
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
from sqlalchemy_cratedb.support import refresh_after_dml

engine = sa.create_engine("crate://")
session = sessionmaker(bind=engine)()
refresh_after_dml(session)
```

:::{warning}
Refreshing the table after each DML operation can cause serious
performance degradation, and should only be used on low-volume, low-traffic
data, when applicable, and if you know what you are doing.
:::
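
To make the effect of the utility more tangible, the following sketch builds
the kind of `REFRESH TABLE` statements that would need to follow a batch of
DML operations. The `refresh_statements` helper is hypothetical and not part
of `sqlalchemy_cratedb`; how the actual utility tracks tables and issues
statements may differ.

```python
def refresh_statements(written_tables):
    """Build one `REFRESH TABLE` statement per distinct table name."""
    statements = []
    # `dict.fromkeys` removes duplicates while keeping first-seen order.
    for name in dict.fromkeys(written_tables):
        # Quote the identifier, doubling any embedded double quotes.
        quoted = '"' + name.replace('"', '""') + '"'
        statements.append(f"REFRESH TABLE {quoted}")
    return statements


# Table names touched by a sequence of INSERT / UPDATE / DELETE operations.
statements = refresh_statements(["foobar", "foobar", "bazqux"])
```

Each of these statements is an extra round trip to the database, which is the
overhead the warning above refers to.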


## Support for pandas and Dask
Todo.


## Synthetic Autoincrement using Timestamps

:::{rubric} Introduction
:::
Todo.

:::{rubric} Utility
:::
To emulate autoincrement behavior, this polyfill patch assigns
`sa.func.now()` as the column `default`. Use it when adjusting your ORM
models to the database adapter is not an option.

The patch makes it possible to use `autoincrement=True` on column definitions.
It works with the SQLAlchemy column types `sa.BigInteger`, `sa.DateTime`, and
`sa.String`.

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base
from sqlalchemy_cratedb.support import patch_autoincrement_timestamp

# Enable patch.
patch_autoincrement_timestamp()

# Define database schema.
Base = declarative_base()

class FooBar(Base):
    __tablename__ = "foobar"
    id = sa.Column(sa.DateTime, primary_key=True, autoincrement=True)
```
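
The idea of using the current time as an identifier can be sketched on the
client side with the standard library. This is an analogy only: the patch
itself relies on the server-side `sa.func.now()` default, and the
`timestamp_id` helper, its millisecond resolution, and the fixed clock value
are assumptions made for illustration.

```python
import time


def timestamp_id(clock=time.time_ns):
    """Return the current time as integer milliseconds since the epoch."""
    return clock() // 1_000_000


def frozen_clock():
    # Hypothetical fixed instant, in nanoseconds since the epoch.
    return 1_718_841_600_123_456_789


# Two calls within the same millisecond yield the same identifier.
first = timestamp_id(frozen_clock)
second = timestamp_id(frozen_clock)
collision = first == second
```

Unlike a real autoincrement sequence, timestamp-based values are only as
unique as the clock resolution allows, so two records created at the same
instant can end up with the same value.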


## Synthetic UNIQUE Constraints

:::{rubric} Introduction
:::
CrateDB does not provide `UNIQUE` constraints. Because of its distributed
nature, enforcing them would be an expensive operation.

:::{rubric} Utility
:::
This feature emulates `UNIQUE` constraints by querying the table for an
existing row with the same value before invoking the SQL `INSERT` operation.
When the uniqueness constraint is violated, the adapter raises a
corresponding exception.

```text
IntegrityError: DuplicateKeyException in table 'foobar' on constraint 'name'
```

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base
from sqlalchemy.event import listen
from sqlalchemy_cratedb.support import check_uniqueness_factory

# Define database schema.
Base = declarative_base()

class FooBar(Base):
    __tablename__ = "foobar"
    id = sa.Column(sa.String, primary_key=True)
    name = sa.Column(sa.String)

# Add synthetic UNIQUE constraint on `name` column.
listen(FooBar, "before_insert", check_uniqueness_factory(FooBar, "name"))
```
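
The check-before-insert approach can be sketched independently of the adapter.
The example below uses the standard library's `sqlite3` module as a stand-in
database. The `insert_unique` function and the `DuplicateKeyError` exception
are hypothetical names for this sketch, and do not show how
`check_uniqueness_factory` is implemented.

```python
import sqlite3


class DuplicateKeyError(Exception):
    """Hypothetical exception raised by this sketch on a duplicate value."""


def insert_unique(conn, name):
    # Look for an existing row with the same value first ...
    found = conn.execute(
        "SELECT 1 FROM foobar WHERE name = ? LIMIT 1", (name,)
    ).fetchone()
    if found is not None:
        raise DuplicateKeyError(f"duplicate value {name!r} in column 'name'")
    # ... and only insert when none was found.
    conn.execute("INSERT INTO foobar (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foobar (name TEXT)")

insert_unique(conn, "alice")
try:
    insert_unique(conn, "alice")
    rejected = False
except DuplicateKeyError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM foobar").fetchone()[0]
```

Because the lookup and the insert are two separate steps, this pattern is not
atomic: two concurrent writers can both pass the check before either of them
inserts.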

:::{note}
This feature only works reliably if table data is consistent, which can be
ensured by invoking a `REFRESH TABLE` statement after any DML operation.
To conveniently enable "always refresh", see the documentation section about
[](#support-automatic-refresh).
:::

:::{warning}
Querying the table before each insert operation can cause serious
performance degradation, and should only be used on low-volume, low-traffic
data, when applicable, and if you know what you are doing.
:::


[eventually consistent]: https://en.wikipedia.org/wiki/Eventual_consistency
[polyfills]: https://en.wikipedia.org/wiki/Polyfill_(programming)
