
Add adapter wrapper for MLflow/CrateDB, based on monkeypatching #5

Merged
merged 18 commits into main on Sep 12, 2023

Conversation

amotl (Member) commented on Sep 7, 2023

About

We already evaluated how to make MLflow work with CrateDB on a fork of MLflow, see crate-workbench/mlflow@master...cratedb. This patch brings in the same amalgamations, but uses monkeypatching instead.

The idea behind this repository and its patching style is to create a shippable package, because we can reasonably anticipate that the corresponding changes will never make it into upstream MLflow.

Software Tests

Tests for conducting basic database conversations are already included with this patch. More thorough tests from MLflow will be added in subsequent patches, see GH-6 and GH-10.

/cc @andnig, @ckurze

@amotl amotl marked this pull request as ready for review September 7, 2023 16:17
seut (Member) left a comment:

Nice! Added a few suggestions.

README.md Outdated
docker run --rm -it --publish=4200:4200 --publish=5432:5432 \
--env=CRATE_HEAP_SIZE=4g crate \
-Cdiscovery.type=single-node \
-Ccluster.routing.allocation.disk.threshold_enabled=false
seut (Member):

I don't think it's a good idea to advertise disabling this threshold in examples. If disabled, writing data into CrateDB can lead to a machine crash due to out-of-disk-space issues.

amotl (Member Author):

Yeah, true. I will remove it. It slipped in because I run it this way locally, since my disk is apparently always full, or pretends to be.

amotl (Member Author):

Fixed with 9b02b1a.


def patch_sqlalchemy_inspector(engine: sa.Engine):
    """
    When using `get_table_names()`, make sure the correct schema name gets used.
seut (Member) commented on Sep 8, 2023:

Isn't adjusting the search_path a better way to set a default schema to use?

amotl (Member Author) replied on Sep 8, 2023:

Thanks for spotting this. I observed some flaws here, but I will revisit it to find out why the search_path may not be honored at this spot. I think the other parts of the application are already using that technique well.
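The `search_path` technique under discussion can be sketched with a standard SQLAlchemy connection event. Assumptions in this sketch: a PostgreSQL-wire-compatible server (CrateDB speaks that protocol) and a made-up schema name `mlflow`; a SQLite URL stands in so the snippet is self-contained:

```python
import sqlalchemy as sa

# Stand-in URL so the sketch is self-contained; in practice this would be
# the CrateDB connection URL.
engine = sa.create_engine("sqlite://")


@sa.event.listens_for(engine, "connect")
def set_search_path(dbapi_connection, connection_record):
    # Runs once for every new DBAPI connection entering the pool. The SET
    # statement is PostgreSQL/CrateDB syntax and would fail on SQLite; it is
    # shown here only to illustrate the technique.
    cursor = dbapi_connection.cursor()
    cursor.execute('SET search_path TO "mlflow"')  # hypothetical schema name
    cursor.close()
```

With the default schema resolved on the connection, callers no longer need to pass an explicit schema to reflection APIs.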

amotl (Member Author):

Indeed, patching that function was unnecessary; it was removed again with 186c9e4.

amotl (Member Author):

Hi again. We needed to bring back this patch with e8acfc2. #12 (commits) has more details.

@@ -0,0 +1,137 @@
CREATE TABLE IF NOT EXISTS {schema_prefix}"datasets" (
amotl (Member Author):

Good idea with the search_path; I think the code in this repository already uses that technique partly, through SQLAlchemy.

Following from that, if we use that technique here again, we would not need to interpolate the schema name (here: {schema_prefix}) into the SQL DDL statements at all, so the code could be simplified?

seut (Member):

> Concluding that, I may also think, that when using that here again, we would not need to populate the schema name (here: {schema_prefix}) into the SQL DDL statements at all, so the code could be simplified?

Yes right.
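The simplification agreed on here can be illustrated briefly: with `search_path` set per connection, DDL no longer needs the `{schema_prefix}` interpolation at all. The statement texts below are illustrative stand-ins, not the repository's actual DDL, and the schema name `mlflow` is made up:

```python
# Illustrative only: contrasts prefix interpolation with search_path-based
# schema resolution. The schema name "mlflow" is a made-up example.
SCHEMA = "mlflow"

# Before: every DDL statement needs the schema prefix interpolated.
ddl_template = 'CREATE TABLE IF NOT EXISTS {schema_prefix}"datasets" (...)'
with_prefix = ddl_template.format(schema_prefix=f'"{SCHEMA}".')

# After: with search_path set on the connection, the statement is static.
without_prefix = 'CREATE TABLE IF NOT EXISTS "datasets" (...)'

print(with_prefix)   # CREATE TABLE IF NOT EXISTS "mlflow"."datasets" (...)
print(without_prefix)
```

Dropping the interpolation keeps the DDL files static and moves schema selection entirely into connection setup.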

amotl (Member Author):

Improved with 3488790.

@amotl amotl merged commit 8ff918b into main Sep 12, 2023
1 check passed