-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add adapter wrapper for MLflow/CrateDB, based on monkeypatching #5
Conversation
Remove previous strategy, where hooking into the corresponding database wrapper function was extremely invasive, and not sustainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! added few suggestions
README.md
Outdated
docker run --rm -it --publish=4200:4200 --publish=5432:5432 \ | ||
--env=CRATE_HEAP_SIZE=4g crate \ | ||
-Cdiscovery.type=single-node \ | ||
-Ccluster.routing.allocation.disk.threshold_enabled=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that its a good idea to advertise disabling this threshold in examples. If disabled, writing data into CrateDB can lead to a machine crash due out-of-disk-space issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, true. I will remove it. It slipped in because I am running it this way because apparently my disk is always full, or pretends to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed with 9b02b1a.
mlflow_cratedb/monkey/db_utils.py
Outdated
|
||
def patch_sqlalchemy_inspector(engine: sa.Engine): | ||
""" | ||
When using `get_table_names()`, make sure the correct schema name gets used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't adjusting the search_path
a better way to set a default schema to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for spotting. I observed some flaws here, but I will revisit it to find out why the search_path
may not be honored here. I think the other parts of the application are already using that technique well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, patching that function was unnecessary, and got removed with 186c9e4 again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi again. We needed to bring back this patch with e8acfc2. #12 (commits) has more details.
@@ -0,0 +1,137 @@ | |||
CREATE TABLE IF NOT EXISTS {schema_prefix}"datasets" ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea with the search_path
, I think the code in this repository is already using that technique partly through SQLAlchemy.
Concluding that, I may also think, that when using that here again, we would not need to populate the schema name (here: {schema_prefix}
) into the SQL DDL statements at all, so the code could be simplified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concluding that, I may also think, that when using that here again, we would not need to populate the schema name (here: {schema_prefix}) into the SQL DDL statements at all, so the code could be simplified?
Yes right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved with 3488790.
- Use `DOUBLE` instead of `REAL` - Add missing `metrics.value` to primary key
There will be tests marked with `pytest.mark.slow`. `poe check-fast` will omit them.
About
We evaluated how to make MLflow work with CrateDB on a fork of MLflow already, see crate-workbench/mlflow@master...cratedb. This patch brings in the same amalgamations, but uses monkeypatching instead.
The idea behind this repository and patching style is to create a shippable package, because we can reasonably anticipate that the corresponding changes will never make it into upstream MLflow.
Software Tests
Tests for conducting basic database conversations are already included with this patch. More thorough tests from MLflow will be added on behalf of subsequent patches, see GH-6 and GH-10.
/cc @andnig, @ckurze