-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: move unity catalog integration to it's own crate #3044
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3044 +/- ##
==========================================
- Coverage 72.65% 72.40% -0.25%
==========================================
Files 129 128 -1
Lines 41584 41314 -270
Branches 41584 41314 -270
==========================================
- Hits 30211 29915 -296
- Misses 9405 9430 +25
- Partials 1968 1969 +1 ☔ View full report in Codecov by Sentry. |
r#" | ||
let client = reqwest_middleware::ClientBuilder::new(Client::new()).build(); | ||
|
||
let _retry_config = RetryConfig::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if I was going to remove it at that point from the testing, so it's a side effect left over from that.
Response::new(Body::from( | ||
r#" | ||
let client = reqwest_middleware::ClientBuilder::new(Client::new()).build(); | ||
let _retry_config = RetryConfig::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another unused variable that should be removed? 🤔
crates/core/Cargo.toml
Outdated
@@ -137,4 +138,4 @@ datafusion = [ | |||
datafusion-ext = ["datafusion"] | |||
json = ["parquet/json"] | |||
python = ["arrow/pyarrow"] | |||
unity-experimental = ["reqwest", "hyper"] | |||
unity-experimental = ["reqwest", "reqwest-middleware"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this feature come off the core crate now that this code is moved into its own crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is a discussion point I wanted to talk about. The main trait for the catalog and the errors remained in the core crate. So core can build I assume generic catalog support and catalog crates just depend on core and implement. At first glance I kinda didn't have a cohesive error handling thing here, but I think thinking about it. There should be a generic catalog error in core, but have no conditionals features on it. And individual crates implement a conversion to that error type. Sound good?
let compare_expr = match generalize_filter( | ||
let compare_expr = generalize_filter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed a few warnings along the way, this statement could be just written with an ?
operator instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, just some minor questions.
More gene4rally speaking though, I think we need to make some more deliberate choices around how we want to integrate with catalogs.
AS I have been out for a while, I am not sure where we are in terms of eagerness to push the next release. I would hope that we can do some more cleaning and most importantly testing before we release get "more serious" with unity ...
w.r.t. testing I hope that we can even do some integration tests using OSS UC. Happy the help with that.
crates/catalog-unity/src/lib.rs
Outdated
@@ -216,7 +226,10 @@ impl FromStr for UnityCatalogConfigKey { | |||
"workspace_url" | "unity_workspace_url" | "databricks_workspace_url" => { | |||
Ok(UnityCatalogConfigKey::WorkspaceUrl) | |||
} | |||
_ => Err(UnityCatalogError::UnknownConfigKey(s.into()).into()), | |||
_ => Err(DataCatalogError::UnknownConfigKey { | |||
catalog: "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in this case the "catalog" would be "Unity".
Btw... a lot fi the client impl was just stolen from Object store, where this field identifies the specific implementation (azure, gcp, ...). If we have this error live in a sepate create, this dioes not make much sense anymore.
crates/catalog-unity/src/lib.rs
Outdated
|
||
Ok(resp.json().await?) | ||
Ok(resp.json().await.map_err(UnityCatalogError::from)?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we have an impl From<...>
for the error, why do we need the map? does the function return a different kind of err.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of jank, the base error type in core no longer has a from for rewqest::Error so we convert to our own error type that does have a From to DataCatalogError
"GOOGLE_SERVICE_ACCOUNT", | ||
account_path.as_path().to_str().unwrap(), | ||
); | ||
unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I missing something, or did we remove unsafe
around set_var
further above, specifically in crates/aws/src/storage.rs
, and here we are adding it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to add it back to the other one, thanks for pointing it out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
One ore thing though, and not sure if this would actually happen, but it seems GH wants to re-apply a bunch of commits already on main. If all of these were duplicated, this would give us (IMO) a fairly dirty history. Could we squash these to make it a bit cleaner?
@roeap yes, I'll squash prior to merge |
@hntd187 unfortunately it looks like this needs to be rebased again 🤦 |
Signed-off-by: Stephen Carman <[email protected]>
Signed-off-by: Stephen Carman <[email protected]>
Signed-off-by: Stephen Carman <[email protected]>
Signed-off-by: Stephen Carman <[email protected]>
I have resolved conflicts, I will push and land this before I cut another release today |
Description
This moves the catalog integration of unity catalog to it's own crate. It also replaces the mock server with the httpmock crate and the custom retry logic with request_middleware's retry middleware. I think this will be easier to manage as a client and I do not believe many people were using this integration anyways.
Related Issue(s)
Documentation